Applications

Background removal and batch image generation across two Mac Studios

Beyond cover art: background removal, batch resources, and two GPUs of throwaway pixels

TL;DR Cover art was the gateway drug. The same local ComfyUI install that generates this blog’s headers also strips the cluttered background off a photo of hardware on my bench, upscales a small generation to retina resolution, and batch-produces a consistent set of illustrations from a prompt template. Two Mac Studios mean I can fire a batch at one box and keep working on the other. It’s all driven from scripts and agents, and it all costs $0 per image because it never leaves the house. ...
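A minimal sketch of the prompt-template idea, assuming a `$subject` placeholder and a shared seed as the things that keep a set visually consistent. The function name, the job dictionary layout, and the seed value are all hypothetical, not the blog's actual script.

```python
from string import Template
from typing import Dict, List


def build_batch(template: str, subjects: List[str], seed: int = 42) -> List[Dict[str, object]]:
    """Expand one prompt template into a list of generation jobs.

    Every job shares the same style text and seed; only the subject changes.
    """
    tmpl = Template(template)
    return [{"prompt": tmpl.substitute(subject=s), "seed": seed} for s in subjects]


if __name__ == "__main__":
    jobs = build_batch(
        "flat vector illustration of $subject, teal palette",
        ["a router", "a NAS"],
    )
    for job in jobs:
        print(job)
```

Each job could then be handed to whichever box is idle, which is what makes a second machine useful for batches.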

Prompt to ComfyUI to S3 to Hugo image generation pipeline

From prompt to published: how every image on this blog comes out of a local ComfyUI

TL;DR I don’t pay for stock photos and I don’t open Canva. Every raster image on this blog is generated on a Mac Studio sitting three feet from me, by asking Claude Code to call a generate_image MCP tool that wraps ComfyUI. The pipeline is: prompt → ComfyUI (MPS) → PNG on disk → upload_media.py → S3 → CloudFront → a Markdown reference in the post. It costs $0 per image, takes ~15 seconds, and the whole thing is repeatable because the prompt and settings live in the commit history. ...

An MCP server wrapping a local homelab API for AI agents

Writing MCP servers for your homelab: five tools, 200 lines, and your agents get hands

TL;DR Model Context Protocol (MCP) is an open protocol that lets Claude and other LLM agents call local tools with typed signatures and structured responses. Any HTTP API running on your homelab (ComfyUI, a wiki, a dashboard, a custom service) can become a set of agent-callable tools by wrapping it in a FastMCP server. A typical server takes 150–250 lines of Python, exposes 3–5 tools via @mcp.tool() decorators, and runs as a stdio process. The pattern scales from single-purpose (image generation) to multi-tool (queue status, model listing, system stats) without an explosion in complexity. This post shows the anatomy by dissecting the ComfyUI MCP server: how to build workflows, poll for completion, parse results, and return structured JSON that agents actually use. ...
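The "poll for completion, parse results, return structured JSON" step can be sketched with the standard library alone. This assumes ComfyUI's `/history/<prompt_id>` route returns an empty object until the job finishes and then an entry keyed by the prompt ID with an `outputs` map of image records. The URL, the function names, and the returned field names are illustrative, not the server dissected in the post.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

# Assumption: ComfyUI is listening locally on its default port.
COMFY_URL = "http://127.0.0.1:8188"


def fetch_history(prompt_id: str) -> dict:
    """GET /history/<prompt_id> from ComfyUI and decode the JSON body."""
    with urlopen(f"{COMFY_URL}/history/{prompt_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_images(
    prompt_id: str,
    fetch: Callable[[str], dict] = fetch_history,
    interval: float = 1.0,
    timeout: float = 120.0,
) -> dict:
    """Poll until the prompt shows up in history, then return a compact summary."""
    deadline = time.monotonic() + timeout
    while True:
        entry = fetch(prompt_id).get(prompt_id)
        if entry is not None:
            # Flatten every output node's image list into plain filenames.
            files = [
                image["filename"]
                for node_output in entry.get("outputs", {}).values()
                for image in node_output.get("images", [])
            ]
            return {"status": "ok", "prompt_id": prompt_id, "images": files}
        if time.monotonic() >= deadline:
            return {"status": "timeout", "prompt_id": prompt_id, "images": []}
        time.sleep(interval)
```

In a FastMCP server, a function like `wait_for_images` would sit behind an `@mcp.tool()` decorator. Passing `fetch` as a parameter keeps the polling logic testable without a running ComfyUI.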

Two Mac Studios bridged by Thunderbolt 5 running a 1T parameter MoE

Running a 1T-parameter MoE locally on two Mac Studios over Thunderbolt 5

TL;DR Two M3 Ultra Mac Studios — 256GB unified memory each — connected by a Thunderbolt 5 cable can run mixture-of-experts models in the trillion-parameter range that no single 256GB box can fit. The hot path stays on Box 1; Box 2 hosts heavier experts and gets called via a local nginx proxy on port 11436. Real-world power draw is nowhere near the spec sheet. Some models still don’t fit even with two boxes (Kimi K2.6 native INT4), and that’s a genuinely useful constraint to know. ...

AI Agents Work Better When They Actually Know How You Operate

TL;DR AI agents fail when they don’t know what you know. I built a Slack bot that conducts structured 5-layer interviews to extract tacit knowledge (operating rhythms, decision criteria, dependencies, friction points, leverage opportunities) and generates soul.md, user.md, and heartbeat.md config files for provisioning agents. The interview surfaces ~30% more actionable context than documentation alone. Full source code below.

The Problem Nobody’s Talking About

Nate B. Jones has a video that nails the core issue with AI agents: they fail because they lack tacit knowledge. Not the stuff in your docs, but the stuff in your head. The 20-year veteran who just knows that the staging deploy takes longer on Thursdays because the batch job runs. The designer who can feel when a color palette is wrong without being able to articulate why. ...

Running GLM-5.1 (744B) Locally on a Mac Studio: Benchmark Results

TL;DR I loaded Z.ai’s GLM-5.1 — a 744B parameter MoE model with 40B active parameters — onto a Mac Studio M3 Ultra with 256GB unified memory using a 2-bit quantized GGUF via llama.cpp. It runs at 5.8 tok/s with a 120-second time to first token. The financial analysis quality is genuinely impressive, but it eats 222GB of the 256GB available, leaving room for literally nothing else. It’s a “clear the schedule” model, not an always-on one. ...

Securing Jellyfin when it's exposed to the internet

TL;DR Someone asked me on Reddit for a comprehensive guide to securing a public-facing Jellyfin instance, so here it is. The short answer I gave was: fail2ban, automate patching, implement OAuth, and download an IP block list. This post expands all four into actionable steps and adds a fifth option — IP whitelisting with a DDNS-aware Python cron job — plus the honest answer that a VPN eliminates most of this complexity entirely. ...
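The DDNS-aware allowlist job could look roughly like the sketch below. It assumes a one-address text file that some other step turns into proxy or firewall rules. The hostname, file path, and function name are hypothetical, and `socket.gethostbyname` only handles IPv4.

```python
import socket
from pathlib import Path
from typing import Callable


def refresh_allowlist(
    hostname: str,
    allowlist: Path,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Rewrite the allowlist if the DDNS hostname now resolves to a new IPv4 address.

    Returns True when the file changed, so the caller knows to reload its rules.
    """
    current_ip = resolve(hostname)
    previous = allowlist.read_text().strip() if allowlist.exists() else ""
    if previous == current_ip:
        return False
    allowlist.write_text(current_ip + "\n")
    return True


if __name__ == "__main__":
    # Hypothetical values; a cron entry would run this every few minutes.
    changed = refresh_allowlist("home.example.net", Path("allowed_ip.txt"))
    print("allowlist updated" if changed else "no change")
```

Returning a boolean lets the cron wrapper reload the proxy only when the address actually moved.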

Stress Testing GPU Transcoding in Kubernetes with JF_hw_stress

TL;DR JF_hw_stress is a headless transcoding stress tester that answers one question: how many concurrent transcode streams can your GPU actually handle before quality degrades? It runs escalating FFmpeg transcodes against real media files using VAAPI hardware acceleration, measures FPS ratios, and outputs a JSON report. I run it as a Kubernetes Job on the same k3s cluster from Cluster Genesis, scheduled exclusively on the GPU node (Intel UHD 630). The job auto-deletes after 10 minutes so it does not accumulate stale pods. ...
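The escalation logic can be sketched as follows, assuming "FPS ratio" means achieved transcode FPS divided by the source frame rate and that a ratio below 1.0 counts as falling behind. This is not JF_hw_stress's actual code. `measure_fps` is a hypothetical callback that would launch the FFmpeg processes and report each stream's FPS.

```python
from typing import Callable, List


def find_stream_limit(
    measure_fps: Callable[[int], List[float]],
    source_fps: float,
    max_streams: int = 16,
    min_ratio: float = 1.0,
) -> int:
    """Return the highest stream count at which every stream keeps up.

    measure_fps(n) runs n concurrent transcodes and returns each stream's
    achieved FPS. A stream keeps up when achieved / source_fps >= min_ratio.
    """
    best = 0
    for n in range(1, max_streams + 1):
        ratios = [fps / source_fps for fps in measure_fps(n)]
        if min(ratios) < min_ratio:
            break  # the slowest stream fell behind, so stop escalating
        best = n
    return best


if __name__ == "__main__":
    # Toy stand-in for a GPU that can encode 120 frames per second in total.
    def shared_budget(n: int) -> List[float]:
        return [120.0 / n] * n

    print(find_stream_limit(shared_budget, source_fps=24.0))
```

Judging by the slowest stream rather than the average keeps a single starved stream from hiding behind healthy ones.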

PiKey: A Raspberry Pi That Pretends to Be Your Keyboard

TL;DR PiKey is a Raspberry Pi project that spoofs a Logitech K380 Bluetooth keyboard and mouse. It jiggles the mouse to prevent idle detection and auto-types LLM-generated text to simulate human activity. The device appears as a standard Bluetooth HID peripheral – no drivers or software needed on the target machine. Three full implementations exist: Python (primary), Rust (static binary), and C (minimal dependencies). The whole thing was inspired by a Reddit thread on r/overemployed where someone asked for exactly this device. ...

OpenClaw Multi-User: Privacy, Dual AI Backends, and Per-User Cost Tracking

TL;DR Multi-user AI chat with privacy guarantees, dual model providers (Anthropic direct API + AWS Bedrock via LiteLLM), and per-user cost tracking via Prometheus and Grafana. The admin cannot read other users’ conversations. Three family members authenticate via Google OAuth, each getting isolated chat sessions. Anthropic serves as the primary model provider with lower latency, and Bedrock via LiteLLM acts as a fallback. Per-user spend is tracked through LiteLLM’s Prometheus metrics without any surveillance of conversation content. This is a follow-up to the OpenClaw on k3s setup post. ...