Homelab

Background removal and batch image generation across two Mac Studios

Beyond cover art: background removal, batch resources, and two GPUs of throwaway pixels

TL;DR Cover art was the gateway drug. The same local ComfyUI install that generates this blog’s headers also strips the cluttered background off a photo of hardware on my bench, upscales a small generation to retina resolution, and batch-produces a consistent set of illustrations from a prompt template. Two Mac Studios mean I can fire a batch at one box and keep working on the other. It’s all driven from scripts and agents, and it all costs $0 per image because it never leaves the house. ...

Prompt to ComfyUI to S3 to Hugo image generation pipeline

From prompt to published: how every image on this blog comes out of a local ComfyUI

TL;DR I don’t pay for stock photos and I don’t open Canva. Every raster image on this blog is generated on a Mac Studio sitting three feet from me, by asking Claude Code to call a generate_image MCP tool that wraps ComfyUI. The pipeline is: prompt → ComfyUI (MPS) → PNG on disk → upload_media.py → S3 → CloudFront → a Markdown reference in the post. It costs $0 per image, takes ~15 seconds, and the whole thing is repeatable because the prompt and settings live in the commit history. ...

PiKVM and Dell CCTK configuring a bench of headless small-form-factor PCs

Headless bench-PC fleet: imaging and BIOS-as-code with PiKVM and Dell CCTK

TL;DR I keep four small-form-factor PCs on a bench for testing and repurposing. They arrive used, so each one needs a fresh OS image and fresh BIOS settings, and none of them has a monitor or keyboard attached. A PiKVM V4 Plus with a multiport switch gives me eyes and hands on all four boxes over the network. Dell’s cctk command-line tool (Command | Configure) lets me bake BIOS settings (boot order, AHCI mode, Wake-on-LAN, power-on-after-failure) into scripted runs instead of clicking through F2 menus. No monitor, no keyboard, no physical access for weeks at a time. Everything repeatable, everything as code. ...

Parallel agents sweeping repos for improvements under a token budget

Token-budgeted self-improvement: pointing parallel agents at my own repos

TL;DR I have $X in monthly Claude tokens I don’t always use. Instead of letting the unused credit evaporate, I built a parallel agent sweep that fans out autonomous scouts to scan for dependency upgrades, CVEs, CI waste, and quick wins across my repos. Each discovery agent returns a scored candidate list. The orchestrator triages and ranks them, then spins up isolated worktree agents to implement the safe ones — all under a hard token cap and with human gates between phases. The output is a pile of merge requests, not silent commits. Noise is real and review burden is the limiting factor, but when it lands right, an hour of agent work + human review beats a weekend of manual maintenance. ...
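As a rough illustration of the triage step described above, here is a minimal sketch of picking ranked candidates under a hard token cap. The `Candidate` fields, the score scale, and the greedy strategy are all assumptions made for the example; the excerpt does not say how the real orchestrator ranks or budgets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float      # higher is better (hypothetical scale)
    est_tokens: int   # estimated tokens needed to implement (hypothetical)


def pick_under_cap(candidates, token_cap):
    """Greedily take the best-scoring candidates that still fit under the cap."""
    chosen, spent = [], 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if spent + cand.est_tokens <= token_cap:
            chosen.append(cand)
            spent += cand.est_tokens
    return chosen, spent


if __name__ == "__main__":
    found = [
        Candidate("bump-deps", 0.9, 40_000),
        Candidate("fix-cve", 0.8, 70_000),
        Candidate("trim-ci", 0.5, 20_000),
    ]
    picked, used = pick_under_cap(found, token_cap=100_000)
    print([c.name for c in picked], used)
```

Greedy selection is not optimal in general, but it is predictable, which matters when a human has to review why a given candidate was or was not attempted.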

A GitLab CI pipeline using an LLM to review and fix merge requests

LLM-powered GitLab CI: auto-reviewing and auto-fixing merge requests

TL;DR I’ve wired LLMs into my GitLab CI pipeline to auto-review merge requests, post findings as comments, and (on command) generate patches and commit fixes. The key insight: deterministic gates run first. Before the LLM ever sees a diff, regex-enforced checks block deleted tests, committed secrets, and destructive commands. Regex is certain; LLM judgment is probabilistic. Gate first, judge second. The bot reviews silently unless it finds something, posts to the MR with confidence levels, and can be leveled up from read-only observer to trusted committer as it proves itself — hence the “autonomy ladder” (Rungs 0–4) that gates who decides what. Infrastructure repos cap at Rung 2 (never auto-merge). ...

A stack of Dell OptiPlex small-form-factor desktops wired as a k3s cluster

Build a 3-node K3s cluster from $150 surplus Dell OptiPlex desktops

TL;DR My production homelab runs on Lenovo M920q tinies, and I still think those are the sweet spot. But if I were starting over today with a tight budget, I’d buy a stack of government-surplus Dell OptiPlex 7060 and 7070 desktops instead. They go for around $150 each refurbished — 6-core 8th/9th-gen Intel, an SSD, and Windows 11 already on them — and they make excellent Kubernetes nodes with exactly two cheap upgrades: a bit more RAM and a second network card. ...

Langfuse tracing and cost dashboards for autonomous LLM agents

Tracing and budgeting LLM agents with Langfuse

TL;DR I run unattended LLM agents on my homelab — they write code, open MRs, generate content, rotate secrets. The problem: they fail silently and bill silently. Langfuse (a tracing platform) logs every LLM call with input/output tokens, latency, and cost. On top of those traces, I built three background monitors that run weekly: a goal-drift detector that compares an agent’s stated objective to what its commits actually did (via embedding similarity), a cost-spike alert that fires at 80% and 100% of a daily budget cap, and an action audit that exports traces and flags sessions where the tool-call sequence diverged from the plan. Together, these let me sleep while autonomous agents handle repetitive work. ...
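Two of those monitors reduce to small pieces of arithmetic. The sketch below is an assumption about how they could look, not the post's implementation: `crossed_thresholds` fires each budget threshold once as spend rises past it, and `cosine_similarity` is the standard measure a goal-drift check could apply to two embedding vectors. Where the spend figures and embeddings come from (for example, an export from Langfuse) is left out.

```python
import math


def crossed_thresholds(previous_spend, current_spend, daily_cap, thresholds=(0.8, 1.0)):
    """Return the thresholds crossed between two spend readings, so each alert fires once."""
    return [t for t in thresholds if previous_spend < t * daily_cap <= current_spend]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Spend went from $7.00 to $8.50 against a $10 cap: only the 80% alert fires.
    print(crossed_thresholds(7.0, 8.5, 10.0))
    # Identical direction means similarity 1.0; a drift check would alert below
    # some cutoff, which is a tuning choice not given in the excerpt.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

Comparing against the previous reading rather than the absolute level is what keeps a monitor from re-alerting on every run once spend is past 80%.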

A LiteLLM gateway routing many model providers behind one OpenAI-compatible endpoint

A LiteLLM gateway for the homelab: one endpoint, many models, hard cost caps

TL;DR I put a LiteLLM proxy gateway in front of every LLM I use — local Ollama models for bulk/cheap classification work, OpenRouter for frontier models when I need them, plus cloud vendors if needed. Every app and agent targets one OpenAI-compatible endpoint. Per-key budgets and daily spend alerts make runaway costs impossible. I define model-to-backend mappings in YAML, let LiteLLM handle the routing, and route based on intent: ask for solar-expert when I need a domain-specific Q&A bot backed by a small local model, ask for claude-opus-4-8 when I need real reasoning. The gateway cost? ~50ms latency overhead and one Kubernetes Deployment. The gain? No more vendor SDK sprawl, no more guessing which model is wired into a cron job, and spend visibility that I actually trust. ...

A crowded Ultima Online street where every NPC has something to say

The peasant has friends now: rumors, routines, and a 3,200-strong crowd

TL;DR Last time I wrote about giving my Ultima Online shard’s NPCs a voice, a memory, and a small autonomous life. That post ended with “the peasant talks back now.” In the eight days since, the project grew six new systems: NPCs keep daily routines anchored to real places, every town runs a rumor board that traveling NPCs physically carry between cities, townsfolk gossip about players (your katana, your karma, your reputation), the GM avatar got actual powers governed by a genie rule, villagers hand out delivery quests, and a population director keeps every city stocked with 200 ambient “denizens” who hail you in the street. That’s ~3,200 new NPCs and maybe a dozen new LLM call sites, still running entirely on a local gemma-class model — the trick is that the model never gained a single new permission. Every new capability is deterministic code; the LLM still only ever produces words and picks verbs off allowlists. Also: I found out my RAG pipeline had been silently dead for days, and the lesson there is worth the price of admission. ...
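The "picks verbs off allowlists" idea can be sketched in a few lines. The verb names and the `choose_action` helper here are hypothetical; the point is only that deterministic code decides what the model's text is allowed to trigger, and anything outside the list falls back to a harmless default.

```python
# Hypothetical allowlist; the shard's real verbs are not given in the excerpt.
ALLOWED_VERBS = {"say", "wave", "walk_to", "idle"}


def choose_action(model_output: str, fallback: str = "idle") -> str:
    """Take the first word of the model's reply and accept it only if allowlisted."""
    words = model_output.strip().lower().split(maxsplit=1)
    verb = words[0] if words else ""
    return verb if verb in ALLOWED_VERBS else fallback


if __name__ == "__main__":
    print(choose_action("Wave at the player"))
    print(choose_action("grant_gold 5000"))
```

Because the check is a set lookup, adding a capability means adding code and a list entry, never widening what the model itself is permitted to do.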

An MCP server wrapping a local homelab API for AI agents

Writing MCP servers for your homelab: five tools, 200 lines, and your agents get hands

TL;DR Model Context Protocol (MCP) is an open protocol that lets Claude and other LLM agents call local tools with typed signatures and structured responses. Any HTTP API running on your homelab (ComfyUI, a wiki, a dashboard, a custom service) can become a set of agent-callable tools by wrapping it in a FastMCP server. A typical server takes 150–250 lines of Python, exposes 3–5 tools via @mcp.tool() decorators, and runs as a stdio process. The pattern scales from single-purpose (image generation) to multi-tool (queue status, model listing, system stats) without complexity explosion. This post shows the anatomy by dissecting the ComfyUI MCP server: how to build workflows, poll for completion, parse results, and return structured JSON that agents actually use. ...
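The "poll for completion, return structured JSON" step is the part most wrappers get wrong, so here is a minimal standard-library sketch of it. `fetch_status` is a hypothetical callable standing in for whatever HTTP request the wrapped service needs, and the `done`/`result` keys are assumptions for the example, not ComfyUI's actual response shape.

```python
import time


def poll_until_done(fetch_status, timeout_s=120.0, interval_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports done or the timeout elapses.

    Always returns a plain dict so the calling agent gets a structured
    answer in both the success and the failure case.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("done"):
            return {"ok": True, "result": status.get("result")}
        if clock() >= deadline:
            return {"ok": False, "error": "timed out"}
        sleep(interval_s)


if __name__ == "__main__":
    # Fake backend that finishes on the third poll.
    replies = iter([{"done": False}, {"done": False},
                    {"done": True, "result": ["out.png"]}])
    print(poll_until_done(lambda: next(replies), interval_s=0.0))
```

In a real server this helper would sit inside a tool function, so the agent sees one call that either returns the result or a clear timeout instead of hanging.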