Homelab

Every homelab service behind one login: Traefik forward-auth with Authentik

TL;DR Every service I run — ComfyUI, Grafana, Vault, even the ancient app on a Mac across the network — lives behind a Traefik forward-auth middleware that hands off to Authentik. No per-service login page. One Authentik login shared across everything. The magic is a two-route IngressRoute pattern: a protected route with the middleware + an unprotected callback route for the OAuth flow itself. Adding a new service to the cluster takes five lines of YAML. Wiring a non-Kubernetes backend — like the Mac that runs ComfyUI and Ollama — takes a service-with-manual-endpoints proxy. ...

Tiered model storage across local SSD and MinIO object storage

Tiered model storage with MinIO and rclone: keep the SSD hot, archive the rest

TL;DR Stable Diffusion 3.5 Large is 15 GB. RealVisXL is 6.5 GB. Throw in a few LoRAs and a VAE, and your SSD hits the wall fast. I run a MinIO bucket as the long-tail model store, sync it to a local overflow directory on a 30-minute schedule via rclone, and register both the hot (SSD) and cold (synced overflow) paths in ComfyUI’s extra_model_paths.yaml. Models appear transparently; the loader searches both tiers. A fresh model lands in MinIO, appears locally within 30 minutes, and ComfyUI finds it without any manual shuffling. ...

Mac Studio M3 Ultra as a GPU appliance proxied into a k3s cluster

The Mac Studio as a GPU appliance: serving Ollama and ComfyUI to a k3s cluster

TL;DR A Mac Studio M3 Ultra costs the same as a single 4090 but comes with 256 GB of unified memory and 60-core GPU, all running at 100–200 W under inference. I stopped trying to pass MPS into containers and instead run Ollama and ComfyUI natively on macOS, then proxy them back into k3s as simple Kubernetes Services with manual Endpoints. Two Mac Studios connected via Thunderbolt 5 split the load: one handles hot-path LLM inference and embeddings, the other runs the heavy forge for diffusion and long-horizon reasoning. Both are cheaper to run than a single-socket A100 and require no special driver stacks. ...

An Ultima Online town NPC with a speech bubble driven by a local language model

When the peasant talks back: LLM NPCs in Ultima Online

TL;DR I run an Ultima Online shard on my homelab where the NPCs are driven by a local LLM instead of canned dialog trees. Each NPC rolls a persisted identity, remembers conversations with individual players across reboots, runs its own errands and cross-map journeys, and — the part I’m writing about today — strikes up ambient chatter with nearby NPCs on its own. The newest work extends all of that from townsfolk to language-speaking monsters: ogres, lizardmen, ratmen, gargoyles, daemons, and especially liches, who address each other like god-kings deigning to notice an insect. Inference is a local gemma-class model behind an in-cluster gateway, so it’s free and private, with the one tradeoff being cold-load latency. It’s single-shard hobby-scale and it absolutely shows the seams. I love it. ...

A Surface tablet wall-mounted as a Home Assistant dashboard

A $150 Surface Pro 7 is the best Home Assistant wall panel you can buy

TL;DR Purpose-built smart-home wall panels are expensive, locked down, and usually underpowered. A used Microsoft Surface Pro 7 — Core i5 or i7, 16 GB RAM, a sharp 12.3" touchscreen — runs about $150 on the surplus market and makes a fantastic wall-mounted dashboard for Home Assistant, Grafana, or whatever you self-host. It’s a full x86 PC behind a great touchscreen, so it runs a real browser with your real dashboards, not a stripped-down panel app. Here’s the build. ...

A seam between homelab and cloud services, with arrows for the few things still in cloud

The seam — what I deliberately left in the cloud and why

TL;DR This is the counterpart to the manifesto and the DR drill. After moving a chunk of the stack home, a list of things deliberately stayed rented: Route53, ACM, S3, AWS KMS, the Anthropic API for Claude, Bedrock for Amazon-only models, a transactional email sender, and one repo on GitHub. Each of them earns its place by being either the long pole on availability or the dependency that has to outlive the cluster. Self-hosting maximalism is a trap; the seam is the feature. ...

Scheduled disaster recovery rebuild timeline on a homelab cluster

The Saturday DR drill — burning the cluster down on purpose

TL;DR Three weeks after accidentally wiping GitLab with a misdirected blkdiscard and rebuilding from S3, I scheduled a deliberate drill: wipe GitLab, Vault, Harbor’s proxy cache, Authentik’s database, and one Longhorn volume on a Saturday morning, then rebuild everything from Terraform + S3 with a stopwatch running. Total drill time: 4 hours 22 minutes, end to end. About 90 minutes of that was actual rebuild work; the rest was discovering pieces of state I’d accidentally left out of the IaC. ...

Migration arrows from managed cloud services to a self-hosted cluster

From managed to owned — the case for self-hosting in 2026

TL;DR A year ago my stack was the usual mix — GitHub for code, ECR for images, GitHub Actions for CI, Docker Hub for upstreams, Route53 + S3 + CloudFront for the blog. Most of that’s still where it should be. About a third of it isn’t. This post is the retrospective on what came home, what stayed rented, and the rule of thumb I now use when deciding which side of the line a new service goes on. The short version: self-host the things you operate; rent the things you’d never have time to operate. ...

Vault HA cluster fronted by Authentik with KMS auto-unseal

HashiCorp Vault behind Authentik — secrets that survive an auditor

TL;DR I had Authentik handling human auth and kubeseal handling cluster secrets, which left a gap: anything that needed a real secret at runtime — API tokens, database passwords, Bedrock keys — was one kubectl get secret away from being readable in plaintext. I deployed HashiCorp Vault as a 3-node HA cluster on k3s, auto-unsealed via AWS KMS, with Authentik OIDC for human SSO and the Kubernetes auth method for workloads. Apps get their secrets injected by a sidecar; no app code touches a k8s Secret object anymore. The migration took a weekend and removed an entire category of “what if this got read” worry I’d been ignoring. ...

Power meter and heat-flow diagram for a homelab rack

Watts, BTUs, and the real cost of running a homelab 24/7

TL;DR A homelab feels free until you read the meter. After a year of running seven k3s nodes plus a pair of Mac Studios under whatever workload I felt like throwing at them, I sat down with a Kill-a-Watt and worked out what the cluster actually costs to keep on. Idle is genuinely cheap. Sustained LLM inference is not. The honest break-even against cloud inference is workload-shaped, and for my workloads, on-prem wins — but only because I run them often enough to amortize the wattage. The numbers below are mine; substitute your electricity rate to get yours. ...