Kubernetes

A stack of Dell OptiPlex small-form-factor desktops wired as a k3s cluster

Build a 3-node K3s cluster from $150 surplus Dell OptiPlex desktops

TL;DR My production homelab runs on Lenovo M920q tinies, and I still think those are the sweet spot. But if I were starting over today with a tight budget, I’d buy a stack of government-surplus Dell OptiPlex 7060 and 7070 desktops instead. They go for around $150 each refurbished — 6-core 8th/9th-gen Intel, an SSD, and Windows 11 already on them — and they make excellent Kubernetes nodes with exactly two cheap upgrades: a bit more RAM and a second network card. ...

Traefik forward-auth middleware fronting homelab services with Authentik SSO

Every homelab service behind one login: Traefik forward-auth with Authentik

TL;DR Every service I run — ComfyUI, Grafana, Vault, even the ancient app on a Mac across the network — lives behind a Traefik forward-auth middleware that hands off to Authentik. No per-service login page. One Authentik login shared across everything. The magic is a two-route IngressRoute pattern: a protected route with the middleware + an unprotected callback route for the OAuth flow itself. Adding a new service to the cluster takes five lines of YAML. Wiring a non-Kubernetes backend — like the Mac that runs ComfyUI and Ollama — takes a service-with-manual-endpoints proxy. ...
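
The two-route pattern can be sketched as a small manifest generator. This is an illustration under assumptions, not the post's actual YAML: the middleware name `authentik-forward-auth`, the `websecure` entry point, and the outpost Service name and port are hypothetical placeholders. The `/outpost.goauthentik.io/` prefix is the path the Authentik outpost uses for its own endpoints.

```python
import json


def protected_ingress(name, host, service, port,
                      middleware="authentik-forward-auth",        # assumed Middleware name
                      outpost_service="ak-outpost-authentik-embedded-outpost",  # assumed
                      outpost_port=9000):                          # assumed
    """Build a Traefik IngressRoute with a protected route and an open callback route."""
    return {
        "apiVersion": "traefik.io/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": name},
        "spec": {
            "entryPoints": ["websecure"],  # assumed entry point name
            "routes": [
                {
                    # Protected route: every request passes through forward-auth first.
                    "kind": "Rule",
                    "match": f"Host(`{host}`)",
                    "middlewares": [{"name": middleware}],
                    "services": [{"name": service, "port": port}],
                },
                {
                    # Callback route: no middleware, so the login flow itself
                    # is never bounced back to the login page.
                    "kind": "Rule",
                    "match": f"Host(`{host}`) && PathPrefix(`/outpost.goauthentik.io/`)",
                    "services": [{"name": outpost_service, "port": outpost_port}],
                },
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this can be piped to `kubectl apply -f -`.
    print(json.dumps(protected_ingress("grafana", "grafana.example.lan", "grafana", 3000), indent=2))
```

Traefik's default priority is based on rule length, so the longer callback rule wins over the plain `Host` rule without an explicit `priority` field.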

Mac Studio M3 Ultra as a GPU appliance proxied into a k3s cluster

The Mac Studio as a GPU appliance: serving Ollama and ComfyUI to a k3s cluster

TL;DR A Mac Studio M3 Ultra costs the same as a single 4090 but comes with 256 GB of unified memory and a 60-core GPU, all running at 100–200 W under inference. I stopped trying to pass MPS into containers and instead run Ollama and ComfyUI natively on macOS, then proxy them back into k3s as simple Kubernetes Services with manual Endpoints. Two Mac Studios connected via Thunderbolt 5 split the load: one handles hot-path LLM inference and embeddings, the other runs the heavy forge for diffusion and long-horizon reasoning. Both are cheaper to run than a single-socket A100 and require no special driver stacks. ...
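
A Service with manual Endpoints is a selector-less Service paired with an Endpoints object of the same name that lists the external IP. A minimal sketch, assuming a hypothetical Mac at `192.0.2.10` (a documentation-only address) and Ollama on its default port 11434; the object name `ollama` is also a placeholder.

```python
import json


def external_backend(name, ip, port):
    """Return (Service, Endpoints) manifests that expose an off-cluster host in-cluster."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # No selector: Kubernetes will not manage endpoints for this Service.
        "spec": {"ports": [{"name": "http", "port": port, "targetPort": port}]},
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # The name must match the Service so the two are associated.
        "metadata": {"name": name},
        "subsets": [{
            "addresses": [{"ip": ip}],
            # The port name must match the Service's port name.
            "ports": [{"name": "http", "port": port}],
        }],
    }
    return service, endpoints


if __name__ == "__main__":
    for manifest in external_backend("ollama", "192.0.2.10", 11434):
        print(json.dumps(manifest, indent=2))
```

In-cluster clients then reach the Mac at `http://ollama:11434` like any other Service in the same namespace.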

Scheduled disaster recovery rebuild timeline on a homelab cluster

The Saturday DR drill — burning the cluster down on purpose

TL;DR Three weeks after accidentally wiping GitLab with a misdirected blkdiscard and rebuilding from S3, I scheduled a deliberate drill: wipe GitLab, Vault, Harbor’s proxy cache, Authentik’s database, and one Longhorn volume on a Saturday morning, then rebuild everything from Terraform + S3 with a stopwatch running. Total drill time: 4 hours 22 minutes, end to end. About 90 minutes of that was actual rebuild work; the rest was discovering pieces of state I’d accidentally left out of the IaC. ...
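
The split implied by those two figures can be worked out directly. A quick sketch that treats the "about 90 minutes" of rebuild work as exactly 90:

```python
from datetime import timedelta

total = timedelta(hours=4, minutes=22)   # drill time from the post
rebuild = timedelta(minutes=90)          # "about 90 minutes", taken as exact here

discovery = total - rebuild              # time spent finding state missing from the IaC
discovery_minutes = int(discovery.total_seconds() // 60)
discovery_share = discovery / total      # timedelta / timedelta yields a float

print(f"discovery: {discovery_minutes} min ({discovery_share:.0%} of the drill)")
```

That puts roughly 2 hours 52 minutes, about two thirds of the drill, on discovery rather than on rebuilding.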

Migration arrows from managed cloud services to a self-hosted cluster

From managed to owned — the case for self-hosting in 2026

TL;DR A year ago my stack was the usual mix — GitHub for code, ECR for images, GitHub Actions for CI, Docker Hub for upstreams, Route53 + S3 + CloudFront for the blog. Most of that’s still where it should be. About a third of it isn’t. This post is the retrospective on what came home, what stayed rented, and the rule of thumb I now use when deciding which side of the line a new service goes on. The short version: self-host the things you operate; rent the things you’d never have time to operate. ...

Vault HA cluster fronted by Authentik with KMS auto-unseal

HashiCorp Vault behind Authentik — secrets that survive an auditor

TL;DR I had Authentik handling human auth and kubeseal handling cluster secrets, which left a gap: anything that needed a real secret at runtime — API tokens, database passwords, Bedrock keys — was one kubectl get secret away from being readable in plaintext. I deployed HashiCorp Vault as a 3-node HA cluster on k3s, auto-unsealed via AWS KMS, with Authentik OIDC for human SSO and the Kubernetes auth method for workloads. Apps get their secrets injected by a sidecar; no app code touches a k8s Secret object anymore. The migration took a weekend and removed an entire category of “what if this got read” worry I’d been ignoring. ...

Harbor proxy cache fronting upstream registries

Harbor as a proxy cache for every upstream registry — killing rate limits in a homelab

TL;DR Every node in my k3s cluster used to pull images directly from docker.io, ghcr.io, lscr.io, and quay.io. That meant Docker Hub rate limits, occasional 5xx storms from ghcr, and a hard outage when quay.io went sideways for a few hours. I put Harbor in front of all of them as a proxy cache, pointed containerd at Harbor, and the registry-related noise in my cluster effectively went to zero. Image pulls also got faster — 10GbE LAN beats every public CDN I’ve measured against. ...
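
A Harbor proxy cache serves upstream images under a per-registry project, so a mirror setup effectively maps each upstream reference to `<harbor-host>/<project>/<repository>`. The sketch below illustrates that mapping only; it is not the post's containerd configuration. The Harbor hostname and the project names are hypothetical.

```python
HARBOR_HOST = "harbor.example.lan"  # hypothetical Harbor hostname

# Hypothetical proxy-cache project per upstream registry.
PROXY_PROJECTS = {
    "docker.io": "dockerhub",
    "ghcr.io": "ghcr",
    "lscr.io": "lscr",
    "quay.io": "quay",
}


def via_harbor(image: str) -> str:
    """Rewrite an image reference to pull through the matching Harbor proxy project."""
    first, _, rest = image.partition("/")
    # The first path component is a registry host only if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", image
    # Docker Hub official images live under the implicit "library/" namespace.
    if registry == "docker.io" and "/" not in path:
        path = "library/" + path
    project = PROXY_PROJECTS.get(registry)
    if project is None:
        return image  # no proxy project for this registry: leave untouched
    return f"{HARBOR_HOST}/{project}/{path}"


if __name__ == "__main__":
    for ref in ("nginx:1.27", "grafana/grafana", "ghcr.io/foo/bar:1", "registry.k8s.io/pause:3.9"):
        print(ref, "->", via_harbor(ref))
```

Doing this at the containerd mirror layer rather than in manifests keeps image references in workloads unchanged.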

Migrating from GitHub to self-hosted GitLab CE — and rebuilding it from S3

TL;DR I moved every private homelab repo off GitHub onto a self-hosted GitLab CE 18.10 instance running on my k3s cluster. GitHub stays as a read-only mirror plus the break-glass k3s_bootstrap repo. Two weeks later I accidentally blkdiscard’d the GitLab volume and rebuilt the entire instance from an S3 backup. It worked, but the boring parts (runner re-registration, group tokens, container-registry pull secrets) were the real cost. Why bother? GitHub was fine. GitHub Actions was fine. The thing that pushed me over was billing math plus blast radius: ...

A closed business laptop running headless as a homelab server node

The cheapest homelab node has a built-in UPS: a used business laptop

TL;DR Everyone reaches for a mini PC or a Pi for a homelab node. The thing nobody tells you: a used business laptop is a server with a built-in UPS, screen, and keyboard bolted on for free. A Dell Latitude 7400 — 8th-gen Core i5, 16 GB RAM, NVMe SSD — runs about $150 used, draws ~10 W with the lid shut, and when the power flickers it doesn’t even notice, because it’s running off its own battery. I run a couple as edge nodes. Here’s the case for it and the five-minute headless setup. ...

Agentic Claude processes reporting back from long-running OpenClaw workers

Giving Claude the ability to talk back: agentic long-running processes in OpenClaw

Heads up: this post mentions Claude. If you want to try it, I've got a referral link — it gives us both a bit of extra credit, no pressure: claude.ai via my referral. TL;DR Most AI tooling still treats an LLM like a search bar — you prompt, it answers, the loop ends. Useful, but not what I wanted. For my homelab’s ops + trading intelligence platform (OpenClaw), I needed agents that could run for hours, do real work against a real cluster, and then tap me on the shoulder when they found something I should see. Claude turned out to be the model I kept coming back to for the “thinking” layer — it’s both comfortable with long tool-use chains and happy to write structured output a human won’t need to decode. This is a tour of how I’ve actually wired that up: k3s CronJobs doing the heavy lifting, LiteLLM as the routing layer, Slack as the interrupt bus, and named cat-bot personas so I can tell at a glance who’s knocking. ...