Scheduled disaster recovery rebuild timeline on a homelab cluster Scheduled disaster recovery rebuild timeline on a homelab cluster

The Saturday DR drill — burning the cluster down on purpose

TL;DR Three weeks after accidentally wiping GitLab with a misdirected blkdiscard and rebuilding from S3, I scheduled a deliberate drill: wipe GitLab, Vault, Harbor’s proxy cache, Authentik’s database, and one Longhorn volume on a Saturday morning, then rebuild everything from Terraform + S3 with a stopwatch running. Total drill time: 4 hours 22 minutes, end to end. About 90 minutes of that was actual rebuild work; the rest was discovering pieces of state I’d accidentally left out of the IaC. ...

May 23, 2026 · 9 min · zolty
Migration arrows from managed cloud services to a self-hosted cluster Migration arrows from managed cloud services to a self-hosted cluster

From managed to owned — the case for self-hosting in 2026

TL;DR A year ago my stack was the usual mix — GitHub for code, ECR for images, GitHub Actions for CI, Docker Hub for upstreams, Route53 + S3 + CloudFront for the blog. Most of that’s still where it should be. About a third of it isn’t. This post is the retrospective on what came home, what stayed rented, and the rule of thumb I now use when deciding which side of the line a new service goes on. The short version: self-host the things you operate; rent the things you’d never have time to operate. ...

May 20, 2026 · 7 min · zolty
Vault HA cluster fronted by Authentik with KMS auto-unseal Vault HA cluster fronted by Authentik with KMS auto-unseal

HashiCorp Vault behind Authentik — secrets that survive an auditor

TL;DR I had Authentik handling human auth and kubeseal handling cluster secrets, which left a gap: anything that needed a real secret at runtime — API tokens, database passwords, Bedrock keys — was one kubectl get secret away from being readable in plaintext. I deployed HashiCorp Vault as a 3-node HA cluster on k3s, auto-unsealed via AWS KMS, with Authentik OIDC for human SSO and the Kubernetes auth method for workloads. Apps get their secrets injected by a sidecar; no app code touches a k8s Secret object anymore. The migration took a weekend and removed an entire category of “what if this got read” worry I’d been ignoring. ...

May 17, 2026 · 8 min · zolty
Power meter and heat-flow diagram for a homelab rack Power meter and heat-flow diagram for a homelab rack

Watts, BTUs, and the real cost of running a homelab 24/7

TL;DR A homelab feels free until you read the meter. After a year of running seven k3s nodes plus a pair of Mac Studios under whatever workload I felt like throwing at them, I sat down with a Kill-a-Watt and worked out what the cluster actually costs to keep on. Idle is genuinely cheap. Sustained LLM inference is not. The honest break-even against cloud inference is workload-shaped, and for my workloads, on-prem wins — but only because I run them often enough to amortize the wattage. The numbers below are mine; substitute your electricity rate to get yours. ...

May 14, 2026 · 7 min · zolty
Four-rung ladder showing supervised, monitored, trusted, full autonomy stages Four-rung ladder showing supervised, monitored, trusted, full autonomy stages

The agent autonomy trust ladder: supervised → monitored → trusted → full

TL;DR I run a growing fleet of autonomous agents — homelab ops, trading research, content generation. Most blow up the first few times they try anything new. I needed a way to decide what an agent is allowed to do without asking me, and what still requires a human checkpoint. The answer is a four-rung trust ladder — supervised, monitored, trusted, full autonomy. Agents earn rungs through track record, not promises. Demotions are possible and routine. The framework took the question “should this agent be allowed to do X” out of my head every single time and turned it into a policy I can apply consistently. ...

May 11, 2026 · 6 min · zolty
Harbor proxy cache fronting upstream registries Harbor proxy cache fronting upstream registries

Harbor as a proxy cache for every upstream registry — killing rate limits in a homelab

TL;DR Every node in my k3s cluster used to pull images directly from docker.io, ghcr.io, lscr.io, and quay.io. That meant Docker Hub rate limits, occasional 5xx storms from ghcr, and a hard outage when quay.io went sideways for a few hours. I put Harbor in front of all of them as a proxy cache, pointed containerd at Harbor, and the registry-related noise in my cluster effectively went to zero. Image pulls also got faster — 10GbE LAN beats every public CDN I’ve measured against. ...

May 1, 2026 · 4 min · zolty
GitLab CE on k3s with S3 backup arrows GitLab CE on k3s with S3 backup arrows

Migrating from GitHub to self-hosted GitLab CE — and rebuilding it from S3

TL;DR I moved every private homelab repo off GitHub onto a self-hosted GitLab CE 18.10 instance running on my k3s cluster. GitHub stays as a read-only mirror plus the break-glass k3s_bootstrap repo. Two weeks later I accidentally blkdiscard’d the GitLab volume and rebuilt the entire instance from an S3 backup. It worked, but the boring parts — runner re-registration, group tokens, container-registry pull secrets — were the real cost. Why bother GitHub was fine. GitHub Actions was fine. The thing that pushed me over was billing math plus blast radius: ...

April 29, 2026 · 5 min · zolty
Agentic Claude processes reporting back from long-running OpenClaw workers Agentic Claude processes reporting back from long-running OpenClaw workers

Giving Claude the ability to talk back: agentic long-running processes in OpenClaw

Heads up: this post mentions Claude. If you want to try it, I've got a referral link — it gives us both a bit of extra credit, no pressure: claude.ai via my referral. TL;DR Most AI tooling still treats an LLM like a search bar — you prompt, it answers, the loop ends. Useful, but not what I wanted. For my homelab’s ops + trading intelligence platform (OpenClaw), I needed agents that could run for hours, do real work against a real cluster, and then tap me on the shoulder when they found something I should see. Claude turned out to be the model I kept coming back to for the “thinking” layer — it’s both comfortable with long tool-use chains and happy to write structured output a human won’t need to decode. This is a tour of how I’ve actually wired that up: k3s CronJobs doing the heavy lifting, LiteLLM as the routing layer, Slack as the interrupt bus, and named cat-bot personas so I can tell at a glance who’s knocking. ...

April 21, 2026 · 11 min · zolty
Domain interviewer bot architecture Domain interviewer bot architecture

AI Agents Work Better When They Actually Know How You Operate

TL;DR AI agents fail when they don’t know what you know. I built a Slack bot that conducts structured 5-layer interviews to extract tacit knowledge — operating rhythms, decision criteria, dependencies, friction points, leverage opportunities — and generates soul.md, user.md, and heartbeat.md config files for provisioning agents. The interview surfaces ~30% more actionable context than documentation alone. Full source code below. The Problem Nobody’s Talking About Nate B. Jones has a video that nails the core issue with AI agents: they fail because they lack tacit knowledge. Not the stuff in your docs — the stuff in your head. The 20-year veteran who just knows that the staging deploy takes longer on Thursdays because the batch job runs. The designer who can feel when a color palette is wrong without being able to articulate why. ...

April 16, 2026 · 11 min · zolty
Auto-documenting homelab architecture diagrams Auto-documenting homelab architecture diagrams

Auto-documenting a homelab: the quest for free architecture diagrams

TL;DR I spent a full day trying to automatically generate professional architecture diagrams for a 7-node k3s homelab. Figma’s MCP integration was perfect but requires a paid subscription. I tried Excalidraw (JSON generation + Kroki rendering), Mermaid, and finally landed on raw SVG generation in Python. The result is 27 diagrams with tech icons, drop shadows, and curved arrows — but the process is more manual than I’d like. I’m curious if anyone else has found a truly automated, free solution. ...

April 14, 2026 · 7 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.