Power meter and heat-flow diagram for a homelab rack Power meter and heat-flow diagram for a homelab rack

Watts, BTUs, and the real cost of running a homelab 24/7

TL;DR A homelab feels free until you read the meter. After a year of running seven k3s nodes plus a pair of Mac Studios under whatever workload I felt like throwing at them, I sat down with a Kill-a-Watt and worked out what the cluster actually costs to keep on. Idle is genuinely cheap. Sustained LLM inference is not. The honest break-even against cloud inference is workload-shaped, and for my workloads, on-prem wins — but only because I run them often enough to amortize the wattage. The numbers below are mine; substitute your electricity rate to get yours. ...

May 14, 2026 · 7 min · zolty
ComfyUI on Mac Studio with k3s ingress ComfyUI on Mac Studio with k3s ingress

ComfyUI on Mac Studio: MPS-Accelerated Image Generation Behind k3s Ingress

TL;DR I deployed ComfyUI natively on my Mac Studio M3 Ultra using Apple’s MPS GPU backend, proxied it through k3s Traefik ingress with Authentik SSO, wired it into Open WebUI as the image generation backend (replacing $0.04/image Bedrock calls), and built an MCP server so AI agents can generate images programmatically. The whole pipeline is Ansible-managed and generates images for free on local hardware. Why native instead of containerized ComfyUI needs GPU access. On Linux, that’s straightforward — pass through the GPU via device plugins. On macOS, there’s no container runtime that exposes MPS (Metal Performance Shaders) to containers. Docker Desktop on Mac runs a Linux VM — no Metal, no MPS. ...

April 11, 2026 · 6 min · zolty
Hardening OpenClaw container security Hardening OpenClaw container security

Hardening a Self-Hosted AI Agent: Multi-Stage Builds, NetworkPolicies, and Automated CVE Triage

TL;DR OpenClaw, my self-hosted AI trading agent, was running in a fat container with 46 Critical CVEs, no network restrictions, and no automated vulnerability scanning. I fixed all three: multi-stage Dockerfile dropped the CVE count to single digits, default-deny NetworkPolicies locked down traffic, and a daily CronJob triages Trivy scan results via local LLM and posts a digest to Slack. Total cost of the automated triage: $0/day. The problem with AI agent containers AI agent containers are uniquely bad from a security perspective. They need: ...

April 9, 2026 · 7 min · zolty
Dream Workers autonomous cluster agent Dream Workers autonomous cluster agent

Dream Workers: Letting an AI Agent Improve Your Cluster While You Sleep

TL;DR I built an “Ops Dream Worker” — a Kubernetes CronJob that runs at 3 AM, inspects the cluster, identifies improvements, and files GitHub issues with specific fixes. It runs entirely on local models (Mac Studio M3 Ultra), costs $0 per run, and went through 240 A/B test iterations to optimize the prompts. The anti-hallucination patterns were harder to get right than the analysis itself. The idea I have a k3s cluster with ~40 deployed services. I maintain it solo. There’s always something that could be better — a deployment missing resource limits, a CronJob that’s been failing silently, an ingress without SSO protection, a container image with known CVEs. These improvements pile up because I’m usually focused on building features, not auditing infrastructure. ...

April 8, 2026 · 8 min · zolty
Voice AI services on k3s Voice AI services on k3s

Voice AI on k3s: Whisper, Piper, and openWakeWord in Kubernetes

TL;DR I deployed Whisper (speech-to-text), Piper (text-to-speech), and openWakeWord (wake word detection) as Kubernetes workloads on my k3s cluster. Home Assistant connects to them over the Wyoming protocol for fully local voice pipelines. Total resource cost: ~1 CPU core and 1.75GB RAM. Total cloud cost: $0. Why run voice services in Kubernetes Home Assistant’s voice pipeline needs three things: something to listen for a wake word, something to transcribe speech, and something to speak back. The usual approach is running these on the same box as HA, or on a dedicated Pi. Both work fine until you want the services to survive node failures, be independently upgradeable, or share resources with other workloads. ...

April 7, 2026 · 6 min · zolty
Week of March 23 retrospective Week of March 23 retrospective

Week of March 23: Security Patches, AI Tooling, and Defending the Homelab on Reddit

TL;DR Busy week. Three CVE patches shipped on the same day. OpenClaw stabilized with OpenRouter support and a cost exporter. The Wiki.js fork with Mermaid 11 went live after clearing a Trivy scan. PiKey — a Raspberry Pi that pretends to be a Bluetooth keyboard — shipped as a side project. A self-hosted GitHub Actions cache server cut CI restore times from minutes to seconds. And a Reddit comment defending “I use Claude to manage my infrastructure” turned into five new blog posts and a documentation sprint. ...

March 29, 2026 · 6 min · zolty
K3s stability improvements K3s stability improvements

Two Months of K3s Stability Improvements

TL;DR Over the past two months, I have made a series of stability improvements to my k3s homelab cluster. The biggest wins: migrating from AWS ECR to self-hosted Harbor (eliminating 12-hour token expiry), fixing recurring Grafana crashes caused by SQLite corruption on Longhorn, recovering pve4 after a failed LXC experiment, hardening NetworkPolicies to close gaps in pod-to-host traffic rules, and patching multiple CVEs across the media stack. The cluster now runs 7/7 nodes on k3s v1.34.4, all services monitored, all images pulled from Harbor with static credentials that never expire. ...

March 27, 2026 · 8 min · zolty
Homelab infrastructure overview Homelab infrastructure overview

Homelab State of the Union: 42 Namespaces and Counting

TL;DR The homelab runs 42 Kubernetes namespaces across 7 nodes (3 control plane, 4 workers) on 4 Lenovo ThinkCentre M920q mini PCs running Proxmox VE. This post is the result of a full infrastructure audit — reconciling what’s actually running against what’s documented, catching version drift, and noting what’s been added, removed, or broken since the last check. Compute Four Lenovo ThinkCentre M920q nodes form the physical layer: Host CPU RAM NVMe Role pve1 i5-8500T 32GB 512GB 1 server VM + 1 agent VM pve2 i5-8500T 32GB 512GB 1 server VM + 1 agent VM pve3 i5-8500T 32GB 512GB 1 server VM + 1 agent VM pve4 i7-8700T 32GB 512GB 1 agent VM (GPU passthrough) The k3s cluster runs v1.34.4+k3s1 with embedded etcd for HA. All 7 nodes report Ready. Server VMs get 2 cores and 6GB each — just enough for etcd and the API server. Agent VMs are beefier: 6 cores and 22GB on pve1-3, 12 cores and 28GB on pve4. ...

March 26, 2026 · 6 min · zolty
CI/CD pipeline for blog deployment on k3s CI/CD pipeline for blog deployment on k3s

This Blog Deploys Itself: Self-Hosted CI/CD on k3s with GitHub ARC

TL;DR The blog is deployed by GitHub Actions runners running inside the same k3s cluster it’s talking about. A push to main with content under hugo/ triggers a build, a two-pass S3 sync, and a CloudFront invalidation. A daily 06:00 UTC cron handles future-dated posts so I can commit a backlog and let them drip out on schedule. After every successful deploy, a Playwright job kicks off and scans the live site for broken links, visual regressions, and security header compliance. The whole thing runs on eight self-hosted amd64 runners managed by GitHub’s Actions Runner Controller (ARC) in the cluster. Not a single managed CI minute gets billed. ...

March 26, 2026 · 7 min · zolty
Linkerd service mesh evaluation for homelab Kubernetes Linkerd service mesh evaluation for homelab Kubernetes

Linkerd Service Mesh: Why I'm Not Deploying It Yet (But Have a Plan Ready)

TL;DR I spent time evaluating Linkerd — the CNCF graduated service mesh — for my homelab k3s cluster. The conclusion: it’s an impressive piece of engineering with genuinely useful features like automatic mTLS, post-quantum cryptography, and per-service observability. But for a cluster with ~20 workloads and a single operator, the operational overhead outweighs the benefits today. I’ve written a complete deployment plan so I can adopt it quickly when the cluster grows to the point where it makes sense. ...

March 24, 2026 · 8 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.