K3s | zolty.systems

A stack of Dell OptiPlex small-form-factor desktops wired as a k3s cluster

Build a 3-node K3s cluster from $150 surplus Dell OptiPlex desktops

TL;DR My production homelab runs on Lenovo M920q tinies, and I still think those are the sweet spot. But if I were starting over today with a tight budget, I’d buy a stack of government-surplus Dell OptiPlex 7060 and 7070 desktops instead. They go for around $150 each refurbished — 6-core 8th/9th-gen Intel, an SSD, and Windows 11 already on them — and they make excellent Kubernetes nodes with exactly two cheap upgrades: a bit more RAM and a second network card. ...

A crowded Ultima Online street where every NPC has something to say

The peasant has friends now: rumors, routines, and a 3,200-strong crowd

TL;DR Last time I wrote about giving my Ultima Online shard’s NPCs a voice, a memory, and a small autonomous life. That post ended with “the peasant talks back now.” In the eight days since, the project grew six new systems: NPCs keep daily routines anchored to real places, every town runs a rumor board that traveling NPCs physically carry between cities, townsfolk gossip about players (your katana, your karma, your reputation), the GM avatar got actual powers governed by a genie rule, villagers hand out delivery quests, and a population director keeps every city stocked with 200 ambient “denizens” who hail you in the street. That’s ~3,200 new NPCs and maybe a dozen new LLM call sites, still running entirely on a local gemma-class model — the trick is that the model never gained a single new permission. Every new capability is deterministic code; the LLM still only ever produces words and picks verbs off allowlists. Also: I found out my RAG pipeline had been silently dead for days, and the lesson there is worth the price of admission. ...

Traefik forward-auth middleware fronting homelab services with Authentik SSO

Every homelab service behind one login: Traefik forward-auth with Authentik

TL;DR Every service I run — ComfyUI, Grafana, Vault, even the ancient app on a Mac across the network — lives behind a Traefik forward-auth middleware that hands off to Authentik. No per-service login page. One Authentik login shared across everything. The magic is a two-route IngressRoute pattern: a protected route with the middleware + an unprotected callback route for the OAuth flow itself. Adding a new service to the cluster takes five lines of YAML. Wiring a non-Kubernetes backend — like the Mac that runs ComfyUI and Ollama — takes a service-with-manual-endpoints proxy. ...
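The two-route pattern can be sketched as a small manifest builder. This is a minimal illustration, not the post's actual YAML: the middleware name, outpost Service name and port, entry point, hostname, and `traefik.io/v1alpha1` API version are all assumptions (older Traefik releases use `traefik.containo.us/v1alpha1`). The `/outpost.goauthentik.io/` prefix is the path Authentik's outpost conventionally serves its callback on.

```python
import json


def sso_ingress_route(name, host, service, port,
                      middleware="authentik-forward-auth",  # hypothetical Middleware name
                      outpost_service="authentik-outpost",  # hypothetical outpost Service
                      outpost_port=9000,
                      namespace="default"):
    """Build a two-route Traefik IngressRoute manifest as a plain dict."""
    # Route 1: the app itself, gated by the forward-auth middleware.
    protected = {
        "kind": "Rule",
        "match": f"Host(`{host}`)",
        "middlewares": [{"name": middleware, "namespace": namespace}],
        "services": [{"name": service, "port": port}],
    }
    # Route 2: the OAuth callback path, deliberately left without the
    # middleware so the login flow is not itself blocked by auth.
    callback = {
        "kind": "Rule",
        "match": f"Host(`{host}`) && PathPrefix(`/outpost.goauthentik.io/`)",
        "services": [{"name": outpost_service, "port": outpost_port}],
    }
    return {
        "apiVersion": "traefik.io/v1alpha1",  # assumption: recent Traefik CRDs
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"entryPoints": ["websecure"], "routes": [protected, callback]},
    }


# Example values are placeholders, not the author's real hosts or services.
manifest = sso_ingress_route("grafana", "grafana.example.com", "grafana", 3000)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since Kubernetes accepts JSON manifests as well as YAML. Only the per-service arguments change from one service to the next, which is why adding a service stays small.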

Mac Studio M3 Ultra as a GPU appliance proxied into a k3s cluster

The Mac Studio as a GPU appliance: serving Ollama and ComfyUI to a k3s cluster

TL;DR A Mac Studio M3 Ultra costs the same as a single 4090 but comes with 256 GB of unified memory and a 60-core GPU, all running at 100–200 W under inference. I stopped trying to pass MPS into containers and instead run Ollama and ComfyUI natively on macOS, then proxy them back into k3s as simple Kubernetes Services with manual Endpoints. Two Mac Studios connected via Thunderbolt 5 split the load: one handles hot-path LLM inference and embeddings, the other runs the heavy forge for diffusion and long-horizon reasoning. Both are cheaper to run than a single-socket A100 and require no special driver stacks. ...

An Ultima Online town NPC with a speech bubble driven by a local language model

When the peasant talks back: LLM NPCs in Ultima Online

TL;DR I run an Ultima Online shard on my homelab where the NPCs are driven by a local LLM instead of canned dialog trees. Each NPC rolls a persisted identity, remembers conversations with individual players across reboots, runs its own errands and cross-map journeys, and — the part I’m writing about today — strikes up ambient chatter with nearby NPCs on its own. The newest work extends all of that from townsfolk to language-speaking monsters: ogres, lizardmen, ratmen, gargoyles, daemons, and especially liches, who address each other like god-kings deigning to notice an insect. Inference is a local gemma-class model behind an in-cluster gateway, so it’s free and private, with the one tradeoff being cold-load latency. It’s single-shard hobby-scale and it absolutely shows the seams. I love it. ...

Power meter and heat-flow diagram for a homelab rack

Watts, BTUs, and the real cost of running a homelab 24/7

TL;DR A homelab feels free until you read the meter. After a year of running seven k3s nodes plus a pair of Mac Studios under whatever workload I felt like throwing at them, I sat down with a Kill-a-Watt and worked out what the cluster actually costs to keep on. Idle is genuinely cheap. Sustained LLM inference is not. The honest break-even against cloud inference is workload-shaped, and for my workloads, on-prem wins — but only because I run them often enough to amortize the wattage. The numbers below are mine; substitute your electricity rate to get yours. ...
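Substituting your own rate comes down to watts, hours, and price per kWh. The sketch below shows that arithmetic plus the watts-to-BTU/h conversion. The function names, the 100 W draw, and the $0.15/kWh rate are illustrative placeholders, not the author's measurements.

```python
def annual_cost_usd(avg_watts: float, usd_per_kwh: float,
                    hours_per_day: float = 24.0) -> float:
    """Yearly electricity cost for a device at a given average draw."""
    kwh_per_year = avg_watts / 1000.0 * hours_per_day * 365
    return kwh_per_year * usd_per_kwh


def watts_to_btu_per_hour(watts: float) -> float:
    """Heat output: 1 W of electrical draw is about 3.412 BTU/h."""
    return watts * 3.412


# Placeholder inputs: a 100 W average draw at $0.15 per kWh.
cost = annual_cost_usd(100, 0.15)
heat = watts_to_btu_per_hour(100)
print(f"~${cost:.2f}/year, ~{heat:.0f} BTU/h of heat")
```

Because the cost is linear in average wattage, the idle and sustained-inference figures can be priced separately and weighted by how many hours a day each applies.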

A closed business laptop running headless as a homelab server node

The cheapest homelab node has a built-in UPS: a used business laptop

TL;DR Everyone reaches for a mini PC or a Pi for a homelab node. The thing nobody tells you: a used business laptop is a server with a built-in UPS, screen, and keyboard bolted on for free. A Dell Latitude 7400 — 8th-gen Core i5, 16 GB RAM, NVMe SSD — runs about $150 used, draws ~10 W with the lid shut, and when the power flickers it doesn’t even notice, because it’s running off its own battery. I run a couple as edge nodes. Here’s the case for it and the five-minute headless setup. ...

ComfyUI on Mac Studio: MPS-Accelerated Image Generation Behind k3s Ingress

TL;DR I deployed ComfyUI natively on my Mac Studio M3 Ultra using Apple’s MPS GPU backend, proxied it through k3s Traefik ingress with Authentik SSO, wired it into Open WebUI as the image generation backend (replacing $0.04/image Bedrock calls), and built an MCP server so AI agents can generate images programmatically. The whole pipeline is Ansible-managed and generates images for free on local hardware.

Why native instead of containerized

ComfyUI needs GPU access. On Linux, that’s straightforward: pass through the GPU via device plugins. On macOS, there’s no container runtime that exposes MPS (Metal Performance Shaders) to containers. Docker Desktop on Mac runs a Linux VM, so there is no Metal and no MPS. ...

Hardening a Self-Hosted AI Agent: Multi-Stage Builds, NetworkPolicies, and Automated CVE Triage

TL;DR OpenClaw, my self-hosted AI trading agent, was running in a fat container with 46 Critical CVEs, no network restrictions, and no automated vulnerability scanning. I fixed all three: a multi-stage Dockerfile dropped the CVE count to single digits, default-deny NetworkPolicies locked down traffic, and a daily CronJob triages Trivy scan results via a local LLM and posts a digest to Slack. Total cost of the automated triage: $0/day.

The problem with AI agent containers

AI agent containers are uniquely bad from a security perspective. They need: ...

Dream Workers: Letting an AI Agent Improve Your Cluster While You Sleep

TL;DR I built an “Ops Dream Worker”: a Kubernetes CronJob that runs at 3 AM, inspects the cluster, identifies improvements, and files GitHub issues with specific fixes. It runs entirely on local models (Mac Studio M3 Ultra), costs $0 per run, and went through 240 A/B test iterations to optimize the prompts. The anti-hallucination patterns were harder to get right than the analysis itself.

The idea

I have a k3s cluster with ~40 deployed services. I maintain it solo. There’s always something that could be better: a deployment missing resource limits, a CronJob that’s been failing silently, an ingress without SSO protection, a container image with known CVEs. These improvements pile up because I’m usually focused on building features, not auditing infrastructure. ...