Homelab

Building a VPN Mesh for a Tech Collective

TL;DR I am designing a WireGuard VPN mesh to connect a small tech collective – a group of friends who each run their own infrastructure. The topology is hub-and-spoke with my k3s cluster as the hub, connecting 4+ remote sites over encrypted tunnels. Shared services include Jellyfin media federation, distributed CI/CD runners, LAN gaming, and centralized monitoring. The logging pipeline is privacy-first: all log filtering and anonymization happens at the edge (spoke side) before anything ships to the hub. This post covers the network design, the three-layer firewall architecture, the privacy model, and the phased rollout plan. ...
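A minimal sketch of what filtering and anonymizing at the edge could look like, assuming plain-text log lines that start with a level. The masking rule (zero the last IPv4 octet), the `DROP_LEVELS` set, and both function names are hypothetical illustrations, not the pipeline the post describes.

```python
import re

# Hypothetical policy: these names and rules are illustrative assumptions.
DROP_LEVELS = {"DEBUG", "TRACE"}
IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def anonymize_line(line: str) -> str:
    """Mask the last octet of every IPv4 address in a log line."""
    return IPV4.sub(r"\1.\2.\3.0", line)


def filter_for_hub(lines):
    """Yield only the lines a spoke would ship to the hub, already anonymized."""
    for line in lines:
        level = line.split(" ", 1)[0].upper()
        if level in DROP_LEVELS:
            continue  # filtered at the edge, never leaves the spoke
        yield anonymize_line(line)
```

The point of the design is that the hub only ever receives the output of `filter_for_hub`, so raw addresses and verbose logs stay on the spoke.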

Homelab State of the Union: 42 Namespaces and Counting

TL;DR The homelab runs 42 Kubernetes namespaces across 7 nodes (3 control plane, 4 workers) on 4 Lenovo ThinkCentre M920q mini PCs running Proxmox VE. This post is the result of a full infrastructure audit: reconciling what’s actually running against what’s documented, catching version drift, and noting what’s been added, removed, or broken since the last check.

Compute

Four Lenovo ThinkCentre M920q nodes form the physical layer:

Host  CPU       RAM   NVMe   Role
pve1  i5-8500T  32GB  512GB  1 server VM + 1 agent VM
pve2  i5-8500T  32GB  512GB  1 server VM + 1 agent VM
pve3  i5-8500T  32GB  512GB  1 server VM + 1 agent VM
pve4  i7-8700T  32GB  512GB  1 agent VM (GPU passthrough)

The k3s cluster runs v1.34.4+k3s1 with embedded etcd for HA. All 7 nodes report Ready. Server VMs get 2 cores and 6GB each, just enough for etcd and the API server. Agent VMs are beefier: 6 cores and 22GB on pve1-3, 12 cores and 28GB on pve4. ...
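As a quick sanity check on the VM sizing, the per-host totals can be tallied from the figures in the summary. The dictionary layout and variable names are illustrative assumptions; the core and RAM numbers are the ones stated in the text.

```python
# VM sizing as stated in the summary: (cores, RAM in GB) per VM.
SERVER_VM = (2, 6)
HOSTS = {
    "pve1": [SERVER_VM, (6, 22)],
    "pve2": [SERVER_VM, (6, 22)],
    "pve3": [SERVER_VM, (6, 22)],
    "pve4": [(12, 28)],  # agent VM only (GPU passthrough host)
}
HOST_RAM_GB = 32


def host_totals(vms):
    """Return (total vCPUs, total RAM in GB) allocated to VMs on one host."""
    return sum(c for c, _ in vms), sum(r for _, r in vms)


totals = {host: host_totals(vms) for host, vms in HOSTS.items()}
node_count = sum(len(vms) for vms in HOSTS.values())
ram_headroom = {host: HOST_RAM_GB - ram for host, (_, ram) in totals.items()}
```

Every host ends up with 28GB allocated to VMs, which leaves 4GB of the 32GB for Proxmox itself, and the VM count matches the 7 k3s nodes.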

This Blog Deploys Itself: Self-Hosted CI/CD on k3s with GitHub ARC

TL;DR The blog is deployed by GitHub Actions runners running inside the same k3s cluster it’s talking about. A push to main with content under hugo/ triggers a build, a two-pass S3 sync, and a CloudFront invalidation. A daily 06:00 UTC cron handles future-dated posts so I can commit a backlog and let them drip out on schedule. After every successful deploy, a Playwright job kicks off and scans the live site for broken links, visual regressions, and security header compliance. The whole thing runs on eight self-hosted amd64 runners managed by GitHub’s Actions Runner Controller (ARC) in the cluster. Not a single managed CI minute gets billed. ...
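The daily cron works because Hugo leaves future-dated content out of a build by default, so a scheduled rebuild picks up each post once its date has passed. A rough model of that selection is below; the `Post` tuple, the slugs, and the dates are hypothetical and are not taken from the blog's workflow.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Post(NamedTuple):  # hypothetical stand-in for a post's front matter
    slug: str
    publish_at: datetime


def posts_to_publish(posts, now):
    """Return slugs of posts whose publish date is not in the future at `now`."""
    return [p.slug for p in posts if p.publish_at <= now]


# Illustrative backlog with made-up dates.
backlog = [
    Post("mesh-vpn", datetime(2026, 1, 10, 0, 0, tzinfo=timezone.utc)),
    Post("linkerd-eval", datetime(2026, 1, 12, 0, 0, tzinfo=timezone.utc)),
]
# A build at the 06:00 UTC cron on 11 January only sees the first post.
build_time = datetime(2026, 1, 11, 6, 0, tzinfo=timezone.utc)
live = posts_to_publish(backlog, build_time)
```

Running the same check a day later includes both posts, which is the drip-out behaviour the cron provides without any extra commits.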

OpenClaw Multi-User: Privacy, Dual AI Backends, and Per-User Cost Tracking

TL;DR Multi-user AI chat with privacy guarantees, dual model providers (Anthropic direct API + AWS Bedrock via LiteLLM), and per-user cost tracking via Prometheus and Grafana. The admin cannot read other users’ conversations. Three family members authenticate via Google OAuth, each getting isolated chat sessions. Anthropic serves as the primary model provider with lower latency, and Bedrock via LiteLLM acts as a fallback. Per-user spend is tracked through LiteLLM’s Prometheus metrics without any surveillance of conversation content. This is a follow-up to the OpenClaw on k3s setup post. ...

Linkerd Service Mesh: Why I'm Not Deploying It Yet (But Have a Plan Ready)

TL;DR I spent time evaluating Linkerd — the CNCF graduated service mesh — for my homelab k3s cluster. The conclusion: it’s an impressive piece of engineering with genuinely useful features like automatic mTLS, post-quantum cryptography, and per-service observability. But for a cluster with ~20 workloads and a single operator, the operational overhead outweighs the benefits today. I’ve written a complete deployment plan so I can adopt it quickly when the cluster grows to the point where it makes sense. ...

OpenClaw on k3s: Replacing Open WebUI with a Lighter AI Gateway

TL;DR I replaced Open WebUI with OpenClaw – a lighter, WebSocket-based AI assistant gateway that installs from npm, supports multiple chat channels (web, Telegram, Discord, WhatsApp), and deploys on k3s as a single Deployment with a custom Docker image. The primary model provider is Anthropic’s direct API (Claude Sonnet 4.5), with LiteLLM/Bedrock as a fallback. The biggest deployment lesson: OpenClaw binds to loopback by default, which makes it invisible to Kubernetes Services and health probes. The fix is --bind lan, which requires a gateway token for authentication. ...

Five Projects in One Day: What AI Pair Programming Actually Looks Like

TL;DR On March 21, I shipped meaningful work across five repositories in a single day: a 13,674-line stock trading platform from scratch, a Harbor container registry replacing AWS ECR across 13 CI workflows, API key authentication and an HA proxy for digital signage, inventory sell signals for a trading card tracker, and an OpenClaw cost optimization that killed an idle token burn. Every commit was co-authored with Claude. This post breaks down the mechanics of how that actually works – the prompting patterns, the failure modes, the things I would not let the AI do, and the real throughput multiplier. ...

Building a TCG Price Tracker with Selenium and Kubernetes

TL;DR Cardboard is a TCG price tracker that monitors sealed product prices across 10 trading card games. It scrapes TCGPlayer and eBay using a three-tier strategy: pure API calls for bulk data, headless Selenium for product pages, and non-headless Selenium with a virtual display for sites that actively detect headless browsers. The scrapers run as Kubernetes Jobs on the same k3s cluster from Cluster Genesis. A Flask dashboard with Chart.js renders historical price data, profit/loss calculations, and portfolio tracking. All scraping is intentionally rate-limited to match normal human browsing patterns – the goal is polite data collection, not stress testing someone else’s infrastructure. ...
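The three-tier strategy and the rate limiting can be sketched as two small helpers. This is an assumption-laden illustration: the `Tier` enum, the `pick_tier` inputs, and the delay parameters are hypothetical and are not Cardboard's actual code.

```python
import random
from enum import Enum


class Tier(Enum):
    API = "api"                  # pure API calls for bulk data
    HEADLESS = "headless"        # headless Selenium for product pages
    DISPLAY = "virtual-display"  # non-headless Selenium on a virtual display


def pick_tier(has_api: bool, detects_headless: bool) -> Tier:
    """Choose the lightest tier that should work for a target (assumed rule)."""
    if has_api:
        return Tier.API
    return Tier.DISPLAY if detects_headless else Tier.HEADLESS


def polite_delay(base_seconds: float, jitter: float, rng: random.Random) -> float:
    """Seconds to wait before the next request: a base pause plus random jitter."""
    return base_seconds + rng.uniform(0, jitter)
```

A scraper loop would sleep for `polite_delay(...)` between requests, so traffic stays closer to human browsing speed than to a burst of automated calls.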

Home Assistant as the Data Hub for Digital Signage

TL;DR The digital signage system was pulling weather from OpenWeatherMap, calendar events from Google Calendar, and device status from MQTT – three separate API keys, three separate failure modes. Home Assistant already had all of this data. I built a Home Assistant (HA) proxy service that exposes weather, forecasts, calendar events, temperature sensors, and arbitrary entity queries through a single Flask API backed by the Home Assistant REST API. Five new endpoints replaced three external dependencies. I also added API key authentication with role-based access control, wrote 37 tests, fixed MQTT addressing after a VLAN migration, and fought through 6 CI/CD fixes to get the pipeline deploying on self-hosted ARC runners. ...
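API key authentication with role-based access control can be reduced to a lookup plus a permission check. The sketch below is framework-free and entirely hypothetical: the key values, role names, and endpoint names are placeholders, not the proxy's real configuration.

```python
import hmac

# Hypothetical key table and role permissions; real keys would not live in source.
API_KEYS = {"display-key-123": "viewer", "admin-key-456": "admin"}
ROLE_ENDPOINTS = {
    "viewer": {"weather", "forecast", "calendar"},
    "admin": {"weather", "forecast", "calendar", "sensors", "entity"},
}


def role_for_key(presented: str):
    """Return the role for a presented API key, or None if it is unknown."""
    for key, role in API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences
        if hmac.compare_digest(key.encode(), presented.encode()):
            return role
    return None


def is_allowed(presented: str, endpoint: str) -> bool:
    """True if the key maps to a role that may call the given endpoint."""
    role = role_for_key(presented)
    return role is not None and endpoint in ROLE_ENDPOINTS[role]
```

In a Flask service a check like `is_allowed` would typically run before each request handler, returning an error response when it fails.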

Why I Switched from GitHub Copilot to Claude Code Max

TL;DR GitHub Copilot is more capable than most people give it credit for. I used it heavily – not just for autocomplete, but for multi-file edits, chat-driven debugging, and workspace-aware refactoring. After a year of intensive Copilot usage and a month with Claude Code Max ($100/month for the Max plan with Opus), I moved my primary workflow to Claude Code for infrastructure and backend work. The reason is not that Copilot cannot do these things – it is that Claude Code is faster and I can hand it a task and let it run without babysitting. Copilot still wins for inline code completion in the editor. Claude Code wins when I want to describe a goal and walk away while it executes. ...