Harbor as a proxy cache for every upstream registry — killing rate limits in a homelab

TL;DR Every node in my k3s cluster used to pull images directly from docker.io, ghcr.io, lscr.io, and quay.io. That meant Docker Hub rate limits, occasional 5xx storms from ghcr, and a hard outage when quay.io went sideways for a few hours. I put Harbor in front of all of them as a proxy cache, pointed containerd at Harbor, and the registry-related noise in my cluster effectively went to zero. Image pulls also got faster — 10GbE LAN beats every public CDN I’ve measured against. ...

May 1, 2026 · 4 min · zolty
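
A minimal Python sketch of the addressing a Harbor proxy-cache setup implies, assuming one proxy-cache project per upstream registry. The hostname `harbor.lab.example` and the project names are hypothetical; the excerpt above doesn't show the author's actual containerd configuration. It only illustrates the path convention `<harbor-host>/<proxy-project>/<upstream-repo>`, including Docker Hub's implicit `library/` namespace for official images.

```python
from __future__ import annotations

# Hypothetical Harbor host and proxy-cache project names (assumptions for illustration).
HARBOR_HOST = "harbor.lab.example"
PROXY_PROJECTS = {
    "docker.io": "dockerhub-proxy",
    "ghcr.io": "ghcr-proxy",
    "lscr.io": "lscr-proxy",
    "quay.io": "quay-proxy",
}


def split_reference(ref: str) -> tuple[str, str]:
    """Split an image reference into (registry, repository[:tag|@digest])."""
    first, _, rest = ref.partition("/")
    # Docker's rule: the first component is a registry only if it contains
    # a '.' or ':' or is 'localhost'; otherwise the image lives on Docker Hub.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", ref
    # Docker Hub official images sit under the implicit 'library/' namespace.
    if registry == "docker.io" and "/" not in path:
        path = f"library/{path}"
    return registry, path


def via_harbor(ref: str) -> str:
    """Rewrite an upstream image reference to its Harbor proxy-cache equivalent."""
    registry, path = split_reference(ref)
    project = PROXY_PROJECTS.get(registry)
    if project is None:
        return ref  # registry not proxied: pull from upstream unchanged
    return f"{HARBOR_HOST}/{project}/{path}"
```

On k3s, one way to get the same effect without touching manifests is a `registries.yaml` mirror entry pointing at Harbor, with a `rewrite` rule that inserts the project name.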

Migrating from GitHub to self-hosted GitLab CE — and rebuilding it from S3

TL;DR I moved every private homelab repo off GitHub onto a self-hosted GitLab CE 18.10 instance running on my k3s cluster. GitHub stays as a read-only mirror plus the break-glass k3s_bootstrap repo. Two weeks later I accidentally blkdiscard’d the GitLab volume and rebuilt the entire instance from an S3 backup. It worked, but the boring parts (runner re-registration, group tokens, container-registry pull secrets) were the real cost.

Why bother

GitHub was fine. GitHub Actions was fine. The thing that pushed me over was billing math plus blast radius: ...

April 29, 2026 · 5 min · zolty

Giving Claude the ability to talk back: agentic long-running processes in OpenClaw

Heads up: this post mentions Claude. If you want to try it, I’ve got a referral link (it gives us both a bit of extra credit, no pressure): claude.ai via my referral.

TL;DR Most AI tooling still treats an LLM like a search bar: you prompt, it answers, the loop ends. Useful, but not what I wanted. For my homelab’s ops + trading intelligence platform (OpenClaw), I needed agents that could run for hours, do real work against a real cluster, and then tap me on the shoulder when they found something I should see. Claude turned out to be the model I kept coming back to for the “thinking” layer: it’s both comfortable with long tool-use chains and happy to write structured output a human won’t need to decode. This is a tour of how I’ve actually wired that up: k3s CronJobs doing the heavy lifting, LiteLLM as the routing layer, Slack as the interrupt bus, and named cat-bot personas so I can tell at a glance who’s knocking. ...

April 21, 2026 · 11 min · zolty
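
The persona-tagged reports in the OpenClaw post can be sketched as Slack `chat.postMessage` payloads. `channel`, `text`, `username`, and `icon_emoji` are real arguments of that method (overriding the display name and icon requires the `chat:write.customize` scope). The persona roster, the `build_report` helper, and the message layout are hypothetical, and the sketch only builds the payload rather than posting it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str        # display name shown on the Slack message
    icon_emoji: str  # Slack emoji shortcode used as the avatar


# Hypothetical roster: one named cat-bot per long-running job type.
PERSONAS = {
    "ops": Persona("Whiskers (ops)", ":cat:"),
    "trading": Persona("Mittens (trading)", ":smirk_cat:"),
}


def build_report(job: str, channel: str, summary: str, findings: list[str]) -> dict:
    """Build a chat.postMessage payload so each job 'knocks' as its own persona."""
    persona = PERSONAS[job]
    bullets = "\n".join(f"• {item}" for item in findings) or "• nothing notable"
    return {
        "channel": channel,
        "username": persona.name,
        "icon_emoji": persona.icon_emoji,
        "text": f"*{summary}*\n{bullets}",
    }
```

To send it, POST the payload as JSON to `https://slack.com/api/chat.postMessage` with a bot token in an `Authorization: Bearer` header.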

Three Claude tabs kept clobbering each other. So I built a guard.

TL;DR I run 3-5 parallel Claude Code sessions against the same homelab. One tab mid-refactor, one tab doing docs, one tab chasing a bug. They don’t know about each other, so every so often one tab “tidies up” a file another tab is actively editing — and Claude, being a dutiful little overwriter, just clobbers the work. I built a small Go binary that hooks into SessionStart / SessionEnd / PreToolUse, tracks file claims on disk, and injects a warning straight into the LLM’s context window when it’s about to step on another session’s toes. Optional Slack mirror so I can watch the timeline from my phone. MIT, single binary, no runtime deps. Repo: github.com/zolty-mat/claude-session-guard. ...

April 18, 2026 · 10 min · zolty
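
The real claude-session-guard is a Go binary; this is only a Python sketch of the claim-tracking idea, under stated assumptions: one JSON claim file per path in a shared directory, a 15-minute staleness window, and a warning string that a PreToolUse hook could surface to the model. The function names, file layout, and timeout are hypothetical, not the tool's actual design.

```python
from __future__ import annotations

import hashlib
import json
import time
from pathlib import Path

STALE_AFTER_S = 15 * 60  # hypothetical: claims older than 15 minutes are ignored


def _claim_file(claims_dir: Path, target: str) -> Path:
    # One JSON file per claimed path, named by a hash of the absolute path.
    digest = hashlib.sha256(str(Path(target).resolve()).encode()).hexdigest()[:16]
    return claims_dir / f"{digest}.json"


def check_and_claim(claims_dir: Path, session: str, target: str,
                    now: float | None = None) -> str | None:
    """PreToolUse: warn if another live session holds `target`; otherwise claim it."""
    now = time.time() if now is None else now
    claims_dir.mkdir(parents=True, exist_ok=True)
    path = _claim_file(claims_dir, target)
    if path.exists():
        claim = json.loads(path.read_text())
        if claim["session"] != session and now - claim["ts"] < STALE_AFTER_S:
            return (f"WARNING: {target} is claimed by session {claim['session']} "
                    f"({int(now - claim['ts'])}s ago); coordinate before editing it.")
    # Not atomic: a real tool would write a temp file and rename, or use O_EXCL.
    path.write_text(json.dumps({"session": session, "target": target, "ts": now}))
    return None


def release_all(claims_dir: Path, session: str) -> int:
    """SessionEnd: drop every claim held by `session`; returns how many were released."""
    released = 0
    for path in claims_dir.glob("*.json"):
        if json.loads(path.read_text())["session"] == session:
            path.unlink()
            released += 1
    return released
```

Expiring stale claims keeps a crashed tab from blocking the others forever, at the cost of occasionally missing a very long-running edit.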

AI Agents Work Better When They Actually Know How You Operate

TL;DR AI agents fail when they don’t know what you know. I built a Slack bot that conducts structured 5-layer interviews to extract tacit knowledge (operating rhythms, decision criteria, dependencies, friction points, leverage opportunities) and generates soul.md, user.md, and heartbeat.md config files for provisioning agents. The interview surfaces ~30% more actionable context than documentation alone. Full source code below.

The Problem Nobody’s Talking About

Nate B. Jones has a video that nails the core issue with AI agents: they fail because they lack tacit knowledge. Not the stuff in your docs, but the stuff in your head. The 20-year veteran who just knows that the staging deploy takes longer on Thursdays because the batch job runs. The designer who can feel when a color palette is wrong without being able to articulate why. ...

April 16, 2026 · 11 min · zolty
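
The interview loop in the post above can be sketched in Python: walk the five layers in order, record answers, and render them as Markdown. The layer names come from the post; the prompts, the `Interview` class, and the single-file rendering are hypothetical, and this excerpt doesn't say how the real bot splits its output across soul.md, user.md, and heartbeat.md.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Layer keys follow the post; the prompts are hypothetical examples.
LAYERS = [
    ("operating_rhythms", "What does a normal week look like, and what recurs on a schedule?"),
    ("decision_criteria", "When two options look equal, what actually tips the decision?"),
    ("dependencies", "Who or what do you wait on, and who waits on you?"),
    ("friction_points", "Where does work reliably stall or get redone?"),
    ("leverage_opportunities", "What would you hand to an agent first if you could?"),
]


@dataclass
class Interview:
    answers: dict[str, list[str]] = field(default_factory=dict)

    def record(self, layer: str, answer: str) -> None:
        self.answers.setdefault(layer, []).append(answer.strip())

    def next_question(self) -> tuple[str, str] | None:
        """Return the first layer with no answers yet, or None once all five are covered."""
        for key, prompt in LAYERS:
            if key not in self.answers:
                return key, prompt
        return None

    def to_markdown(self, title: str) -> str:
        """Render every layer as a section, flagging layers the interview never reached."""
        lines = [f"# {title}", ""]
        for key, _ in LAYERS:
            lines.append(f"## {key.replace('_', ' ').title()}")
            lines.extend(f"- {a}" for a in self.answers.get(key, ["(not covered)"]))
            lines.append("")
        return "\n".join(lines)
```

Keeping the layer order fixed means a half-finished interview can resume exactly where it stopped.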

Auto-documenting a homelab: the quest for free architecture diagrams

TL;DR I spent a full day trying to automatically generate professional architecture diagrams for a 7-node k3s homelab. Figma’s MCP integration was perfect but requires a paid subscription. I tried Excalidraw (JSON generation + Kroki rendering), Mermaid, and finally landed on raw SVG generation in Python. The result is 27 diagrams with tech icons, drop shadows, and curved arrows — but the process is more manual than I’d like. I’m curious if anyone else has found a truly automated, free solution. ...

April 14, 2026 · 7 min · zolty
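
The raw-SVG route from the diagram post can be sketched in plain Python: boxes that use an SVG `feDropShadow` filter, and curved arrows drawn as cubic Bézier paths ending in an arrowhead `marker`. The helper names, colors, and sizes are hypothetical, and icon embedding is left out.

```python
from xml.sax.saxutils import escape

# Shared definitions: a soft drop shadow and a reusable arrowhead.
DEFS = (
    "<defs>"
    '<filter id="shadow" x="-20%" y="-20%" width="140%" height="140%">'
    '<feDropShadow dx="2" dy="3" stdDeviation="3" flood-opacity="0.25"/>'
    "</filter>"
    '<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" '
    'markerWidth="8" markerHeight="8" orient="auto-start-reverse">'
    '<path d="M 0 0 L 10 5 L 0 10 z" fill="#555577"/>'
    "</marker>"
    "</defs>"
)


def box(x: float, y: float, w: float, h: float, label: str) -> str:
    """Rounded rectangle with a drop shadow and a centered, XML-escaped label."""
    return (
        f'<rect x="{x}" y="{y}" width="{w}" height="{h}" rx="8" '
        f'fill="#ffffff" stroke="#334455" filter="url(#shadow)"/>'
        f'<text x="{x + w / 2}" y="{y + h / 2}" text-anchor="middle" '
        f'dominant-baseline="middle" font-family="sans-serif" font-size="14">'
        f"{escape(label)}</text>"
    )


def curved_arrow(x1: float, y1: float, x2: float, y2: float, bend: float = 40) -> str:
    """Cubic Bézier whose control points are lifted by `bend`, ending in an arrowhead."""
    mx = (x1 + x2) / 2
    return (
        f'<path d="M {x1} {y1} C {mx} {y1 - bend}, {mx} {y2 - bend}, {x2} {y2}" '
        f'fill="none" stroke="#555577" stroke-width="2" marker-end="url(#arrow)"/>'
    )


def diagram(width: int, height: int, parts: list) -> str:
    """Wrap the shared defs and all shape fragments in a standalone SVG document."""
    body = "".join(parts)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">{DEFS}{body}</svg>'
    )
```

For example, `diagram(400, 160, [box(20, 60, 120, 50, "Traefik"), box(260, 60, 120, 50, "Pods"), curved_arrow(140, 85, 260, 85)])` returns a string that can be written straight to a `.svg` file.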

Self-Hosted AI on a 24GB GPU: OpenClaw + Ollama Setup Guide for Windows

TL;DR You have a 24GB VRAM GPU. You want a private, self-hosted AI assistant that rivals ChatGPT – no subscriptions, no data leaving your machine. This guide walks you through setting up Ollama (local model runtime) and OpenClaw (AI gateway with a web UI) on Windows using Docker Desktop. But the real value here is the model recommendations. I ran 5,475 evaluations across 21 prompt variants and 6 models on real trading data. The results contradicted almost everything the community recommends. Finance-tuned models performed worse than a coin flip. Chain-of-thought reasoning models turned out to be an anti-pattern. The winners were general-purpose MoE (Mixture-of-Experts) models that nobody talks about for specialized tasks. ...

April 14, 2026 · 21 min · zolty

Running GLM-5.1 (744B) Locally on a Mac Studio: Benchmark Results

TL;DR I loaded Z.ai’s GLM-5.1 (a 744B-parameter MoE model with 40B active parameters) onto a Mac Studio M3 Ultra with 256GB unified memory using a 2-bit quantized GGUF via llama.cpp. It runs at 5.8 tok/s with a 120-second time to first token. The financial analysis quality is genuinely impressive, but it eats 222GB of the 256GB available, leaving room for literally nothing else. It’s a “clear the schedule” model, not an always-on one. ...

April 13, 2026 · 8 min · zolty
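
The GLM-5.1 figures above reduce to two back-of-envelope formulas. Weight memory is parameters × bits per weight ÷ 8; because the model is MoE, all 744B parameters have to be resident even though only about 40B are active per token. Response time is time to first token plus tokens ÷ decode rate. The inputs are the post's numbers; treating GB as decimal (10^9 bytes) is an assumption.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Weight footprint in decimal GB at a uniform quantization level."""
    return params * bits_per_weight / 8 / 1e9


def implied_bpw(params: float, resident_gb: float) -> float:
    """Upper bound on average bits/weight if *all* resident memory were weights."""
    return resident_gb * 1e9 * 8 / params


def response_seconds(ttft_s: float, tok_per_s: float, n_tokens: int) -> float:
    """Wall-clock time for one answer: time to first token plus decode time."""
    return ttft_s + n_tokens / tok_per_s


PARAMS = 744e9  # total parameters from the post (MoE; ~40B active per token)
print(f"pure 2-bit weights: {weights_gb(PARAMS, 2):.0f} GB")
print(f"implied upper bound: {implied_bpw(PARAMS, 222):.2f} bits/weight")
for n in (500, 2000):
    t = response_seconds(120, 5.8, n)
    print(f"{n:>5} tokens: {t:.0f} s (~{t / 60:.1f} min)")
```

This prints 186 GB for pure 2-bit weights and an upper bound of about 2.39 bits/weight if all 222GB were weights; the excerpt doesn't break the 222GB down into weights, KV cache, and runtime buffers. At 5.8 tok/s, a 500-token answer takes about 206 s (~3.4 min) and a 2,000-token answer about 465 s (~7.7 min).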

ComfyUI on Mac Studio: MPS-Accelerated Image Generation Behind k3s Ingress

TL;DR I deployed ComfyUI natively on my Mac Studio M3 Ultra using Apple’s MPS GPU backend, proxied it through k3s Traefik ingress with Authentik SSO, wired it into Open WebUI as the image generation backend (replacing $0.04/image Bedrock calls), and built an MCP server so AI agents can generate images programmatically. The whole pipeline is Ansible-managed and generates images for free on local hardware.

Why native instead of containerized

ComfyUI needs GPU access. On Linux, that’s straightforward: pass through the GPU via device plugins. On macOS, there’s no container runtime that exposes MPS (Metal Performance Shaders) to containers. Docker Desktop on Mac runs a Linux VM: no Metal, no MPS. ...

April 11, 2026 · 6 min · zolty
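
What an MCP server like the one in the ComfyUI post has to do can be sketched against ComfyUI's HTTP API: POST an API-format workflow to `/prompt`, then fetch outputs from `/view`. This stdlib-only Python sketch builds those requests without sending them; the hostname is hypothetical, and getting past the Authentik SSO layer is out of scope here.

```python
from __future__ import annotations

import json
import urllib.request
import uuid
from urllib.parse import urlencode

COMFY_URL = "https://comfyui.lab.example"  # hypothetical ingress hostname


def queue_prompt_request(workflow: dict, client_id: str | None = None) -> urllib.request.Request:
    """Build (but do not send) a POST /prompt request for an API-format workflow."""
    body = json.dumps({
        "prompt": workflow,  # graph exported in ComfyUI's API format
        "client_id": client_id or str(uuid.uuid4()),
    }).encode("utf-8")
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def image_url(filename: str, subfolder: str = "", folder_type: str = "output") -> str:
    """URL for downloading a generated image through ComfyUI's /view endpoint."""
    query = urlencode({"filename": filename, "subfolder": subfolder, "type": folder_type})
    return f"{COMFY_URL}/view?{query}"
```

Sending the request with `urllib.request.urlopen(req)` returns JSON containing a `prompt_id`; polling `/history/<prompt_id>` yields the output filenames to pass to `image_url`.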

Monitoring a Mac Studio as a First-Class Cluster Citizen: Prometheus, Loki, and Custom Ollama Exporters

TL;DR My Mac Studio M3 Ultra runs Ollama with 70B+ models but isn’t a k3s node. I needed it to show up in Grafana next to the cluster workloads. The solution: node_exporter for system metrics, a Go reverse proxy for per-model inference metrics, a custom Python exporter for model inventory and VRAM tracking, and Grafana Alloy for shipping logs to Loki. All four services managed by Ansible, all metrics scraped by the cluster’s Prometheus. ...

April 10, 2026 · 8 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.