Re-tuning an AI coding agent for a new model release Re-tuning an AI coding agent for a new model release

Re-tuning my Claude Code setup for a new Opus model

TL;DR A new Opus model shipped, so I sat down to re-tune the agent harness I drive it with — the CLAUDE.md files, skills, hooks, and settings that shape every session. The surprising part: the most valuable changes weren’t trimming prompts for the smarter model. They were wiring the agent into infrastructure I already run — offloading bulk work to a local LLM (≈$0), a live homelab statusline, session tracing for an “action audit,” and a goal-drift monitor that uses the local model as judge. I also learned not to trust the new model’s own suggestions about what to cut. It wanted to delete load-bearing guardrails. ...

May 28, 2026 · 9 min · zolty
A seam between homelab and cloud services, with arrows for the few things still in cloud A seam between homelab and cloud services, with arrows for the few things still in cloud

The seam — what I deliberately left in the cloud and why

TL;DR This is the counterpart to the manifesto and the DR drill. After moving a chunk of the stack home, a list of things deliberately stayed rented: Route53, ACM, S3, AWS KMS, the Anthropic API for Claude, Bedrock for Amazon-only models, a transactional email sender, and one repo on GitHub. Each of them earns its place by being either the long pole on availability or the dependency that has to outlive the cluster. Self-hosting maximalism is a trap; the seam is the feature. ...

May 26, 2026 · 8 min · zolty
Scheduled disaster recovery rebuild timeline on a homelab cluster Scheduled disaster recovery rebuild timeline on a homelab cluster

The Saturday DR drill — burning the cluster down on purpose

TL;DR Three weeks after accidentally wiping GitLab with a misdirected blkdiscard and rebuilding from S3, I scheduled a deliberate drill: wipe GitLab, Vault, Harbor’s proxy cache, Authentik’s database, and one Longhorn volume on a Saturday morning, then rebuild everything from Terraform + S3 with a stopwatch running. Total drill time: 4 hours 22 minutes, end to end. About 90 minutes of that was actual rebuild work; the rest was discovering pieces of state I’d accidentally left out of the IaC. ...

May 23, 2026 · 9 min · zolty
Migration arrows from managed cloud services to a self-hosted cluster Migration arrows from managed cloud services to a self-hosted cluster

From managed to owned — the case for self-hosting in 2026

TL;DR A year ago my stack was the usual mix — GitHub for code, ECR for images, GitHub Actions for CI, Docker Hub for upstreams, Route53 + S3 + CloudFront for the blog. Most of that’s still where it should be. About a third of it isn’t. This post is the retrospective on what came home, what stayed rented, and the rule of thumb I now use when deciding which side of the line a new service goes on. The short version: self-host the things you operate; rent the things you’d never have time to operate. ...

May 20, 2026 · 7 min · zolty
Vault HA cluster fronted by Authentik with KMS auto-unseal Vault HA cluster fronted by Authentik with KMS auto-unseal

HashiCorp Vault behind Authentik — secrets that survive an auditor

TL;DR I had Authentik handling human auth and kubeseal handling cluster secrets, which left a gap: anything that needed a real secret at runtime — API tokens, database passwords, Bedrock keys — was one kubectl get secret away from being readable in plaintext. I deployed HashiCorp Vault as a 3-node HA cluster on k3s, auto-unsealed via AWS KMS, with Authentik OIDC for human SSO and the Kubernetes auth method for workloads. Apps get their secrets injected by a sidecar; no app code touches a k8s Secret object anymore. The migration took a weekend and removed an entire category of “what if this got read” worry I’d been ignoring. ...

May 17, 2026 · 8 min · zolty
Power meter and heat-flow diagram for a homelab rack Power meter and heat-flow diagram for a homelab rack

Watts, BTUs, and the real cost of running a homelab 24/7

TL;DR A homelab feels free until you read the meter. After a year of running seven k3s nodes plus a pair of Mac Studios under whatever workload I felt like throwing at them, I sat down with a Kill-a-Watt and worked out what the cluster actually costs to keep on. Idle is genuinely cheap. Sustained LLM inference is not. The honest break-even against cloud inference is workload-shaped, and for my workloads, on-prem wins — but only because I run them often enough to amortize the wattage. The numbers below are mine; substitute your electricity rate to get yours. ...

May 14, 2026 · 7 min · zolty
Four-rung ladder showing supervised, monitored, trusted, full autonomy stages Four-rung ladder showing supervised, monitored, trusted, full autonomy stages

The agent autonomy trust ladder: supervised → monitored → trusted → full

TL;DR I run a growing fleet of autonomous agents — homelab ops, trading research, content generation. Most blow up the first few times they try anything new. I needed a way to decide what an agent is allowed to do without asking me, and what still requires a human checkpoint. The answer is a four-rung trust ladder — supervised, monitored, trusted, full autonomy. Agents earn rungs through track record, not promises. Demotions are possible and routine. The framework took the question “should this agent be allowed to do X” out of my head every single time and turned it into a policy I can apply consistently. ...

May 11, 2026 · 6 min · zolty
Multiple Claude sessions posting to a shared Mattermost channel Multiple Claude sessions posting to a shared Mattermost channel

Coordinating 3-5 parallel Claude sessions through a shared Mattermost channel

TL;DR I run 3-5 Claude Code sessions in parallel at staggered cadences. They coordinate through a shared #mat-claude-sessions Mattermost channel plus a small coordination board file. Each session announces what it’s about to touch, claims it, and announces when it’s done. Conflicts are rare; throughput is dramatically higher than running one session at a time and waiting. Why parallel A single Claude Code session running a long task — refactor across a few repos, work through a debugging session, draft a blog post — is mostly me waiting. The model is fast but tasks are bounded by my decisions, my reviews, and my edits. If I’m waiting on Session A to finish a build, Session B can be drafting something unrelated. Session C can be running a slow eval. The bottleneck stops being the model and becomes my own attention rotation. ...

May 9, 2026 · 4 min · zolty
Two Mac Studios bridged by Thunderbolt 5 running a 1T parameter MoE Two Mac Studios bridged by Thunderbolt 5 running a 1T parameter MoE

Running a 1T-parameter MoE locally on two Mac Studios over Thunderbolt 5

TL;DR Two M3 Ultra Mac Studios — 256GB unified memory each — connected by a Thunderbolt 5 cable can run mixture-of-experts models in the trillion-parameter range that no single 256GB box can fit. The hot path stays on Box 1; Box 2 hosts heavier experts and gets called via a local nginx proxy on port 11436. Real-world power draw is nowhere near the spec sheet. Some models still don’t fit even with two boxes (Kimi K2.6 native INT4), and that’s a genuinely useful constraint to know. ...

May 6, 2026 · 6 min · zolty
LLM evaluator with masked headlines and dates LLM evaluator with masked headlines and dates

Blind Oracle: stripping dates, headlines, and tickers before trusting an LLM trading evaluator

TL;DR I run an LLM-driven trading hypothesis engine. For a while, every result that came back looked too good — Sharpe ratios above 5, win rates above 70%, all on out-of-sample windows. They were lies. The model was reading dates, headlines, and tickers in the prompt and pattern-matching against its training data, which extends well past my “out-of-sample” cutoff. The fix was a masking layer I now call Blind Oracle: strip every leak before evaluation, run the trigger before the eval, gate promotion on out-of-sample Sharpe with the masking enforced. After it shipped, the inflated numbers collapsed back to honest reality. Some hypotheses survived; most didn’t. That’s exactly what I needed to know. ...

May 4, 2026 · 5 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.