
The agent autonomy trust ladder: supervised → monitored → trusted → full

TL;DR I run a growing fleet of autonomous agents — homelab ops, trading research, content generation. Most blow up the first few times they try anything new. I needed a way to decide what an agent may do without asking me, and what still requires a human checkpoint. The answer is a four-rung trust ladder: supervised, monitored, trusted, full autonomy. Agents earn rungs through track record, not promises, and demotions are possible and routine. The framework turned “should this agent be allowed to do X?”, a question I used to answer from scratch every single time, into a policy I can apply consistently. ...

May 11, 2026 · 6 min · zolty
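The promotion and demotion rules above can be pictured as a small state machine. This is a minimal sketch, and it assumes two things: an agent is promoted after a fixed streak of clean actions, and any incident demotes it one rung. The rung names come from the post. The thresholds in `PROMOTE_AFTER` and the `AgentRecord` class are hypothetical. So is the rule that only the supervised rung needs a pre-action checkpoint. None of this is the author's actual policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    SUPERVISED = 0
    MONITORED = 1
    TRUSTED = 2
    FULL = 3


# Hypothetical: clean actions required at each rung before promotion.
PROMOTE_AFTER = {Rung.SUPERVISED: 10, Rung.MONITORED: 25, Rung.TRUSTED: 50}


@dataclass
class AgentRecord:
    rung: Rung = Rung.SUPERVISED
    clean_streak: int = 0

    def record(self, success: bool) -> None:
        """Update the agent's track record after one action."""
        if not success:
            # Any incident drops the agent one rung and resets its streak.
            self.rung = Rung(max(self.rung - 1, Rung.SUPERVISED))
            self.clean_streak = 0
            return
        self.clean_streak += 1
        needed = PROMOTE_AFTER.get(self.rung)  # None at FULL: nowhere to go
        if needed is not None and self.clean_streak >= needed:
            self.rung = Rung(self.rung + 1)
            self.clean_streak = 0

    def needs_human(self) -> bool:
        """Assumption: only the supervised rung requires a checkpoint first."""
        return self.rung == Rung.SUPERVISED
```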

Coordinating 3-5 parallel Claude sessions through a shared Mattermost channel

TL;DR I run 3-5 Claude Code sessions in parallel at staggered cadences. They coordinate through a shared #mat-claude-sessions Mattermost channel plus a small coordination board file. Each session announces what it’s about to touch, claims it, and announces when it’s done. Conflicts are rare; throughput is dramatically higher than running one session at a time and waiting.

Why parallel

A single Claude Code session running a long task — a refactor across a few repos, a debugging session, a blog post draft — is mostly me waiting. The model is fast, but tasks are bounded by my decisions, my reviews, and my edits. If I’m waiting on Session A to finish a build, Session B can be drafting something unrelated while Session C runs a slow eval. The bottleneck stops being the model and becomes my own attention rotation. ...

May 9, 2026 · 4 min · zolty
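The claim-then-release handshake could look something like this minimal sketch. It assumes the coordination board is a JSON file that maps each claimed path to the session holding it. The file layout and the `claim`/`release` helpers are hypothetical, and the Mattermost announcements are left out.

```python
import json
from pathlib import Path


def _load(board: Path) -> dict:
    """Read the board; a missing file means nothing is claimed yet."""
    return json.loads(board.read_text()) if board.exists() else {}


def claim(board: Path, session: str, target: str) -> bool:
    """Claim `target` for `session`; return False if another session holds it."""
    claims = _load(board)
    holder = claims.get(target)
    if holder is not None and holder != session:
        return False
    claims[target] = session
    board.write_text(json.dumps(claims, indent=2))
    return True


def release(board: Path, session: str, target: str) -> None:
    """Drop the claim, but only if this session actually owns it."""
    claims = _load(board)
    if claims.get(target) == session:
        del claims[target]
        board.write_text(json.dumps(claims, indent=2))
```

This sketch does no file locking, so two sessions claiming at the same instant could race. A sturdier version would take a lock or write through an atomic rename.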

Running a 1T-parameter MoE locally on two Mac Studios over Thunderbolt 5

TL;DR Two M3 Ultra Mac Studios — 256GB unified memory each — connected by a Thunderbolt 5 cable can run mixture-of-experts models in the trillion-parameter range that no single 256GB box can fit. The hot path stays on Box 1; Box 2 hosts heavier experts and gets called via a local nginx proxy on port 11436. Real-world power draw is nowhere near the spec sheet. Some models still don’t fit even with two boxes (Kimi K2.6 native INT4), and that’s a genuinely useful constraint to know. ...

May 6, 2026 · 6 min · zolty
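Here is a back-of-envelope way to see why pooling two boxes matters. Weight memory is roughly parameters × bits per weight / 8. This sketch assumes a hypothetical 85% of unified memory is usable for weights, and it ignores KV cache and runtime overhead. Treat its numbers as a lower bound on what a model actually needs.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits: float, memory_gb: float,
         headroom: float = 0.85) -> bool:
    """Hypothetical rule: weights may use at most `headroom` of total memory."""
    return weights_gb(params_billion, bits) <= memory_gb * headroom
```

At 3 bits per weight, a 1T-parameter model needs about 375GB of weights. That is more than one 256GB box can hold but within two. At 4 bits it needs about 500GB, which already crowds the 512GB pooled total before anything else loads.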

Blind Oracle: stripping dates, headlines, and tickers before trusting an LLM trading evaluator

TL;DR I run an LLM-driven trading hypothesis engine. For a while, every result that came back looked too good — Sharpe ratios above 5, win rates above 70%, all on out-of-sample windows. They were lies. The model was reading dates, headlines, and tickers in the prompt and pattern-matching against its training data, which extends well past my “out-of-sample” cutoff. The fix was a masking layer I now call Blind Oracle: strip every leak before evaluation, run the trigger before the eval, gate promotion on out-of-sample Sharpe with the masking enforced. After it shipped, the inflated numbers collapsed back to honest reality. Some hypotheses survived; most didn’t. That’s exactly what I needed to know. ...

May 4, 2026 · 5 min · zolty
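The masking step can be illustrated with a minimal sketch. It assumes the prompt is plain text and that the caller already knows which tickers and headlines appear in it. The regexes, placeholder tokens, and `mask_prompt` function are hypothetical and cover only two date formats. This is not the Blind Oracle implementation.

```python
import re
from typing import List

# Hypothetical date patterns: ISO (2024-03-15) and "March 20, 2024" style only.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
LONG_DATE = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}, \d{4}\b"
)


def mask_prompt(text: str, tickers: List[str], headlines: List[str]) -> str:
    """Replace known leaks with neutral placeholders before the evaluator sees them."""
    # Headlines first, since they may themselves contain dates or tickers.
    for i, headline in enumerate(headlines):
        text = text.replace(headline, f"[HEADLINE_{i}]")
    text = ISO_DATE.sub("[DATE]", text)
    text = LONG_DATE.sub("[DATE]", text)
    # Stable aliases let the model compare assets without recognizing them.
    for i, ticker in enumerate(sorted(set(tickers))):
        text = re.sub(rf"\$?\b{re.escape(ticker)}\b", f"ASSET_{i}", text)
    return text
```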

Giving Claude the ability to talk back: agentic long-running processes in OpenClaw

TL;DR Most AI tooling still treats an LLM like a search bar — you prompt, it answers, the loop ends. Useful, but not what I wanted. For my homelab’s ops + trading intelligence platform (OpenClaw), I needed agents that could run for hours, do real work against a real cluster, and then tap me on the shoulder when they found something I should see. Claude turned out to be the model I kept coming back to for the “thinking” layer — it’s both comfortable with long tool-use chains and happy to write structured output a human won’t need to decode. This is a tour of how I’ve actually wired that up: k3s CronJobs doing the heavy lifting, LiteLLM as the routing layer, Slack as the interrupt bus, and named cat-bot personas so I can tell at a glance who’s knocking. ...

April 21, 2026 · 11 min · zolty
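The "tap me on the shoulder" step can be reduced to a gate. The model returns structured JSON, and the worker interrupts only when the finding is severe enough. This is a minimal sketch; the `severity`/`summary` schema, the threshold, and the persona name are hypothetical. The returned dict uses the `{"text": ...}` shape that Slack incoming webhooks accept. Sending it is left out.

```python
import json
from typing import Optional

# Hypothetical severity scale the model is asked to use in its JSON output.
SEVERITY_ORDER = {"info": 0, "notice": 1, "warning": 2, "critical": 3}


def build_interrupt(raw_model_output: str, persona: str,
                    min_severity: str = "warning") -> Optional[dict]:
    """Return a Slack webhook payload, or None if the finding can wait."""
    try:
        finding = json.loads(raw_model_output)
    except json.JSONDecodeError:
        finding = None
    if not isinstance(finding, dict):
        # Unparseable output is itself worth a human look.
        return {"text": f"[{persona}] could not parse model output"}
    severity = str(finding.get("severity", "info")).lower()
    if SEVERITY_ORDER.get(severity, 0) < SEVERITY_ORDER[min_severity]:
        return None
    summary = finding.get("summary", "(no summary)")
    return {"text": f"[{persona}] {severity.upper()}: {summary}"}
```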

Three Claude tabs kept clobbering each other. So I built a guard.

TL;DR I run 3-5 parallel Claude Code sessions against the same homelab. One tab mid-refactor, one tab doing docs, one tab chasing a bug. They don’t know about each other, so every so often one tab “tidies up” a file another tab is actively editing — and Claude, being a dutiful little overwriter, just clobbers the work. I built a small Go binary that hooks into SessionStart / SessionEnd / PreToolUse, tracks file claims on disk, and injects a warning straight into the LLM’s context window when it’s about to step on another session’s toes. Optional Slack mirror so I can watch the timeline from my phone. MIT, single binary, no runtime deps. Repo: github.com/zolty-mat/claude-session-guard. ...

April 18, 2026 · 10 min · zolty
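The guard itself is a Go binary. What follows is only a Python sketch of the decision a PreToolUse check has to make, not the repo's code. It assumes a hook payload with `session_id`, `tool_name`, and `tool_input.file_path` fields, modeled on Claude Code's hook input. It also assumes a hypothetical map from claimed absolute paths to session IDs. How the warning reaches the model's context is out of scope here.

```python
import json
import os
from typing import Dict, Optional

# Assumed names of Claude Code's file-writing tools; adjust to what your hooks see.
WRITE_TOOLS = {"Edit", "Write", "MultiEdit"}


def check_pretooluse(payload_json: str, claims: Dict[str, str]) -> Optional[str]:
    """Return a warning if this tool call would touch a file another session claimed."""
    payload = json.loads(payload_json)
    if payload.get("tool_name") not in WRITE_TOOLS:
        return None
    path = (payload.get("tool_input") or {}).get("file_path")
    if not path:
        return None
    owner = claims.get(os.path.abspath(path))
    if owner is None or owner == payload.get("session_id"):
        return None
    return (f"WARNING: {path} is claimed by session {owner}; "
            "coordinate before editing it.")
```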

Self-Hosted AI on a 24GB GPU: OpenClaw + Ollama Setup Guide for Windows

TL;DR You have a 24GB VRAM GPU. You want a private, self-hosted AI assistant that rivals ChatGPT – no subscriptions, no data leaving your machine. This guide walks you through setting up Ollama (local model runtime) and OpenClaw (AI gateway with a web UI) on Windows using Docker Desktop. But the real value here is the model recommendations. I ran 5,475 evaluations across 21 prompt variants and 6 models on real trading data. The results contradicted almost everything the community recommends. Finance-tuned models performed worse than a coin flip. Chain-of-thought reasoning models turned out to be an anti-pattern. The winners were general-purpose MoE (Mixture-of-Experts) models that nobody talks about for specialized tasks. ...

April 14, 2026 · 21 min · zolty
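Once Ollama is running, it serves an HTTP API on port 11434 by default. Below is a minimal sketch of a non-streaming request to its `/api/generate` endpoint using only the standard library. The helper names are hypothetical. The model name in the usage line is a placeholder, not one of the post's recommendations.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the model's full response text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `print(generate("your-model:tag", "Summarize this filing in one line."))`. From inside a Docker Desktop container, `localhost` refers to the container itself. There, the URL would typically point at `host.docker.internal` instead.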

Running GLM-5.1 (744B) Locally on a Mac Studio: Benchmark Results

TL;DR I loaded Z.ai’s GLM-5.1 — a 744B-parameter MoE model with 40B active parameters — onto a Mac Studio M3 Ultra with 256GB unified memory using a 2-bit quantized GGUF via llama.cpp. It runs at 5.8 tok/s with a 120-second time to first token. The financial analysis quality is genuinely impressive, but it eats 222GB of the 256GB available, leaving room for literally nothing else. It’s a “clear the schedule” model, not an always-on one. ...

April 13, 2026 · 8 min · zolty
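The two numbers in the TL;DR are enough to estimate wall-clock time per answer. This sketch assumes decoding holds a constant 5.8 tok/s after a fixed 120-second time to first token. Real runs will only approximate that.

```python
def wall_clock_seconds(output_tokens: int, ttft_s: float = 120.0,
                       tok_per_s: float = 5.8) -> float:
    """Rough total time: time to first token plus steady-state decoding."""
    return ttft_s + output_tokens / tok_per_s
```

For a 1,000-token answer, that works out to roughly 120 + 172 ≈ 292 seconds, close to five minutes per response.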

ComfyUI on Mac Studio: MPS-Accelerated Image Generation Behind k3s Ingress

TL;DR I deployed ComfyUI natively on my Mac Studio M3 Ultra using Apple’s MPS GPU backend, proxied it through k3s Traefik ingress with Authentik SSO, wired it into Open WebUI as the image generation backend (replacing $0.04/image Bedrock calls), and built an MCP server so AI agents can generate images programmatically. The whole pipeline is Ansible-managed and generates images for free on local hardware.

Why native instead of containerized

ComfyUI needs GPU access. On Linux, that’s straightforward — pass through the GPU via device plugins. On macOS, there’s no container runtime that exposes MPS (Metal Performance Shaders) to containers. Docker Desktop on Mac runs a Linux VM — no Metal, no MPS. ...

April 11, 2026 · 6 min · zolty
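An MCP server like the one described presumably comes down to queueing workflows over ComfyUI's HTTP API. That means a POST to `/prompt` with the API-format workflow under a `"prompt"` key. Below is a minimal sketch that assumes ComfyUI's default local address (`127.0.0.1:8188`) and a workflow exported in API format. It ignores the Authentik SSO in front of the ingress. The `build_prompt_request`/`queue_prompt` names are hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_prompt_request(workflow: dict, base_url: str = "http://127.0.0.1:8188",
                         client_id: Optional[str] = None) -> urllib.request.Request:
    """Wrap an API-format workflow in the body ComfyUI's /prompt endpoint expects."""
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return urllib.request.Request(
        base_url.rstrip("/") + "/prompt",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(workflow: dict, base_url: str = "http://127.0.0.1:8188") -> str:
    """Submit the workflow and return the prompt_id ComfyUI assigns to the job."""
    with urllib.request.urlopen(build_prompt_request(workflow, base_url), timeout=30) as resp:
        return json.loads(resp.read())["prompt_id"]
```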

Hardening a Self-Hosted AI Agent: Multi-Stage Builds, NetworkPolicies, and Automated CVE Triage

TL;DR OpenClaw, my self-hosted AI trading agent, was running in a fat container with 46 Critical CVEs, no network restrictions, and no automated vulnerability scanning. I fixed all three: multi-stage Dockerfile dropped the CVE count to single digits, default-deny NetworkPolicies locked down traffic, and a daily CronJob triages Trivy scan results via local LLM and posts a digest to Slack. Total cost of the automated triage: $0/day.

The problem with AI agent containers

AI agent containers are uniquely bad from a security perspective. They need: ...

April 9, 2026 · 7 min · zolty
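Before any LLM triage, it helps to condense a raw Trivy report so the prompt stays small. Below is a minimal sketch that assumes the JSON produced by `trivy image --format json`. That output has a `Results` list whose entries carry `Vulnerabilities` with `VulnerabilityID`, `PkgName`, and `Severity`. The `summarize_trivy` helper and digest layout are hypothetical. The LLM call and the Slack post are left out.

```python
import json
from collections import Counter

SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "UNKNOWN"]


def summarize_trivy(report_json: str, max_listed: int = 10) -> str:
    """Condense a Trivy JSON report into a short digest for an LLM prompt."""
    report = json.loads(report_json)
    counts = Counter()
    critical = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" is missing or null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            counts[severity] += 1
            if severity == "CRITICAL":
                critical.append(f"{vuln.get('VulnerabilityID')} in {vuln.get('PkgName')}")
    summary = ", ".join(f"{s}: {counts[s]}" for s in SEVERITIES if counts[s])
    lines = [summary or "no vulnerabilities found"] + critical[:max_listed]
    return "\n".join(lines)
```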

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.