Two Mac Studios bridged by Thunderbolt 5 running a 1T parameter MoE

Running a 1T-parameter MoE locally on two Mac Studios over Thunderbolt 5

TL;DR Two M3 Ultra Mac Studios — 256GB unified memory each — connected by a Thunderbolt 5 cable can run mixture-of-experts models in the trillion-parameter range that no single 256GB box can fit. The hot path stays on Box 1; Box 2 hosts heavier experts and gets called via a local nginx proxy on port 11436. Real-world power draw is nowhere near the spec sheet. Some models still don’t fit even with two boxes (Kimi K2.6 native INT4), and that’s a genuinely useful constraint to know. ...

May 6, 2026 · 6 min · zolty
Domain interviewer bot architecture

AI Agents Work Better When They Actually Know How You Operate

TL;DR AI agents fail when they don’t know what you know. I built a Slack bot that conducts structured 5-layer interviews to extract tacit knowledge (operating rhythms, decision criteria, dependencies, friction points, leverage opportunities) and generates soul.md, user.md, and heartbeat.md config files for provisioning agents. The interview surfaces ~30% more actionable context than documentation alone. Full source code below.

The Problem Nobody’s Talking About

Nate B. Jones has a video that nails the core issue with AI agents: they fail because they lack tacit knowledge. Not the stuff in your docs, but the stuff in your head. The 20-year veteran who just knows that the staging deploy takes longer on Thursdays because the batch job runs. The designer who can feel when a color palette is wrong without being able to articulate why. ...

April 16, 2026 · 11 min · zolty
GLM-5.1 benchmark on Mac Studio

Running GLM-5.1 (744B) Locally on a Mac Studio: Benchmark Results

TL;DR I loaded Z.ai’s GLM-5.1 — a 744B parameter MoE model with 40B active parameters — onto a Mac Studio M3 Ultra with 256GB unified memory using a 2-bit quantized GGUF via llama.cpp. It runs at 5.8 tok/s with a 120-second time to first token. The financial analysis quality is genuinely impressive, but it eats 222GB of the 256GB available, leaving room for literally nothing else. It’s a “clear the schedule” model, not an always-on one. ...
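To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in the summary (5.8 tok/s, 120-second time to first token, 222GB of 256GB). The function names are hypothetical, and treating time to first token as a fixed cost is a simplifying assumption, since in practice it varies with prompt length.

```python
def reply_seconds(output_tokens: int,
                  tokens_per_second: float = 5.8,
                  time_to_first_token: float = 120.0) -> float:
    """Estimate wall-clock seconds for one reply.

    Assumes a fixed wait before the first token, then steady decoding
    at the quoted rate. Real prefill time depends on the prompt.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return time_to_first_token + output_tokens / tokens_per_second


def headroom_gb(total_gb: float = 256, used_gb: float = 222) -> float:
    """Unified memory left over for everything else on the machine."""
    return total_gb - used_gb


if __name__ == "__main__":
    for tokens in (100, 500, 2000):
        minutes = reply_seconds(tokens) / 60
        print(f"{tokens:>5} output tokens: about {minutes:.1f} min")
    print(f"Memory headroom: {headroom_gb():.0f} GB")
```

Under these assumptions a 500-token answer takes a bit under three and a half minutes, which is consistent with the “clear the schedule” framing.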

April 13, 2026 · 8 min · zolty
Securing Jellyfin on the internet

Securing Jellyfin when it's exposed to the internet

TL;DR Someone asked me on Reddit for a comprehensive guide to securing a public-facing Jellyfin instance, so here it is. The short answer I gave was: fail2ban, automate patching, implement OAuth, and download an IP block list. This post expands all four into actionable steps and adds a fifth option — IP whitelisting with a DDNS-aware Python cron job — plus the honest answer that a VPN eliminates most of this complexity entirely. ...
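For a sense of what a DDNS-aware allowlist check involves, here is a minimal Python sketch. It is not the script from the post. The function name `check_ddns`, the example hostname, and the fail-safe behaviour (keep the previous IP if DNS lookup fails) are all assumptions made for illustration. A cron job would call something like this, then rewrite the firewall or reverse-proxy allowlist only when the address has changed.

```python
import socket
from typing import Callable, Optional, Tuple


def check_ddns(hostname: str,
               previous_ip: Optional[str],
               resolver: Callable[[str], str] = socket.gethostbyname,
               ) -> Tuple[Optional[str], bool]:
    """Resolve a dynamic-DNS hostname and report whether its IP changed.

    Returns (ip_to_allow, changed). If resolution fails, the previous IP
    is kept and changed is False, so a DNS outage does not lock anyone out.
    """
    try:
        current_ip = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return previous_ip, False
    return current_ip, current_ip != previous_ip


# Hypothetical usage inside a cron-driven script:
#   ip, changed = check_ddns("home.example.net", last_known_ip)
#   if changed:
#       ...rewrite the allowlist and reload the proxy...
```

The resolver is passed in as a parameter so the logic can be tested without touching the network.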

March 28, 2026 · 10 min · zolty
Jellyfin hardware stress tester

Stress Testing GPU Transcoding in Kubernetes with JF_hw_stress

TL;DR JF_hw_stress is a headless transcoding stress tester that answers one question: how many concurrent transcode streams can your GPU actually handle before quality degrades? It runs escalating FFmpeg transcodes against real media files using VAAPI hardware acceleration, measures FPS ratios, and outputs a JSON report. I run it as a Kubernetes Job on the same k3s cluster from Cluster Genesis, scheduled exclusively on the GPU node (Intel UHD 630). The job auto-deletes after 10 minutes so it does not accumulate stale pods. ...
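The escalation idea can be sketched in a few lines of Python. This is an illustration only, not JF_hw_stress's actual code. It assumes the FPS ratio means measured transcode FPS divided by the source frame rate, and that a ratio below 1.0 means a stream can no longer keep up in real time. The function names and sample measurements are hypothetical.

```python
from typing import Iterable, Tuple


def fps_ratio(measured_fps: float, source_fps: float) -> float:
    """Transcode speed relative to real-time playback (1.0 = just keeping up)."""
    return measured_fps / source_fps


def max_sustainable_streams(results: Iterable[Tuple[int, float]],
                            source_fps: float,
                            min_ratio: float = 1.0) -> int:
    """Return the highest stream count whose slowest stream stays at or above min_ratio.

    results holds (concurrent_streams, slowest_measured_fps) pairs in
    escalating order. The scan stops at the first level that falls short.
    """
    best = 0
    for streams, slowest_fps in results:
        if fps_ratio(slowest_fps, source_fps) < min_ratio:
            break
        best = streams
    return best


if __name__ == "__main__":
    # Made-up measurements for a 24 fps source file.
    sample = [(1, 96.0), (2, 48.0), (4, 25.0), (8, 11.0)]
    print(max_sustainable_streams(sample, source_fps=24.0))
```

Raising `min_ratio` above 1.0 leaves headroom for bitrate spikes, at the cost of a more conservative stream count.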

March 27, 2026 · 6 min · zolty
PiKey Bluetooth keyboard emulator

PiKey: A Raspberry Pi That Pretends to Be Your Keyboard

TL;DR PiKey is a Raspberry Pi project that spoofs a Logitech K380 Bluetooth keyboard and mouse. It jiggles the mouse to prevent idle detection and auto-types LLM-generated text to simulate human activity. The device appears as a standard Bluetooth HID peripheral – no drivers or software needed on the target machine. Three full implementations exist: Python (primary), Rust (static binary), and C (minimal dependencies). The whole thing was inspired by a Reddit thread on r/overemployed where someone asked for exactly this device. ...

March 27, 2026 · 6 min · zolty
OpenClaw multi-user AI gateway

OpenClaw Multi-User: Privacy, Dual AI Backends, and Per-User Cost Tracking

TL;DR Multi-user AI chat with privacy guarantees, dual model providers (Anthropic direct API + AWS Bedrock via LiteLLM), and per-user cost tracking via Prometheus and Grafana. The admin cannot read other users’ conversations. Three family members authenticate via Google OAuth, each getting isolated chat sessions. Anthropic serves as the primary model provider with lower latency, and Bedrock via LiteLLM acts as a fallback. Per-user spend is tracked through LiteLLM’s Prometheus metrics without any surveillance of conversation content. This is a follow-up to the OpenClaw on k3s setup post. ...

March 25, 2026 · 13 min · zolty
OpenClaw AI gateway on k3s

OpenClaw on k3s: Replacing Open WebUI with a Lighter AI Gateway

TL;DR I replaced Open WebUI with OpenClaw – a lighter, WebSocket-based AI assistant gateway that installs from npm, supports multiple chat channels (web, Telegram, Discord, WhatsApp), and deploys on k3s as a single Deployment with a custom Docker image. The primary model provider is Anthropic’s direct API (Claude Sonnet 4.5), with LiteLLM/Bedrock as a fallback. The biggest deployment lesson: OpenClaw binds to loopback by default, which makes it invisible to Kubernetes Services and health probes. The fix is --bind lan, which requires a gateway token for authentication. ...

March 23, 2026 · 13 min · zolty
TCG price tracker

Building a TCG Price Tracker with Selenium and Kubernetes

TL;DR Cardboard is a TCG price tracker that monitors sealed product prices across 10 trading card games. It scrapes TCGPlayer and eBay using a three-tier strategy: pure API calls for bulk data, headless Selenium for product pages, and non-headless Selenium with a virtual display for sites that actively detect headless browsers. The scrapers run as Kubernetes Jobs on the same k3s cluster from Cluster Genesis. A Flask dashboard with Chart.js renders historical price data, profit/loss calculations, and portfolio tracking. All scraping is intentionally rate-limited to match normal human browsing patterns – the goal is polite data collection, not stress testing someone else’s infrastructure. ...

March 22, 2026 · 16 min · zolty
Digital signage HA proxy

Home Assistant as the Data Hub for Digital Signage

TL;DR The digital signage system was pulling weather from OpenWeatherMap, calendar events from Google Calendar, and device status from MQTT – three separate API keys, three separate failure modes. Home Assistant already had all of this data. I built an HA proxy service that exposes weather, forecasts, calendar events, temperature sensors, and arbitrary entity queries through a single Flask API backed by the Home Assistant REST API. Five new endpoints replaced three external dependencies. I also added API key authentication with role-based access control, wrote 37 tests, fixed MQTT addressing after a VLAN migration, and fought through 6 CI/CD fixes to get the pipeline deploying on self-hosted ARC runners. ...

March 22, 2026 · 5 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.