Jellyfin PostgreSQL database provider architecture Jellyfin PostgreSQL database provider architecture

Forking Jellyfin: A PostgreSQL Database Provider in .NET 10

TL;DR Jellyfin stores everything in SQLite. Metadata, users, activity logs, authentication — all of it lives in .db files that lock under concurrent access. To run multiple replicas, we need a real network-accessible database. This post covers Phase 1 of the HA conversion: forking Jellyfin, designing a pluggable database provider interface, implementing it for PostgreSQL with Npgsql, generating EF Core migrations, writing integration tests with Testcontainers, and building a custom Docker image. ...

March 8, 2026 · 8 min · zolty
Multi-model AI planning workflow diagram Multi-model AI planning workflow diagram

Multi-Model Planning: The Same Pattern That Shipped dnd-multi

TL;DR The Jellyfin HA conversion touches a .NET 10 codebase, Entity Framework Core migrations, Kubernetes manifests, Terraform infrastructure, PostgreSQL operations, and FFmpeg transcoding pipelines. No single AI model understands all of this equally well. So I used four of them — the same multi-model planning pattern that shipped dnd-multi in a single day and that I documented in the LLM GitHub PR workflow. This post covers how I adapted that pattern for infrastructure work, what each model caught, and why planning is where all the human time should go. ...

March 7, 2026 · 7 min · zolty
Jellyfin single-instance architecture diagram Jellyfin single-instance architecture diagram

Why Jellyfin Can't Scale (And What We're Going to Do About It)

TL;DR Jellyfin is a fantastic open-source media server. It is also, architecturally, a single-process application that assumes it’s the only instance running. SQLite as the database. Eleven ConcurrentDictionary caches holding sessions, users, devices, and task queues in memory. A file-based config directory that gets written to at runtime. None of this survives a second pod. This is the first post in a seven-part series documenting how I converted Jellyfin into a highly available, multi-replica deployment on my home k3s cluster. The project spans two repositories, four phases, ~20 GitHub Issues executed by AI agents, and a live failover demo where I killed a pod and the service continued with zero downtime — users on the surviving replica never saw an interruption. ...

March 6, 2026 · 9 min · zolty
AI-driven Kubernetes incident response — seven alerts resolved AI-driven Kubernetes incident response — seven alerts resolved

Seven Alerts, Three Bugs, One AI Debug Session: A Kubernetes Incident Report

TL;DR A routine cluster health check surfaced seven simultaneous issues. Most were transient — Longhorn self-healed its replica fault, Prometheus recovered behind it, a stale manually-created Job was deleted in one command, and a liveness probe blip fixed itself. The real work was dnd-backend, which had been in CrashLoopBackOff and turned out to contain three separate bugs layered on top of each other. The AI identified all three during a single debugging session, authored the fixes across three PRs, and the service came up 1/1 Running with all 18 database tables created on the first boot after the final merge. ...

March 4, 2026 · 8 min · zolty
Self-hosted AI chat deployment Self-hosted AI chat deployment

Self-Hosted AI Chat: Open WebUI, LiteLLM, and AWS Bedrock on k3s

TL;DR I deployed a private, self-hosted ChatGPT alternative on the homelab k3s cluster. Open WebUI provides a polished chat interface. LiteLLM acts as a proxy that translates the OpenAI API into AWS Bedrock’s Converse API. Four models are available: Claude Sonnet 4, Claude Haiku 4.5, Amazon Nova Micro, and Amazon Nova Lite. Authentication is handled by the existing OAuth2 Proxy – no additional SSO configuration needed. The whole stack runs in three pods consuming under 500MB of RAM, and the only ongoing cost is per-request Bedrock pricing. No API keys from OpenAI or Anthropic required. ...

March 4, 2026 · 8 min · zolty
AI failure patterns and guardrails AI failure patterns and guardrails

When the AI Breaks Production: Failure Patterns, Guardrails, and Measuring What Works

TL;DR AI tools have caused multiple production incidents in this cluster. The AI alert responder agent alone generated 14 documented failure patterns before it became reliable. A security scanner deployed by AI applied restricted PodSecurity labels to every namespace, silently blocking pod creation for half the applications in the cluster. The service selector trap – where AI routes 50% of requests to PostgreSQL instead of the application – appeared in 4 separate incidents before guardrails stopped it. This post catalogs the failure patterns, the five-layer guardrail architecture built to prevent them, and an honest assessment of what still goes wrong. ...

March 2, 2026 · 14 min · zolty
AI Dungeon Master platform architecture diagram AI Dungeon Master platform architecture diagram

Building an AI Dungeon Master: Full-Stack D&D Platform on k3s

TL;DR I’m building a multiplayer D&D platform where an AI powered by AWS Bedrock Claude runs the game. Players connect via a Next.js web app or Discord. A 5-tier lore context system gives the AI persistent memory across sessions. A background world simulation engine tracks NPC positions, inventory, faction standings, and in-game time so the AI can focus on storytelling instead of bookkeeping. The foundation is fully deployed on my home k3s cluster. The current work is turning a working tech demo into a game people actually want to sit down and play. ...

March 2, 2026 · 14 min · zolty

Reference: k3s Homelab — AI Lessons Learned

Context: This is the live docs/ai-lessons.md from the home k3s cluster repository, referenced extensively across posts on this blog — starting with AI Memory System and GitHub Copilot Setup Guide. Every entry exists because its absence caused a production incident. Personal identifiers and internal domains have been replaced with generic placeholders. Updated: 2026-03-03 Rules discovered through production breakage. Each entry prevents recurrence of a specific failure. Update this file whenever a new non-obvious failure pattern is discovered. ...

March 2, 2026 · 81 min · zolty
GitHub Copilot setup guide with AI skills and memory GitHub Copilot setup guide with AI skills and memory

Getting Started with GitHub Copilot: What Actually Works

TL;DR A $20/month GitHub Copilot subscription gives you Claude Sonnet 4.6, GPT-4o, and Gemini inside VS Code. Out of the box it’s useful. With a proper instruction setup — a copilot-instructions.md file, path-scoped rules, and skill documents — it becomes something you actually rely on. Most of the posts on this blog were built with this toolchain, mostly in the context of my k3s cluster, but the patterns apply anywhere. This is how I have it set up. ...

March 1, 2026 · 12 min · zolty
AI memory system architecture AI memory system architecture

Building an AI Memory System: From Blank Slate to 482 Lines of Hard-Won Knowledge

TL;DR The .github/copilot-instructions.md file started as 10 lines of project description and grew into a 99-line “operating system” for AI assistants. Then it split: failure patterns moved into docs/ai-lessons.md (now 482 lines across 20+ categories), and file-type-specific rules moved into .github/instructions/ with applyTo glob patterns. The same template structure was standardized across 5 repositories. This post traces the three generations of AI instruction architecture and shows how every production incident permanently improves AI reliability. ...

February 26, 2026 · 11 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.