Self-hosted AI chat deployment

Self-Hosted AI Chat: Open WebUI, LiteLLM, and AWS Bedrock on k3s

TL;DR I deployed a private, self-hosted ChatGPT alternative on the homelab k3s cluster. Open WebUI provides a polished chat interface. LiteLLM acts as a proxy that translates the OpenAI API into AWS Bedrock’s Converse API. Four models are available: Claude Sonnet 4, Claude Haiku 4.5, Amazon Nova Micro, and Amazon Nova Lite. Authentication is handled by the existing OAuth2 Proxy – no additional SSO configuration needed. The whole stack runs in three pods consuming under 500MB of RAM, and the only ongoing cost is per-request Bedrock pricing. No API keys from OpenAI or Anthropic required. ...
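The LiteLLM piece of this stack boils down to a model list that maps OpenAI-style model names to Bedrock model IDs. A minimal sketch of such a config.yaml — the model names and Bedrock IDs below are illustrative assumptions, not copied from the actual deployment:

```yaml
# LiteLLM proxy config sketch: each entry exposes an OpenAI-compatible
# model name and routes it to AWS Bedrock (illustrative IDs).
model_list:
  - model_name: claude-sonnet-4          # the name Open WebUI will see
    litellm_params:
      model: bedrock/anthropic.claude-sonnet-4-20250514-v1:0
      aws_region_name: us-east-1
  - model_name: nova-micro
    litellm_params:
      model: bedrock/amazon.nova-micro-v1:0
      aws_region_name: us-east-1
```

With a config like this, any OpenAI-compatible client (Open WebUI included) can point at the proxy's base URL and use these model names directly.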

March 4, 2026 · 8 min · zolty
AI coding governance framework for engineering teams

Governing AI Coding Tools Across an Engineering Team

TL;DR AI coding tools are now default behavior for most developers, not an experiment. If you manage a team and you haven’t formalized this, you have ungoverned spend, security exposure, and inconsistent behavior happening right now. The fix isn’t to take the tools away — it’s to pick one, pay for it centrally, encode your policies into the AI itself using instruction files and skills, and govern the control folder rather than individual usage. Here’s the framework I’d implement. ...
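"Encode your policies into the AI itself" means the rules live in a version-controlled instruction file rather than a wiki. A hypothetical excerpt of such a `.github/copilot-instructions.md` — the policies here are invented for illustration:

```markdown
<!-- .github/copilot-instructions.md (hypothetical policy excerpt) -->
- Never write secrets or tokens into code; reference the team vault instead.
- All new services must include a Dockerfile and a /healthz endpoint.
- Use the approved internal HTTP client library, not ad-hoc requests code.
```

Because the file sits in the repository, "governing the control folder" reduces to ordinary code review on changes to `.github/`.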

March 3, 2026 · 10 min · zolty
AI failure patterns and guardrails

When the AI Breaks Production: Failure Patterns, Guardrails, and Measuring What Works

TL;DR AI tools have caused multiple production incidents in this cluster. The AI alert responder agent alone generated 14 documented failure patterns before it became reliable. A security scanner deployed by AI applied restricted PodSecurity labels to every namespace, silently blocking pod creation for half the applications in the cluster. The service selector trap – where an over-broad Service selector sends 50% of requests to PostgreSQL instead of the application – appeared in 4 separate incidents before guardrails stopped it. This post catalogs the failure patterns, the five-layer guardrail architecture built to prevent them, and an honest assessment of what still goes wrong. ...
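Both incidents above come down to a few lines of Kubernetes metadata. A hedged sketch of each — the names are placeholders, not taken from the cluster:

```yaml
# (1) The PodSecurity incident: this label makes the admission controller
# reject any pod in the namespace that violates the "restricted" profile.
apiVersion: v1
kind: Namespace
metadata:
  name: example-app                     # placeholder
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
# (2) The service selector trap: a selector on a shared label can match
# both the application pods and its PostgreSQL pods, so kube-proxy
# load-balances traffic across both.
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  selector:
    app: example                        # too broad — also matches the DB pod
  ports:
    - port: 80
      targetPort: 8080
```

The fix in both cases is specificity: label only the namespaces that should be restricted, and select on a label unique to the application's pods.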

March 2, 2026 · 14 min · zolty
Two AIs managing a GitHub repository via issues and pull requests

Two AIs, One Codebase: Using Local Copilot to Direct GitHub Copilot via Issues and PRs

TL;DR A 109-day project plan. One day of actual work. Eight hours of active pipeline time. The key was treating planning and implementation as two separate AI-driven phases: spend an evening getting the plan right by routing it through multiple models, then let Claude Sonnet 4.6 implement it autonomously overnight via GitHub Copilot’s cloud agent while you sleep. This is the full playbook — planning phase included.

The Project
This came out of building dnd-multi, a full-stack AI Dungeon Master platform: FastAPI backend, Next.js 15 frontend, a Discord bot, LiveKit voice, and AWS Bedrock integration. Seven feature phases, a plan projected to take until June 19. ...

March 2, 2026 · 11 min · zolty
AI Dungeon Master platform architecture diagram

Building an AI Dungeon Master: Full-Stack D&D Platform on k3s

TL;DR I’m building a multiplayer D&D platform where an AI powered by AWS Bedrock Claude runs the game. Players connect via a Next.js web app or Discord. A 5-tier lore context system gives the AI persistent memory across sessions. A background world simulation engine tracks NPC positions, inventory, faction standings, and in-game time so the AI can focus on storytelling instead of bookkeeping. The foundation is fully deployed on my home k3s cluster. The current work is turning a working tech demo into a game people actually want to sit down and play. ...

March 2, 2026 · 14 min · zolty

Reference: DnD Multi — Project Plan (v1.0)

Context: This is the real project plan for dnd-multi, a full-stack AI Dungeon Master platform. It was generated by Claude Opus 4.6 during Phase 0 of the LLM GitHub PR workflow — synthesizing gap analysis from four different models into a structured execution document. Claude Sonnet 4.6 then used this plan overnight to open 24 PRs and ship all seven phases. Personal identifiers have been removed. Technical content is verbatim.

Milestone Timeline
M0 — Platform Stable · 2026-03-13 · All broken deps fixed, migrations applied, Tier 2 lore generating, smoke tests passing
M1 — First Playable Session · 2026-04-03 · Turn structure live, player identity in DM prompt, hot phrase + /dm command working
M2 — Full Action Flow · 2026-04-24 · Action confirmation, non-active player queue, player votes operational
M3 — IC/OOC + Personality · 2026-05-08 · Meta-mode detection, in-character assumption, DM personality tuning deployed
M4 — Content & Reporting · 2026-05-22 · Book/media content generation live, /report + /flag system in admin dashboard
M5 — Combat Tracker · 2026-06-12 · Live HP tracker UI, [COMBAT:] directives wired to death protection
M6 — Feature Complete v1.0 · 2026-06-19 · Spell reference Discord command, shareable campaign invitation links

Current State Summary
The platform has a solid full-stack foundation with all core systems implemented and deployed to the home k3s cluster. The gap is game experience polish — the AI DM has no awareness of whose turn it is, doesn’t distinguish in-character from out-of-character speech, lacks an action confirmation flow, and has no mechanism for players to report misbehavior. These are the features that make the difference between a tech demo and a playable game. ...

March 2, 2026 · 18 min · zolty

Reference: k3s Homelab — AI Lessons Learned

Context: This is the live docs/ai-lessons.md from the home k3s cluster repository, referenced extensively across posts on this blog — starting with AI Memory System and GitHub Copilot Setup Guide. Every entry exists because its absence caused a production incident. Personal identifiers and internal domains have been replaced with generic placeholders.

Updated: 2026-03-03
Rules discovered through production breakage. Each entry prevents recurrence of a specific failure. Update this file whenever a new non-obvious failure pattern is discovered. ...

March 2, 2026 · 81 min · zolty
A terminal prompt with tool version numbers lined up neatly

Environment manifests for AI assistants across every repo

TL;DR I added a standardized environment.instructions.md file to every repository in my workspace. It’s a simple Markdown table of tool versions plus a few workflow snippets. AI assistants pick it up automatically, and they’ve stopped suggesting commands for tools or versions I don’t have. The whole thing took less than an hour to write out and propagate.

The problem
I run GitHub Copilot heavily across several projects — homelab infrastructure, a blog, a few Python services, and some supporting tooling. The AI context setup (those copilot-instructions.md files I wrote about here) covers what each project does and what its conventions are. What it doesn’t cover is what I’m actually running when I type commands in a terminal. ...
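The file described above is just a pinned-version table. For illustration, it might look like this — the tools and versions here are made up, not the author's actual environment:

```markdown
<!-- environment.instructions.md (illustrative sketch) -->
| Tool    | Version | Notes                        |
|---------|---------|------------------------------|
| kubectl | 1.31    | talks to the k3s cluster     |
| helm    | 3.16    | charts live in ./charts      |
| python  | 3.12    | managed via uv, not pyenv    |
```

Because assistants read this alongside the rest of the instruction files, they stop suggesting flags or subcommands from versions you don't run.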

March 1, 2026 · 8 min · zolty
GitHub Copilot setup guide with AI skills and memory

Getting Started with GitHub Copilot: What Actually Works

TL;DR A $20/month GitHub Copilot subscription gives you Claude Sonnet 4.6, GPT-4o, and Gemini inside VS Code. Out of the box it’s useful. With a proper instruction setup — a copilot-instructions.md file, path-scoped rules, and skill documents — it becomes something you actually rely on. Most of the posts on this blog were built with this toolchain, mostly in the context of my k3s cluster, but the patterns apply anywhere. This is how I have it set up. ...

March 1, 2026 · 12 min · zolty
AI memory system architecture

Building an AI Memory System: From Blank Slate to 482 Lines of Hard-Won Knowledge

TL;DR The .github/copilot-instructions.md file started as 10 lines of project description and grew into a 99-line “operating system” for AI assistants. Then it split: failure patterns moved into docs/ai-lessons.md (now 482 lines across 20+ categories), and file-type-specific rules moved into .github/instructions/ with applyTo glob patterns. The same template structure was standardized across 5 repositories. This post traces the three generations of AI instruction architecture and shows how every production incident permanently improves AI reliability. ...
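The path-scoped rules mentioned above use the instructions-file frontmatter that VS Code's Copilot supports: a file in .github/instructions/ with an applyTo glob. A minimal sketch — the filename and rule text are invented for illustration:

```markdown
<!-- .github/instructions/k8s-manifests.instructions.md (illustrative) -->
---
applyTo: "**/*.yaml"
---
Always set explicit resource requests and limits on new Deployments.
Never use the `latest` image tag in manifests.
```

The glob scopes the rule so it only loads when the AI is working on matching files, keeping the main copilot-instructions.md small.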

February 26, 2026 · 11 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.