TL;DR

I’m building a multiplayer D&D platform where an AI DM powered by Claude on AWS Bedrock runs the game. Players connect via a Next.js web app or Discord. A 5-tier lore context system gives the AI persistent memory across sessions. A background world simulation engine tracks NPC positions, inventory, faction standings, and in-game time so the AI can focus on storytelling instead of bookkeeping. The foundation is fully deployed on my home k3s cluster. The current work is turning a working tech demo into a game people actually want to sit down and play.


Why This Exists

My friend group has been trying to run a D&D campaign for years. The scheduling problem is real — five people with jobs, families, and wildly different time zones can rarely align on a three-hour session. We’ve canceled more sessions than we’ve played.

The obvious workaround is asynchronous play. Slow D&D over Discord, where players post actions and the DM responds when they can. The problem is that a human DM still has to be available, still has to track state, still has to remember what happened three weeks ago when the paladin made a suspicious deal with a toll keeper that everyone else forgot about.

The AI DM solves the human availability problem. It responds immediately. It has perfect recall. It never forgets the suspicious toll keeper deal. And on nights when a few players happen to be online at the same time, it runs live sessions with voice, dice, and real-time narration.

I also wanted an excuse to build something non-trivial on top of AWS Bedrock. Deploying hobby apps is a different engineering challenge than writing infrastructure playbooks. The homelab has proven that it can run production workloads reliably. Let’s see what happens when you point language models at it.


Architecture

The stack has four main services, all running in the dnd namespace on the k3s cluster:

Next.js 15 (App Router + React Query + WebSocket + LiveKit SDK)
    ↓ HTTPS / WSS
FastAPI (async, SQLAlchemy 2.x, pgvector)
    ├── AI DM Engine (Bedrock Claude → DMResponse + directives)
    ├── 5-Tier Lore System (persistent campaign memory)
    ├── World Simulation Engine (background game state tick loop)
    ├── Character Death Protection (dice bias + reroll generation)
    ├── WebSocket Hub (per-campaign rooms, in-process state)
    └── LiveKit voice token generation
PostgreSQL 16 + pgvector     Redis (sessions only)
AWS Bedrock (Claude + Titan)     Amazon Polly (TTS)
Amazon Transcribe (STT)          LiveKit (WebRTC voice)
Discord Bot (slash commands + VC audio)

Nothing exotic at the infrastructure level — Traefik ingress, cert-manager TLS, Longhorn PVCs for Postgres, Prometheus metrics on every service. The interesting engineering lives in the AI plumbing.

One deliberate architectural constraint worth naming: the WebSocket hub runs in-process, not in Redis. A single replica Deployment with Recreate strategy keeps all campaign room state in memory. This means zero latency for WS message fan-out and zero infrastructure complexity for routing. The tradeoff is that I can never scale past one backend pod without a Redis pub/sub migration. For a homelab game with a handful of concurrent users, that tradeoff is correct. I even documented it explicitly so I don’t accidentally scale the Deployment to 2 and spend hours debugging state split.


The AI DM Engine

Every player action flows through the same pipeline:

  1. Assemble world state digest — the World Engine generates a compact (<500 token) summary of current NPC positions, weather, active factions, and in-game time
  2. Build lore context — the 5-tier system assembles campaign history, session summaries, recent events, and character memories
  3. Select the model — Haiku for routine turns (fast, cheap); Sonnet for cinematic moments (richer narration)
  4. Invoke Claude — with exponential backoff on throttling, structured system prompt, and the full lore context
  5. Parse directives — Claude embeds structured commands in plain text that the engine extracts and executes
  6. Execute world commands — directive side effects (NPC moves, item grants, weather changes) are sent to the World Engine
  7. Optionally generate — scene image via Titan, TTS narration via Polly
  8. Return DMResponse — never raise; fallback narration on any Bedrock failure
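Steps 3 and 4 are the easiest to sketch in isolation. Here's a minimal version of the model router and the backoff wrapper, assuming hypothetical model names and a stand-in throttling exception — none of these identifiers are the actual implementation:

```python
import time

# Hypothetical model IDs -- illustrative, not the project's real config.
MODELS = {"routine": "claude-haiku", "cinematic": "claude-sonnet"}

class ThrottledError(Exception):
    """Stand-in for a Bedrock throttling exception."""

def select_model(cinematic: bool) -> str:
    # Step 3: fast/cheap model for routine turns, richer model for big moments.
    return MODELS["cinematic" if cinematic else "routine"]

def invoke_with_backoff(invoke, prompt, retries=4, base_delay=0.01):
    # Step 4: exponential backoff on throttling; re-raise after the last attempt.
    for attempt in range(retries):
        try:
            return invoke(prompt)
        except ThrottledError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

The real backoff delays are obviously longer than 10ms; the shape of the loop is the point.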

The directive system is the part I’m most happy with. Instead of asking Claude to return structured JSON (which breaks narration flow), I let it write freely and embed commands inline:

The innkeeper eyes you suspiciously and slides a key across the bar.
"Room three. Don't touch anything."

[WORLD: move_npc innkeeper_miriam back_room]
[WORLD: give_item player:usr_abc healing_potion 1]
[SCENE: dimly lit tavern back room, flickering candlelight, suspicious innkeeper]
[TURN: usr_xyz]

The engine strips these before returning narrative to clients. Players see clean prose. The game state gets updated silently. The DM system prompt teaches Claude the directive syntax; after a few turns it uses them consistently without any structured output forcing.
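The stripping step is essentially one regex: pull the directives out, leave clean prose behind. A minimal sketch (the real extractor presumably handles more edge cases):

```python
import re

# Matches inline directives like [WORLD: move_npc innkeeper_miriam back_room].
DIRECTIVE_RE = re.compile(r"\[(WORLD|SCENE|TURN|VOTE):\s*([^\]]+)\]")

def split_response(text: str):
    """Separate clean narration from embedded directives."""
    directives = [(kind, body.strip()) for kind, body in DIRECTIVE_RE.findall(text)]
    narration = DIRECTIVE_RE.sub("", text).strip()
    # Collapse the blank lines left behind where directives were stripped.
    narration = re.sub(r"\n{3,}", "\n\n", narration)
    return narration, directives
```

Players get the `narration` string; the `directives` list goes to the executor.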


The 5-Tier Lore System

This is the memory architecture that lets the AI DM “remember” everything that happened in a campaign.

Tier   Name               Max Tokens   Purpose
0      World Constants    5,000        Immutable lore: cosmology, races, deities, geography
1      Campaign Arc       3,000        Major story beats for this specific campaign
2      Session Summaries  1,000 each   Auto-generated summary of each past session
3      Recent Events      4,000        The last ~20 full messages (rolling window)
4      Character Memory   2,000        Per-character relationships, goals, secrets, affiliations

Assembly order matters. Tier 0 is always included — it never gets trimmed regardless of total context size. When total tokens exceed the model’s context limit, the system trims backward from Tier 2, dropping the oldest session summaries first. Tiers 3 and 4 are protected because they contain the immediate context the AI needs to make coherent decisions right now.
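The trim policy fits in a few lines, assuming per-chunk token counts are precomputed. A toy version, not the real assembler:

```python
def assemble_lore(tiers, limit):
    """Assemble lore context under `limit` tokens.

    `tiers` maps tier number -> list of (token_count, text) chunks,
    with tier 2 ordered oldest -> newest. Tier 0 is never trimmed and
    tiers 3 and 4 are protected; only tier 2 sheds chunks, oldest first.
    """
    fixed = sum(t for n in (0, 1, 3, 4) for t, _ in tiers.get(n, []))
    budget = limit - fixed                     # room left for tier 2
    tier2 = list(tiers.get(2, []))
    while tier2 and sum(t for t, _ in tier2) > budget:
        tier2.pop(0)                           # drop the oldest summary first
    ordered = {**tiers, 2: tier2}
    return [text for n in (0, 1, 2, 3, 4) for _, text in ordered.get(n, [])]
```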

The world state digest from the World Engine is appended after Tier 4. It’s not a lore tier — it’s real-time game state: “the innkeeper is in the back room, it’s raining, the merchant guild is distrustful of the party, and it’s 10 PM in-game.”

Session summaries (Tier 2) are generated automatically when a session ends. Claude Haiku reads the full session transcript and produces a 200-word narrative summary that gets embedded into pgvector for lore search. Over a long campaign, the AI “remembers” what happened in session 1 because a compressed summary of it lives in Tier 2, even if the actual transcript would be 40,000 tokens.
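The pgvector lookup is ultimately just nearest-neighbor search over those summary embeddings. A toy pure-Python version of the ranking it performs (real embeddings come from Titan; these two-dimensional vectors are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_summaries(query_vec, summaries, k=2):
    """Rank (embedding, text) pairs by similarity to the query embedding."""
    ranked = sorted(summaries, key=lambda s: cosine_similarity(query_vec, s[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]
```

In production this is a single SQL query with pgvector's distance operator; the ranking behavior is the same.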


The World Simulation Engine

This was the wildest architectural decision and the one that needed the most careful justification.

The problem: an AI DM needs to track a lot of bookkeeping state. Where is each NPC? What items does the party have? What time of day is it? What’s the weather? What does the merchant guild think of the paladin after that toll keeper incident?

One approach is to put all of this in the AI context and let Claude track state from memory. The problem is that state drift compounds with session count: after twenty sessions of “remember that X is true,” the context fills up and the AI hallucinates contradictions.

The other approach is an explicit simulation engine that maintains ground truth, and feeds the AI a compact digest instead of paragraphs of state.

I built the second approach, inspired heavily by Dwarf Fortress’s simulation model. The World Engine is a per-campaign async tick loop:

  • 1 real-second = 1 game-minute by default. Controllable via directives or API.
  • NPC schedules — each NPC has a schedule object ({"hour": "location_id"}). The engine moves them autonomously throughout the day.
  • Faction standings — numeric affinity scores updated by [WORLD: advance_faction] directives
  • Weather — Markov chain transitions unless overridden by Claude
  • Shop inventory — restocks daily on the game clock
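A stripped-down version of the schedule-driven piece: advance the clock, move any NPC whose schedule names the new hour. The real engine runs this inside a per-campaign async tick loop; all names here are illustrative:

```python
def advance_clock(minute_of_day, npcs, minutes=1):
    """Advance the game clock and move NPCs per their hourly schedules.

    `npcs` maps npc_id -> {"location": str, "schedule": {hour: location}}.
    Returns the new minute-of-day; mutates NPC locations in place.
    """
    minute_of_day = (minute_of_day + minutes) % (24 * 60)
    hour = minute_of_day // 60
    for npc in npcs.values():
        # An NPC stays put unless its schedule names this hour.
        target = npc["schedule"].get(hour)
        if target:
            npc["location"] = target
    return minute_of_day
```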

Claude issues world commands via directives in response text. The engine executes them, updates the DB models, and the next turn’s digest reflects the updated state. The AI never has to “remember” the state — it reads a fresh digest every turn.

The digest costs under 500 tokens. That’s the budget I allocated from an original 800-token estimate when I realized weather description and full faction details were eating unnecessary space. Current format:

World state: Day 14, 22:00 game-time. Weather: light_rain.
NPCs: innkeeper_miriam(back_room), guard_captain(barracks/hostile), mayor(manor).
Factions: merchant_guild(-15/distrustful), thieves_guild(+40/allied).
Active events: bandit_patrol(road, expires 06:00).

Four lines. The AI processes it instantly and produces accurate world-aware narration without me needing to inject paragraphs of “remember that it’s raining and the innkeeper is suspicious.”
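Rendering the digest is plain string assembly. A sketch that produces the format above from a state dict — field names are my assumptions, not the actual models:

```python
def build_digest(state):
    """Render world state into the compact digest format shown above."""
    npcs = ", ".join(f"{n}({loc})" for n, loc in state["npcs"].items())
    factions = ", ".join(
        f"{f}({score:+d}/{label})" for f, (score, label) in state["factions"].items())
    events = ", ".join(state["events"]) or "none"
    return (
        f"World state: Day {state['day']}, {state['time']} game-time. "
        f"Weather: {state['weather']}.\n"
        f"NPCs: {npcs}.\nFactions: {factions}.\nActive events: {events}."
    )
```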


Phase 0: The Tech Debt Reckoning

Before writing a single feature, I spent two weeks fixing broken things. The codebase had reached a state where the foundation worked on paper, but specific failure modes were lurking that would destroy player experience the moment anyone actually tried to run a game.

The worst ones:

--workers 2 in the backend Dockerfile — The FastAPI app uses in-process WebSocket state. Two Gunicorn workers mean two separate ConnectionManager instances. Half of all WebSocket messages get routed to the wrong worker. Players would intermittently stop seeing messages from other players. The fix is one line (--workers 1), but it took tracing a production WS state bug to figure out why.

asyncio.get_event_loop() in async contexts — Python 3.10+ deprecates get_event_loop() inside running coroutines and emits DeprecationWarning; in 3.12 it raises outright. All boto3 Bedrock calls were using this pattern. Six files, same fix each time: replace asyncio.get_event_loop() with asyncio.get_running_loop().
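The corrected pattern looks roughly like this, with a stand-in for the synchronous boto3 call:

```python
import asyncio

def blocking_invoke(prompt):
    """Stand-in for a synchronous boto3 Bedrock call."""
    return f"narration for: {prompt}"

async def invoke_bedrock(prompt):
    # get_running_loop() is the correct call inside a coroutine on 3.10+;
    # get_event_loop() is deprecated there and eventually errors.
    loop = asyncio.get_running_loop()
    # Push the blocking SDK call onto the default thread-pool executor.
    return await loop.run_in_executor(None, blocking_invoke, prompt)
```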

create_all() instead of Alembic — The database init was calling SQLAlchemy’s create_all() on startup, bypassing Alembic migration history entirely. This works until someone deploys a schema change and the new column doesn’t exist because Alembic never ran. Removed create_all(), made startup run alembic upgrade head instead.

generate_tts=True hardcoded in the WebSocket handler — Every single DM response was generating Amazon Polly audio, regardless of context. Polly bills per character synthesized. An active session with multiple turns per minute would accumulate costs fast. This should be opt-in per message, not hardcoded on.

Zero test coverage — Not a single test across backend, frontend, or Discord bot. This is the highest-consequence gap and the hardest to address retroactively. Phase 0 introduced smoke tests for the core API paths; full coverage is still an open debt.

The discipline of explicitly writing down tech debt before touching features changed how I approached the project. It’s easy to convince yourself to skip the boring fixes and start building the cool stuff. The broken Dockerfile worker count would have produced a maddening production bug at the worst possible time — the first time real people tried to play.


Turn Structure: From Demo to Game

With the foundation stable, Phase 1 addressed the most fundamental game experience gap: the AI DM had no awareness of whose turn it was.

Every player’s message got routed to the DM indiscriminately. Two players could both take actions in the same “moment.” The AI would try to narrate two simultaneous actions and produce incoherent outcomes. This is fine for testing functionality — it’s unusable as a game.

The solution is active_player_id on the Session model. The DM sets this via [TURN: player_id] directive at the end of each response. The WebSocket hub enforces turn rules server-side: non-active player action messages are blocked immediately with clear feedback.

The frontend highlights whose turn it is. Non-active players can still ask questions (the DM responds without advancing the turn), see the active player’s actions narrated in real time, and queue their own planned actions for when their turn arrives.

The action queue feeds back into the DM context automatically. When a player’s turn starts, their queued action is injected into the DM prompt: “Player Elindra queued this action while waiting: I scan the room for exits.” The DM references it naturally in narration, which makes non-active players feel like they’re participating even when they’re waiting.
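The turn gate and the action queue can be sketched as one server-side check — a simplified sketch, not the actual hub code:

```python
def gate_message(msg, session):
    """Server-side turn gate for the WebSocket hub (simplified sketch).

    Non-active players' actions are blocked and queued for their turn;
    questions always pass through without advancing the turn.
    """
    if msg["type"] != "action" or msg["player_id"] == session["active_player_id"]:
        return {"allowed": True}
    # Queue the blocked action so it can be injected into the DM prompt
    # when this player's turn starts.
    session["queue"].setdefault(msg["player_id"], []).append(msg["text"])
    return {"allowed": False, "feedback": "Not your turn yet -- action queued."}
```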

This cost 250 tokens of Tier 3 budget. Tracked explicitly in the plan. Lore trim behavior was re-verified after the injection to confirm nothing critical was getting cut.


Action Confirmation and Player Votes

Phase 2 built the action confirmation flow and non-active player rules.

Before executing an action, the DM summarizes what’s about to happen and waits for confirmation. Players on easy difficulty mode get additional consequence hints — “attacking the guard captain directly will likely alert the entire garrison.” On normal and hardcore, the DM describes the action and confirms intent without telegraphing consequences.

Confirmation shortcuts keep it from feeling like bureaucracy. Appending -y, -ac, or --auto-confirm to any action message skips the confirmation. Saying “confirm” via STT skips it verbally. Most experienced players configure auto-confirm after the first session.
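Parsing the shortcut is a one-regex affair: strip a trailing flag, report whether to skip confirmation. A sketch, with the flag spellings taken from above:

```python
import re

# Trailing auto-confirm shortcuts: -y, -ac, or --auto-confirm.
AUTO_CONFIRM_RE = re.compile(r"\s+(?:-y|-ac|--auto-confirm)\s*$")

def parse_action(message):
    """Split a trailing auto-confirm shortcut off an action message."""
    if AUTO_CONFIRM_RE.search(message):
        return AUTO_CONFIRM_RE.sub("", message), True
    return message, False
```

Anchoring on end-of-string keeps a mid-sentence "-y" from being misread as a flag.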

The vote system is the part that surprised me by working so well. When the outcome of an ambiguous group decision doesn’t have plot consequences, the DM can call a vote instead of deciding unilaterally:

[VOTE: Which road do you take? | The coastal path | Through the forest | 30]

Players see a 30-second vote overlay with a countdown. The majority wins. The DM narrates the outcome and advances the story. It turns an awkward group deliberation (“what does everyone think?” — silence — “I guess I’ll just decide?”) into a 30-second structured moment. My players liked this more than I expected.
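Parsing the directive body and tallying the result are both tiny. A sketch, assuming majority wins and ties resolve to the earlier-listed option (the tie rule is my assumption, not stated above):

```python
def parse_vote(directive):
    """Parse a VOTE directive body: 'question | option | ... | seconds'."""
    parts = [p.strip() for p in directive.split("|")]
    return {"question": parts[0], "options": parts[1:-1], "timeout": int(parts[-1])}

def tally(votes, options):
    """Majority wins; ties resolve to the earlier-listed option."""
    counts = {opt: sum(1 for v in votes.values() if v == opt) for opt in options}
    return max(options, key=lambda o: counts[o])
```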


What’s Still in Progress

The platform is playable in its current state, but “playable” and “polished” are far apart.

IC/OOC separation (Phase 3) — The DM currently assumes all player speech is in-character. If you ask “what’s the weather going to do?” the DM narrates your character asking about the weather. There’s no clean way to step outside the fiction yet. Meta-mode detection (/meta, <OOC>) is next.

DM personality (Phase 3) — Campaigns can be configured as serious, balanced, or comedic. Balanced is the default. The system prompt injects appropriate tone guidance. Comedic mode produces significantly more puns than anyone asked for, which is exactly correct.

Book and media content generation (Phase 4) — When a character opens a book or reads a document, Claude Haiku generates contextually appropriate contents. A spell tome contains actual spells relevant to the world. An inn’s noticeboard has locally relevant jobs. Generated content caches to the WorldItem row so re-reading produces the same result.

Combat tracker UI (Phase 6) — Currently the DM narrates combat with HP tracking living entirely in Claude’s head. This means players have no reliable real-time HP visibility and the DM can occasionally “forget” that a character took significant damage four rounds ago. A [COMBAT:] directive system and a live tracker UI are the fix. The World Engine will own combat state; the AI gets a combat digest alongside the world digest.


Lessons So Far

The directive system is more reliable than structured output for prose + commands. Asking Claude to return JSON breaks its narration cadence. Letting it write freely and embed [COMMANDS] produces consistently good narration AND consistently parseable commands after a few turns of system prompt tuning.

In-process WS state was the right call. The Redis migration path is documented and ready if needed. But for the actual current use case (a handful of friends, single region, single cluster), the in-process approach has zero operational overhead and sub-millisecond fan-out. Document the constraint clearly and revisit when the constraint is actually hit.

Tech debt before features, always. The two weeks of fixes before Phase 1 felt like lost time. In practice, every fixed issue would have surfaced as a production bug during the first real session — at the worst possible moment.

Token budget tracking is a first-class concern. The lore system has hard token limits per tier. Every new context injection (turn queue, combat digest, action pending state) has to come from somewhere. I started maintaining explicit token budget notes in the project plan for each feature. This sounds like overkill until you watch a session where the AI starts hallucinating because its effective context window got silently trimmed.

Polly costs accumulate fast. TTS on every DM turn in an active session can be 1,000-5,000 characters of narration. At $4 per million characters, a 2-hour session with active TTS would cost several cents — not catastrophic, but multiplied across campaigns it adds up. TTS is now opt-in, defaults to off, and only triggers on explicitly cinematic moments.


What’s Next

The v1.0 target I’m working toward is M6: Feature Complete by 2026-06-19. That covers the full action flow already deployed, IC/OOC mode, personality tuning, content generation, the /report system for flagging AI misbehavior, the combat tracker UI, and a spell reference Discord command.

The actual long-term vision is a campaign that runs for months, accumulates a rich world history, and produces a playable narrative record of everything that happened. The 5-tier lore system is designed for this. By session 20, the AI DM will have 20 session summaries in Tier 2, 20 sessions of world engine history, and character memories that reference events from the beginning of the campaign. That’s the D&D experience I want to build toward.

I’ll write follow-up posts as each phase lands. The IC/OOC detection is next up.

Don’t have a homelab? The backend and bot can run on any cloud provider that supports Kubernetes. A $200-credit DigitalOcean account is enough to host the entire stack for several months.