TL;DR

The Jellyfin HA conversion touches a .NET 10 codebase, Entity Framework Core migrations, Kubernetes manifests, Terraform infrastructure, PostgreSQL operations, and FFmpeg transcoding pipelines. No single AI model understands all of this equally well. So I used four of them — the same multi-model planning pattern that shipped dnd-multi in a single day and that I documented in the LLM GitHub PR workflow.

This post covers how I adapted that pattern for infrastructure work, what each model caught, and why planning is where all the human time should go.


The Pattern: Planning vs. Implementation

The core insight from the dnd-multi project was this: planning and implementation are different cognitive tasks, and different models are better at each.

Planning requires broad context, cross-domain reasoning, and catching what’s missing. Implementation requires precise, file-level code generation within a narrow scope.

The workflow splits cleanly:

| Phase | Tool | Purpose |
| --- | --- | --- |
| Outline | Human | Write the vision, feature list, phase structure |
| Gap Analysis 1 | ChatGPT (o3) | Surface missing preconditions and dependencies |
| Gap Analysis 2 | Claude Sonnet 4.6, Gemini Pro, GPT Codex | Independent review — each finds different gaps |
| Plan Finalization | Claude Opus 4.6 | Synthesize all findings into an executable plan |
| Implementation | GitHub Copilot agent | Execute issues autonomously |
| Review & Merge | Claude Sonnet 4.6 (VS Code) | Inspect diffs, fix mistakes, commit, merge |

This is the same pipeline documented in the Two AIs, One Codebase post, but applied to infrastructure instead of a Python game server.

Step 1: The Human Outline

I wrote the initial project outline myself. This is the one step a human must do — the AI models can’t read your mind about which tradeoffs you’re willing to make.

My outline covered:

  • Why: can’t tolerate downtime during node maintenance
  • Current architecture: single pod, SQLite, emptyDir, RWO PVC
  • Target: 2 replicas, PostgreSQL, sticky sessions, automated failover
  • Phases: roughly “database, storage, state, scaling”
  • Non-goals: true multi-master, SyncPlay HA, GPU load balancing

The outline was ~2 pages. Specific enough to be actionable, vague enough to leave room for the models to suggest better approaches.

Step 2: ChatGPT Gap Analysis

I uploaded the outline plus the Jellyfin codebase structure and asked ChatGPT (o3):

Evaluate this plan against the codebase. What’s missing, what’s wrong, what’s underspecified? List specific gaps with the exact files and classes affected.

ChatGPT’s strengths here were structural. It caught:

  1. Missing EF Core migration strategy — the outline said “add PostgreSQL support” without specifying whether to use a new migration set or convert existing SQLite migrations. ChatGPT flagged that Jellyfin uses Microsoft.Data.Sqlite directly in places, not just EF Core, and those raw queries would break on PostgreSQL.

  2. Config directory write conflicts — I’d planned NFS shared config but hadn’t identified which files get written at runtime. ChatGPT found that system.xml, logging.default.json, and several plugin directories get written to during startup and runtime.

  3. Dependency chain gap — my phases were ordered correctly but I hadn’t specified explicit dependencies between individual issues within a phase. “Build Dockerfile” can’t start until “create database provider” is merged.
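That dependency-chain gap is mechanical enough to check programmatically once it's written down. As a sketch (the issue slugs here are hypothetical, not the actual issue titles from the plan), Python's stdlib `graphlib` can derive a safe execution order from declared dependencies:

```python
from graphlib import TopologicalSorter

# Hypothetical issue slugs and their dependencies
# (issue -> set of issues that must be merged first).
deps = {
    "create-postgres-statefulset": set(),
    "create-database-provider": {"create-postgres-statefulset"},
    "build-dockerfile": {"create-database-provider"},
    "add-ha-ci-workflow": {"build-dockerfile"},
}

# static_order() raises CycleError on circular dependencies,
# which is itself a useful plan-validation check.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Running something like this against the final plan is a cheap way to confirm that "Build Dockerfile" can never be scheduled before "create database provider" is merged.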

Step 3: Multi-Model Review Panel

I routed the ChatGPT-refined plan through three additional models with the same prompt. Each model found different things:

Claude Sonnet 4.6 (Copilot in VS Code)

Claude caught implementation-level issues that the other models missed:

  • Npgsql package version conflicts: Jellyfin uses central package management (Directory.Packages.props). Adding Npgsql.EntityFrameworkCore.PostgreSQL requires adding the version only there, never in individual .csproj files. Claude knew this from the project structure.

  • SkiaSharp pinning: the plan included adding NuGet packages but didn’t mention that SkiaSharp is pinned to [3.116.1] (bracket syntax = exact version). Any dependency chain that pulls a different SkiaSharp version would break the build.

  • IJellyfinDatabaseProvider interface design: Claude suggested the interface should expose a ConnectionString property and migration methods, catching that the provider pattern needs to work at both startup (migration) and runtime (query).
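For reference, central package management concentrates all versions in `Directory.Packages.props`. A sketch of what the additions might look like (the Npgsql version shown is illustrative, not the one the project actually pinned):

```xml
<!-- Directory.Packages.props (excerpt, illustrative) -->
<Project>
  <PropertyGroup>
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
  </PropertyGroup>
  <ItemGroup>
    <!-- Bracket syntax pins an exact version; anything that resolves
         a different SkiaSharp would break the build. -->
    <PackageVersion Include="SkiaSharp" Version="[3.116.1]" />
    <!-- Illustrative version only. The pin lives here and only here,
         never in individual .csproj files. -->
    <PackageVersion Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="9.0.2" />
  </ItemGroup>
</Project>
```

Individual projects then reference the package with a versionless `<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" />`, and the build fails if a version sneaks into a .csproj.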

Gemini Pro

Gemini caught CI/CD and infrastructure gaps:

  • Workflow path filters: existing CI only triggers on specific file paths. A new ha-build.yml workflow needs to match the correct paths, or PRs will merge without any CI validation.

  • Docker buildx attestation: Gemini caught that without --provenance=false, Docker Desktop adds attestation manifests that k3s containerd can’t resolve — a problem I’d documented in the cluster instructions but hadn’t connected to this project.

  • ECR repository missing: the plan assumed an ECR repository exists for the HA image. It didn’t. Terraform needs to create it first.
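A minimal sketch of the missing Terraform, assuming the standard AWS provider (the repository name and settings are placeholders, not the project's actual values):

```hcl
# Hypothetical repository name; adjust to the cluster's naming scheme.
resource "aws_ecr_repository" "jellyfin_ha" {
  name                 = "jellyfin-ha"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}
```

The image still has to be built with `--provenance=false` before pushing here, or the attestation manifests Gemini flagged come along for the ride.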

GPT Codex

Codex caught operational and runtime issues:

  • PostgreSQL connection pooling: Jellyfin opens many short-lived database connections. Without connection pooling (MaxPoolSize in the connection string, or PgBouncer as a sidecar), the connection count would grow unbounded.

  • Health check configuration: the plan didn’t specify how Kubernetes would probe the Jellyfin pods. Codex noted that Jellyfin exposes /health and recommended probing it with an appropriate initialDelaySeconds, since Jellyfin can take 30-60 seconds to start scanning libraries.

  • StatefulSet vs Deployment: Codex flagged that per-pod stable storage (for transcode directories) requires a StatefulSet with volumeClaimTemplates, not a Deployment with shared PVCs.
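Codex's health-check point translates into probe config roughly like this (a sketch, not the manifest from the plan; the timing values are assumptions derived from the 30-60 second startup window):

```yaml
# Excerpt from a Jellyfin pod spec (hypothetical values).
# 8096 is Jellyfin's default HTTP port.
readinessProbe:
  httpGet:
    path: /health
    port: 8096
  initialDelaySeconds: 60   # startup can take 30-60s
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 8096
  initialDelaySeconds: 90
  failureThreshold: 3
```

Keeping the liveness delay longer than the readiness delay avoids Kubernetes killing a pod that is still warming up its library scan.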

Step 4: Claude Opus 4.6 Synthesis

With four models’ worth of gap analysis, I handed everything to Claude Opus 4.6 for final plan generation:

Given the project outline and all gap analysis, produce a final execution plan with:
- Phase-by-phase feature breakdown
- Acceptance criteria per issue (concrete, testable, with exact file paths)
- Explicit dependency chain (what must be merged before each is created)
- Risk register with probability, impact, and mitigations
- Phase exit gates (pass/fail criteria)

Opus produced a ~35-page execution document covering:

  • 4 phases, 24 individual issues
  • 8 architecture decisions with rationale (Decision 1: PostgreSQL over MySQL/CockroachDB. Decision 5: sticky sessions over Redis sessions. Decision 8: 2 replicas not 3.)
  • 14 risks cataloged with probability/impact/mitigation (R1: SQLite→PG migration data loss. R7: SyncPlay cross-pod failure. R14: EF Core migration naming conflict with upstream.)
  • Exit gates per phase: Phase 1 passes when dotnet build succeeds with PostgreSQL provider and Testcontainers integration tests pass. Phase 4 passes when failover test kills active pod and surviving pod serves traffic within 60 seconds.

This document became the master plan.

Why This Works Better Than Single-Model Planning

The overlap between models’ findings was surprisingly small. Here’s a breakdown of which gaps each model caught:

| Finding Category | ChatGPT | Claude | Gemini | Codex |
| --- | --- | --- | --- | --- |
| EF Core migration strategy | ✓ | | | |
| Config write conflicts | ✓ | | | |
| NuGet version management | | ✓ | | |
| Interface design | | ✓ | | |
| CI path filters | | | ✓ | |
| Docker attestation | | | ✓ | |
| ECR Terraform | | | ✓ | |
| Connection pooling | | | | ✓ |
| Health check config | | | | ✓ |
| StatefulSet requirement | | | | ✓ |

That’s 10 critical gaps, and no single model found more than 3. If I’d used only ChatGPT, I would have deployed without a StatefulSet, without health checks, and with broken CI. If I’d used only Codex, I would have missed the NuGet version conflict and the Docker provenance bug.

The multi-model approach costs ~$5 in API calls for the planning phase. The bugs it catches would each cost hours of debugging in production.

Adapting the Pattern for Infrastructure

The dnd-multi project was a single Python codebase. The Jellyfin HA project spans:

  • A .NET 10 fork (C#, EF Core, ASP.NET)
  • Kubernetes manifests (YAML)
  • Terraform (HCL)
  • Ansible playbooks (YAML)
  • Docker (Dockerfile, buildx, ECR)
  • Shell scripts (migration tooling)

Two adaptations were needed:

1. Issue Scope: Narrower Than Software Issues

Infrastructure issues touch the live cluster. A bad Kubernetes manifest means pods crash in production. I scoped issues much more narrowly than for dnd-multi:

  • dnd-multi: “Implement the dice rolling engine” (one issue, ~300 lines)
  • Jellyfin HA: “Create PostgreSQL 16-alpine StatefulSet with Longhorn PVC, resource limits, health check” (one issue, ~80 lines of YAML)

Smaller blast radius per issue = faster recovery when Copilot generates something wrong.
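To give a sense of that scope, here is a trimmed sketch of the kind of manifest the PostgreSQL StatefulSet issue describes. Names, sizes, and the Secret reference are placeholders, not the plan's actual values:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jellyfin-postgres
spec:
  serviceName: jellyfin-postgres
  replicas: 1
  selector:
    matchLabels:
      app: jellyfin-postgres
  template:
    metadata:
      labels:
        app: jellyfin-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          envFrom:
            - secretRef:
                name: jellyfin-postgres-credentials  # hypothetical Secret
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          resources:
            requests: { cpu: 250m, memory: 512Mi }
            limits: { cpu: "1", memory: 1Gi }
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "jellyfin"]
            initialDelaySeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
```

An issue scoped this tightly gives Copilot very little room to wander, which is exactly the point.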

2. Linear Execution: No Parallel Issues

For dnd-multi, I could have multiple Copilot issues open in parallel because they touched different files. For Jellyfin HA, almost every change depends on the previous one:

Issue dependency chain — each issue depends on the previous one being merged before work can begin

Two Copilot issues touching the same file can race — the second one may revert changes from the first. For infrastructure, I worked strictly linearly: one issue at a time, merge, pull, next issue.

The Implementation Loop

With the plan finalized, implementation followed the exact pattern from Two AIs, One Codebase:

  1. Create a surgical GitHub Issue assigned to @copilot
  2. Copilot agent opens a draft PR on copilot/<branch>
  3. Claude Sonnet 4.6 (in VS Code) reviews the diff
  4. Fix what Copilot got wrong (typically 10-20% of the diff)
  5. Push to a feature branch, open a clean PR
  6. Wait for CI (if it’s a backend change)
  7. Merge, pull main, start the next issue

The next five posts in this series walk through each phase: what the issues looked like, what Copilot generated, what needed fixing, and what the end result was.


Coming Up Next

Tomorrow: forking Jellyfin and building a PostgreSQL database provider in .NET 10 — the deepest code change in the project, and where Copilot’s diff needed the most correction.

No homelab? You can run this entire stack on a managed Kubernetes cluster. DigitalOcean’s managed K8s starts at a single node with $200 in free credits.