TL;DR
The Jellyfin HA conversion touches a .NET 10 codebase, Entity Framework Core migrations, Kubernetes manifests, Terraform infrastructure, PostgreSQL operations, and FFmpeg transcoding pipelines. No single AI model understands all of this equally well. So I used four of them — the same multi-model planning pattern that shipped dnd-multi in a single day and that I documented in the LLM GitHub PR workflow.
This post covers how I adapted that pattern for infrastructure work, what each model caught, and why planning is where all the human time should go.
The Pattern: Planning vs. Implementation
The core insight from the dnd-multi project was this: planning and implementation are different cognitive tasks, and different models are better at each.
Planning requires broad context, cross-domain reasoning, and catching what’s missing. Implementation requires precise, file-level code generation within a narrow scope.
The workflow splits cleanly:
| Phase | Tool | Purpose |
|---|---|---|
| Outline | Human | Write the vision, feature list, phase structure |
| Gap Analysis 1 | ChatGPT (o3) | Surface missing preconditions and dependencies |
| Gap Analysis 2 | Claude Sonnet 4.6, Gemini Pro, GPT Codex | Independent review — each finds different gaps |
| Plan Finalization | Claude Opus 4.6 | Synthesize all findings into an executable plan |
| Implementation | GitHub Copilot agent | Execute issues autonomously |
| Review & Merge | Claude Sonnet 4.6 (VS Code) | Inspect diffs, fix mistakes, commit, merge |
This is the same pipeline documented in the Two AIs, One Codebase post, but applied to infrastructure instead of a Python game server.
Step 1: The Human Outline
I wrote the initial project outline myself. This is the one step a human must do — the AI models can’t read your mind about which tradeoffs you’re willing to make.
My outline covered:
- Why: can’t tolerate downtime during node maintenance
- Current architecture: single pod, SQLite, emptyDir, RWO PVC
- Target: 2 replicas, PostgreSQL, sticky sessions, automated failover
- Phases: roughly “database, storage, state, scaling”
- Non-goals: true multi-master, SyncPlay HA, GPU load balancing
The outline was ~2 pages. Specific enough to be actionable, vague enough to leave room for the models to suggest better approaches.
Step 2: ChatGPT Gap Analysis
I uploaded the outline plus the Jellyfin codebase structure and asked ChatGPT (o3):
Evaluate this plan against the codebase. What’s missing, what’s wrong, what’s underspecified? List specific gaps with the exact files and classes affected.
ChatGPT’s strengths here were structural. It caught:
- **Missing EF Core migration strategy:** the outline said “add PostgreSQL support” without specifying whether to use a new migration set or convert the existing SQLite migrations. ChatGPT flagged that Jellyfin uses `Microsoft.Data.Sqlite` directly in places, not just EF Core, and those raw queries would break on PostgreSQL.
- **Config directory write conflicts:** I’d planned NFS shared config but hadn’t identified which files get written at runtime. ChatGPT found that `system.xml`, `logging.default.json`, and several plugin directories get written to during startup and runtime.
- **Dependency chain gap:** my phases were ordered correctly, but I hadn’t specified explicit dependencies between individual issues within a phase. “Build Dockerfile” can’t start until “create database provider” is merged.
Step 3: Multi-Model Review Panel
I routed the ChatGPT-refined plan through three additional models with the same prompt. Each model found different things:
Claude Sonnet 4.6 (Copilot in VS Code)
Claude caught implementation-level issues that the other models missed:
- **Npgsql package version conflicts:** Jellyfin uses central package management (`Directory.Packages.props`). Adding `Npgsql.EntityFrameworkCore.PostgreSQL` requires adding the version only there, never in individual `.csproj` files. Claude knew this from the project structure.
- **SkiaSharp pinning:** the plan included adding NuGet packages but didn’t mention that SkiaSharp is pinned to `[3.116.1]` (bracket syntax = exact version). Any dependency chain that pulls a different SkiaSharp version would break the build.
- **`IJellyfinDatabaseProvider` interface design:** Claude suggested the interface should expose a `ConnectionString` property and migration methods, catching that the provider pattern needs to work at both startup (migration) and runtime (query).
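To make the central-package-management constraint concrete, here is a minimal sketch of the two halves. The Npgsql version number is a placeholder, not the one the project actually pins; only the SkiaSharp `[3.116.1]` pin comes from the plan.

```xml
<!-- Directory.Packages.props (sketch): all versions live here and only here -->
<Project>
  <PropertyGroup>
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
  </PropertyGroup>
  <ItemGroup>
    <!-- new package for the HA work; the version shown is a placeholder -->
    <PackageVersion Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="9.0.0" />
    <!-- bracket syntax pins SkiaSharp to exactly this version -->
    <PackageVersion Include="SkiaSharp" Version="[3.116.1]" />
  </ItemGroup>
</Project>
```

Individual projects then reference the package with no `Version` attribute at all, e.g. `<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" />` — adding a version in a `.csproj` is exactly the mistake Claude flagged.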
Gemini Pro
Gemini caught CI/CD and infrastructure gaps:
- **Workflow path filters:** existing CI only triggers on specific file paths. A new `ha-build.yml` workflow needs to match the correct paths, or PRs will merge without any CI validation.
- **Docker buildx attestation:** Gemini caught that without `--provenance=false`, Docker Desktop adds attestation manifests that k3s containerd can’t resolve, a problem I’d documented in the cluster instructions but hadn’t connected to this project.
- **ECR repository missing:** the plan assumed an ECR repository exists for the HA image. It didn’t. Terraform needs to create it first.
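The path-filter gap is a two-line fix once you know about it. A sketch of the trigger block for the new workflow, with illustrative path globs (the real repository layout may differ):

```yaml
# .github/workflows/ha-build.yml (sketch; path globs are illustrative)
on:
  pull_request:
    paths:
      - 'Jellyfin.Server/**'   # hypothetical path: backend code changes
      - 'deployment/**'        # hypothetical path: k8s manifests
```

The attestation fix is similarly small: a single flag on the build command, along the lines of `docker buildx build --provenance=false --push -t <ecr-repo>:<tag> .`, which stops buildx from attaching the provenance manifests k3s containerd chokes on.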
GPT Codex
Codex caught operational and runtime issues:
- **PostgreSQL connection pooling:** Jellyfin opens many short-lived database connections. Without connection pooling (`MaxPoolSize` in the connection string, or PgBouncer as a sidecar), the connection count would grow unbounded.
- **Health check configuration:** the plan didn’t specify how Kubernetes would probe the Jellyfin pods. Codex noted that Jellyfin exposes `/health` and recommended using it with an appropriate `initialDelaySeconds`, since Jellyfin can take 30-60 seconds to start scanning libraries.
- **StatefulSet vs Deployment:** Codex flagged that per-pod stable storage (for transcode directories) requires a StatefulSet with `volumeClaimTemplates`, not a Deployment with shared PVCs.
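Codex’s three findings all land in the same manifest, so here is a sketch of the shape they imply. Names, storage sizes, the environment variable, and the connection string are illustrative, not the project’s actual values; the `/health` path and port 8096 are Jellyfin defaults.

```yaml
# Sketch only: illustrates pooling, probes, and per-pod storage together
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jellyfin
spec:
  serviceName: jellyfin
  replicas: 2
  selector:
    matchLabels:
      app: jellyfin
  template:
    metadata:
      labels:
        app: jellyfin
    spec:
      containers:
        - name: jellyfin
          image: jellyfin-ha:latest          # placeholder image name
          env:
            - name: JELLYFIN_DB_CONNECTION   # hypothetical variable name
              value: "Host=postgres;Database=jellyfin;Max Pool Size=50"
          readinessProbe:
            httpGet:
              path: /health
              port: 8096
            initialDelaySeconds: 60          # Jellyfin can take 30-60s to come up
          volumeMounts:
            - name: transcode
              mountPath: /transcode
  volumeClaimTemplates:                      # per-pod stable storage, not a shared PVC
    - metadata:
        name: transcode
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```

A Deployment has no `volumeClaimTemplates` field at all, which is why the StatefulSet finding is structural rather than a tuning detail.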
Step 4: Claude Opus 4.6 Synthesis
With four models’ worth of gap analysis, I handed everything to Claude Opus 4.6 for final plan generation:
Given the project outline and all gap analysis, produce a final execution plan with:
- Phase-by-phase feature breakdown
- Acceptance criteria per issue (concrete, testable, with exact file paths)
- Explicit dependency chain (what must be merged before each is created)
- Risk register with probability, impact, and mitigations
- Phase exit gates (pass/fail criteria)
Opus produced a ~35-page execution document covering:
- 4 phases, 24 individual issues
- 8 architecture decisions with rationale (Decision 1: PostgreSQL over MySQL/CockroachDB. Decision 5: sticky sessions over Redis sessions. Decision 8: 2 replicas not 3.)
- 14 risks cataloged with probability/impact/mitigation (R1: SQLite→PG migration data loss. R7: SyncPlay cross-pod failure. R14: EF Core migration naming conflict with upstream.)
- Exit gates per phase: Phase 1 passes when `dotnet build` succeeds with the PostgreSQL provider and the Testcontainers integration tests pass. Phase 4 passes when the failover test kills the active pod and the surviving pod serves traffic within 60 seconds.
This document became the master plan.
Why This Works Better Than Single-Model Planning
The overlap between models’ findings was surprisingly small. Here’s a rough Venn diagram of what each model caught:
| Finding Category | ChatGPT | Claude | Gemini | Codex |
|---|---|---|---|---|
| EF Core migration strategy | ✅ | | | |
| Config write conflicts | ✅ | | | |
| NuGet version management | | ✅ | | |
| Interface design | | ✅ | | |
| CI path filters | | | ✅ | |
| Docker attestation | | | ✅ | |
| ECR Terraform | | | ✅ | |
| Connection pooling | | | | ✅ |
| Health check config | | | | ✅ |
| StatefulSet requirement | | | | ✅ |
That’s 10 critical gaps, and no single model found more than 3. If I’d used only ChatGPT, I would have deployed without a StatefulSet, without health checks, and with broken CI. If I’d used only Codex, I would have missed the NuGet version conflict and the Docker provenance bug.
The multi-model approach costs ~$5 in API calls for the planning phase. The bugs it catches would each cost hours of debugging in production.
Adapting the Pattern for Infrastructure
The dnd-multi project was a single Python codebase. The Jellyfin HA project spans:
- A .NET 10 fork (C#, EF Core, ASP.NET)
- Kubernetes manifests (YAML)
- Terraform (HCL)
- Ansible playbooks (YAML)
- Docker (Dockerfile, buildx, ECR)
- Shell scripts (migration tooling)
Two adaptations were needed:
1. Issue Scope: Narrower Than Software Issues
Infrastructure issues touch the live cluster. A bad Kubernetes manifest means pods crash in production. I scoped issues much more narrowly than for dnd-multi:
- dnd-multi: “Implement the dice rolling engine” (one issue, ~300 lines)
- Jellyfin HA: “Create PostgreSQL 16-alpine StatefulSet with Longhorn PVC, resource limits, health check” (one issue, ~80 lines of YAML)
Smaller blast radius per issue = faster recovery when Copilot generates something wrong.
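For a sense of how narrow these issues were, here is a sketch of what one looked like. The title comes from the plan above; the file path and acceptance criteria are illustrative, not copied from the real issue.

```markdown
Title: Create PostgreSQL 16-alpine StatefulSet with Longhorn PVC

Scope: deployment/postgres-statefulset.yaml only (path is illustrative).

Acceptance criteria:
- postgres:16-alpine image, single replica
- Longhorn storageClassName on the volume claim
- resources.requests and resources.limits set for CPU and memory
- readinessProbe using pg_isready

Out of scope: Jellyfin app changes, Terraform, migration tooling.
```

The “out of scope” line matters as much as the criteria: it is what stops the Copilot agent from wandering into files the next issue depends on.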
2. Linear Execution: No Parallel Issues
For dnd-multi, I could have multiple Copilot issues open in parallel because they touched different files. For Jellyfin HA, almost every change depends on the previous one.
Two Copilot issues touching the same file can race — the second one may revert changes from the first. For infrastructure, I worked strictly linearly: one issue at a time, merge, pull, next issue.
The Implementation Loop
With the plan finalized, implementation followed the exact pattern from Two AIs, One Codebase:
- Create a surgical GitHub Issue assigned to `@copilot`
- Copilot agent opens a draft PR on `copilot/<branch>`
- Claude Sonnet 4.6 (in VS Code) reviews the diff
- Fix what Copilot got wrong (typically 10-20% of the diff)
- Push to a feature branch, open a clean PR
- Wait for CI (if it’s a backend change)
- Merge, pull main, start the next issue
The next five posts in this series walk through each phase: what the issues looked like, what Copilot generated, what needed fixing, and what the end result was.
Coming Up Next
Tomorrow: forking Jellyfin and building a PostgreSQL database provider in .NET 10 — the deepest code change in the project, and where Copilot’s diff needed the most correction.
No homelab? You can run this entire stack on a managed Kubernetes cluster. DigitalOcean’s managed K8s starts at a single node with $200 in free credits.