## TL;DR
Phase 2 is the scariest phase. It’s where we take a running Jellyfin instance with years of playback history, user preferences, and media metadata — then swap the database from SQLite to PostgreSQL and restructure every volume. One wrong move and the family discovers their “Continue Watching” list is gone.
This post covers deploying PostgreSQL as a k3s StatefulSet, restructuring Jellyfin’s volume layout from a monolithic RWO PVC to NFS shared config + Longhorn per-pod storage, and building a SQLite-to-PostgreSQL migration tool.
## The Volume Problem
Before HA, Jellyfin had three mounts:
| Mount | Type | Mode | Contents |
|---|---|---|---|
| `/config` | Longhorn PVC | RWO | SQLite databases, XML config, plugins, metadata images |
| `/media` | NFS | RO | Media files (movies, TV, music) |
| `/cache` | emptyDir | RW | Transcoding segments, image cache |
The /config PVC is ReadWriteOnce — only one pod can mount it. For two replicas, we need to split it.
## The New Volume Layout
| Mount | Type | Mode | Contents |
|---|---|---|---|
| `/config` | NFS | RWX | XML config, web config, plugins (shared between pods) |
| `/media` | NFS | RO | Media files (unchanged) |
| `/data/transcode` | Longhorn PVC (per-pod) | RWO | FFmpeg transcode segments |
| `/data/cache` | Longhorn PVC (per-pod) | RWO | Image cache, metadata cache |
| PostgreSQL | StatefulSet | N/A | All database content (users, items, playback state) |
Key changes:
- Config moves to NFS — both pods read the same `system.xml`, same plugin list
- Database moves to PostgreSQL — no more SQLite files in `/config`
- Cache and transcode get per-pod PVCs — each pod writes its own transcode segments via StatefulSet `volumeClaimTemplates`, no conflicts
## Deploying PostgreSQL
PostgreSQL runs as a single-replica StatefulSet in the jellyfin namespace. Not a separate namespace — keeping it co-located simplifies NetworkPolicy and DNS resolution.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jellyfin-postgres
  namespace: jellyfin
  labels:
    app.kubernetes.io/name: jellyfin
    app.kubernetes.io/component: database
spec:
  serviceName: jellyfin-postgres
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: jellyfin
      app.kubernetes.io/component: database
  template:
    metadata:
      labels:
        app.kubernetes.io/name: jellyfin
        app.kubernetes.io/component: database
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: jellyfin-postgres-credentials
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jellyfin-postgres-credentials
                  key: password
            - name: POSTGRES_DB
              value: jellyfin
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "jellyfin"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "jellyfin"]
            initialDelaySeconds: 5
            periodSeconds: 5
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
```
## The Service Selector Trap
This is the #1 recurring bug in the cluster, documented in the cluster’s AI lessons:
> When PostgreSQL shares a namespace with the app, the Service selector MUST include `app.kubernetes.io/component`. Without it, ~50% of requests route to the postgres pod, causing 502 errors.
The Jellyfin Service must select component: web, and the PostgreSQL Service must select component: database:
```yaml
# Jellyfin web Service
apiVersion: v1
kind: Service
metadata:
  name: jellyfin
spec:
  selector:
    app.kubernetes.io/name: jellyfin
    app.kubernetes.io/component: web        # <- CRITICAL
  ports:
    - port: 8096
---
# PostgreSQL Service
apiVersion: v1
kind: Service
metadata:
  name: jellyfin-postgres
spec:
  selector:
    app.kubernetes.io/name: jellyfin
    app.kubernetes.io/component: database   # <- CRITICAL
  ports:
    - port: 5432
```
Without the component label, the Kubernetes Service matches all pods with `app.kubernetes.io/name: jellyfin` — which includes both the app and the database. Traefik load-balances across both, and half your HTTP requests land on a PostgreSQL socket that answers in the Postgres wire protocol rather than HTTP, which surfaces as 502s.
## The NFS Shared Config
The Ugreen DXP4800 NAS serves NFS shares for media over a 5Gbps LACP bond — dual 2.5GbE ports aggregated through a Ubiquiti USW Aggregation 10GbE switch. The Proxmox hosts connect to the same switch via Mellanox ConnectX-3 NICs at full 10GbE, so NFS reads from the NAS to any k3s node saturate the NAS uplink, not the node. I added a new share for Jellyfin config:
- `/volume1/k3s-jellyfin/config` → NFS RWX mount at `/config` on both pods
- `/volume1/media` → NFS RO mount at `/media` (existing)
Why NFS for config instead of another option:
| Option | Problem |
|---|---|
| Longhorn RWX | Longhorn’s RWX support requires an additional NFS provisioner sidecar. Adds complexity for a simple file share. |
| CephFS | Not deployed in this cluster. Would require a completely new storage layer. |
| ConfigMap | Config files change at runtime. ConfigMap volume mounts are read-only inside the pod, so Jellyfin can't write back. |
| NFS | Already running on the NAS. Zero additional infrastructure. 5Gbps aggregate throughput. |
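For completeness, the export side amounts to two entries. This is an illustrative sketch — the exact export options, and whether the DXP4800's UGOS exposes a raw `/etc/exports` or only a UI, are assumptions:

```
# Illustrative NFS exports (options and editing method depend on the NAS firmware)
/volume1/k3s-jellyfin/config  192.168.1.0/24(rw,sync,no_subtree_check)
/volume1/media                192.168.1.0/24(ro,sync,no_subtree_check)
```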
### What Gets Written to Config at Runtime
Not everything in /config is read-only. I audited which files Jellyfin modifies during operation:
| File | When Modified | HA Impact |
|---|---|---|
| `system.xml` | Admin changes settings via UI | Both pods read same config — works |
| `logging.default.json` | Log level changed via API | Both pods read same config — works |
| `data/plugins/` | Plugin installed via UI | Install on one pod, restart both to load |
| `data/collections/` | User creates a collection | NFS handles concurrent writes |
| `metadata/` | Library scan pulls images | Non-exclusive writes — NFS handles this |
The risk: two pods writing to the same XML file simultaneously. In practice, admin config changes happen through the UI (which sticky sessions route to one pod), and library scans use file-level locking that NFS supports.
## The SQLite-to-PostgreSQL Migration Tool
This is the most delicate part of the entire project. Over a year of playback history, user preferences, watched states, and media metadata lives in SQLite. Losing any of it means angry family members.
The migration tool is a standalone .NET console application:
```bash
dotnet run --project tools/JellyfinMigrator -- \
  --source "/config/data/jellyfin.db" \
  --target "Host=jellyfin-postgres;Database=jellyfin;Username=jellyfin;Password=..." \
  --batch-size 1000 \
  --dry-run
```
### Migration Strategy
1. **Stop Jellyfin** — scale the StatefulSet to 0 replicas
2. **Backup SQLite** — copy all `.db` files to a safe location
3. **Run migrations** — create the PostgreSQL schema via EF Core migrations
4. **Run the migrator** — read from SQLite, batch-insert into PostgreSQL
5. **Verify counts** — compare row counts across all 29 tables
6. **Start Jellyfin** — scale back to 1 replica pointing at PostgreSQL
7. **Validate** — check that playback positions, favorites, and user data are intact
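The verify-counts step is mechanical enough to script. A minimal sketch of the comparison, in Python rather than the migrator's C# for brevity (table names and connection objects are placeholders — any DB-API-style connection with an `execute` method works, e.g. `sqlite3` or psycopg 3):

```python
import sqlite3

def table_counts(conn, tables):
    """Count rows per table through a connection exposing .execute()."""
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables}

def diff_counts(source, target):
    """Return {table: (source_count, target_count)} for every mismatch."""
    return {t: (n, target.get(t))
            for t, n in source.items() if target.get(t) != n}
```

An empty diff across all 29 tables is the green light to point Jellyfin at PostgreSQL.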
### The Batch Insert Pattern
SQLite tables can have millions of rows (a large library with thousands of items generates millions of ItemValues and MediaStreams rows). The migrator reads in batches of 1,000 and uses COPY (PostgreSQL’s bulk insert) for performance:
```csharp
await using var writer = await connection.BeginBinaryImportAsync(
    "COPY \"BaseItems\" (\"Id\", \"Type\", \"Name\", ...) FROM STDIN (FORMAT BINARY)");

foreach (var item in batch)
{
    await writer.StartRowAsync();
    await writer.WriteAsync(item.Id, NpgsqlDbType.Uuid);
    await writer.WriteAsync(item.Type, NpgsqlDbType.Text);
    await writer.WriteAsync(item.Name, NpgsqlDbType.Text);
    // ... 30+ columns
}

await writer.CompleteAsync();
```
COPY is 10-100x faster than individual INSERT statements for bulk operations. The entire migration (a moderate-sized library with ~5,000 items, ~50,000 item values, and ~100,000 media streams) completes in under 60 seconds.
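The read side is symmetrical: pull fixed-size batches from SQLite so memory stays flat regardless of table size. Sketched here in Python with the DB-API's `fetchmany`; the real migrator does the equivalent in C# with its SQLite reader:

```python
import sqlite3

def batches(cursor, size=1000):
    """Yield lists of up to `size` rows until the cursor is exhausted."""
    while rows := cursor.fetchmany(size):
        yield rows
```

Each yielded batch then maps to one `COPY ... FROM STDIN` round on the PostgreSQL side.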
### Data Type Conversions
The migrator handles several SQLite-to-PostgreSQL type mappings:
| SQLite Type | PostgreSQL Type | Conversion |
|---|---|---|
| TEXT GUID | UUID | `Guid.Parse()` |
| INTEGER boolean | BOOLEAN | `value != 0` |
| TEXT ISO 8601 | TIMESTAMPTZ | `DateTimeOffset.Parse()` with UTC |
| REAL double | DOUBLE PRECISION | Direct |
| BLOB | BYTEA | Direct |
| TEXT JSON | JSONB | Direct (preserves structure) |
The most common failure mode in testing: SQLite stores NULL timestamps as empty strings, not actual NULL values. The migrator explicitly handles this: `string.IsNullOrEmpty(value) ? null : DateTimeOffset.Parse(value)`.
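That guard is worth spelling out, since it bit every timestamp column in testing. Here is the same rule rendered in Python for illustration (the real code is the C# ternary above; treating naive timestamps as UTC mirrors the migrator's convention):

```python
from datetime import datetime, timezone

def parse_sqlite_timestamp(value):
    """SQLite 'NULL' timestamps often arrive as empty strings; map them to real NULLs."""
    if not value:  # None and "" both mean NULL
        return None
    dt = datetime.fromisoformat(value)
    # Naive timestamps are interpreted as UTC
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)
```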
## The StatefulSet Conversion
With PostgreSQL running and data migrated, the Jellyfin Deployment becomes a StatefulSet:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jellyfin
  namespace: jellyfin
spec:
  serviceName: jellyfin
  replicas: 1  # Start with 1, scale to 2 in Phase 4
  selector:
    matchLabels:
      app.kubernetes.io/name: jellyfin
      app.kubernetes.io/component: web
  template:
    metadata:
      labels:  # Must match the selector or the StatefulSet is rejected
        app.kubernetes.io/name: jellyfin
        app.kubernetes.io/component: web
    spec:
      containers:
        - name: jellyfin
          image: 855878721457.dkr.ecr.us-east-1.amazonaws.com/k3s-homelab/jellyfin-ha:latest
          env:
            - name: JELLYFIN_DatabaseProvider
              value: "Jellyfin-PostgreSQL"
            - name: JELLYFIN_ConnectionStrings__Jellyfin-PostgreSQL
              valueFrom:
                secretKeyRef:
                  name: jellyfin-postgres-credentials
                  key: connection-string
          volumeMounts:
            - name: config
              mountPath: /config
            - name: media
              mountPath: /media
              readOnly: true
            - name: transcode
              mountPath: /data/transcode
            - name: cache
              mountPath: /data/cache
      volumes:
        - name: config
          nfs:
            server: 192.168.1.100
            path: /volume1/k3s-jellyfin/config
        - name: media
          nfs:
            server: 192.168.1.100
            path: /volume1/media
  volumeClaimTemplates:
    - metadata:
        name: transcode
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 50Gi
    - metadata:
        name: cache
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
```
Key differences from a Deployment:
- `volumeClaimTemplates` creates unique PVCs per pod: `transcode-jellyfin-0`, `transcode-jellyfin-1`
- `serviceName` creates stable DNS: `jellyfin-0.jellyfin.jellyfin.svc.cluster.local`
- Pod identity is stable across rescheduling — `jellyfin-0` keeps its PVCs even when moved to a different node
## S3 Backup CronJob
PostgreSQL data is too important for Longhorn replication alone. A CronJob dumps the database to S3 nightly:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: jellyfin-postgres-backup
  namespace: jellyfin
spec:
  schedule: "0 3 * * *"  # 3 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              # Note: stock postgres:16-alpine ships pg_dump but not the aws CLI,
              # which has to be layered into the image (or swapped for another S3 client)
              image: postgres:16-alpine
              env:
                - name: PGPASSWORD  # pg_dump can't prompt inside a Job
                  valueFrom:
                    secretKeyRef:
                      name: jellyfin-postgres-credentials
                      key: password
              command:
                - /bin/sh
                - -c
                - |
                  pg_dump -h jellyfin-postgres -U jellyfin jellyfin | \
                    gzip > /tmp/jellyfin-$(date +%Y%m%d).sql.gz && \
                  aws s3 cp /tmp/jellyfin-*.sql.gz \
                    s3://k3s-homelab-backups/jellyfin/
          restartPolicy: OnFailure
```
The backup runs at 3 AM, dumps the entire database, compresses it, and uploads to S3. Retention is managed by S3 lifecycle rules: 30 days of daily backups, then monthly snapshots for 1 year.
## Coming Up Next
Tomorrow: state externalization and the sticky session compromise — what we did with those 11 ConcurrentDictionary caches and why we chose pragmatism over perfection.
Browse the code: The full Jellyfin fork — including the PostgreSQL provider, Dockerfile, and CI pipeline — is public at github.com/zolty-mat/jellyfin. The Kubernetes manifests will follow once the infrastructure repo is cleaned up.
Cloud alternative: Instead of running PostgreSQL yourself, DigitalOcean Managed Databases handles backups, failover, and upgrades automatically.