TL;DR

Phase 2 is the scariest phase. It’s where we take a running Jellyfin instance with years of playback history, user preferences, and media metadata — then swap the database from SQLite to PostgreSQL and restructure every volume. One wrong move and the family discovers their “Continue Watching” list is gone.

This post covers deploying PostgreSQL as a k3s StatefulSet, restructuring Jellyfin’s volume layout from a monolithic RWO PVC to NFS shared config + Longhorn per-pod storage, and building a SQLite-to-PostgreSQL migration tool.


The Volume Problem

Before HA, Jellyfin had three mounts:

| Mount   | Type         | Mode | Contents                                                |
|---------|--------------|------|---------------------------------------------------------|
| /config | Longhorn PVC | RWO  | SQLite databases, XML config, plugins, metadata images  |
| /media  | NFS          | RO   | Media files (movies, TV, music)                         |
| /cache  | emptyDir     | RW   | Transcoding segments, image cache                       |

The /config PVC is ReadWriteOnce — only one pod can mount it. For two replicas, we need to split it.

The New Volume Layout

| Mount           | Type                   | Mode | Contents                                                |
|-----------------|------------------------|------|---------------------------------------------------------|
| /config         | NFS                    | RWX  | XML config, web config, plugins (shared between pods)   |
| /media          | NFS                    | RO   | Media files (unchanged)                                 |
| /data/transcode | Longhorn PVC (per-pod) | RWO  | FFmpeg transcode segments                               |
| /data/cache     | Longhorn PVC (per-pod) | RWO  | Image cache, metadata cache                             |
| PostgreSQL      | StatefulSet            | N/A  | All database content (users, items, playback state)     |

Key changes:

  • Config moves to NFS — both pods read the same system.xml, same plugin list
  • Database moves to PostgreSQL — no more SQLite files in /config
  • Cache and transcode get per-pod PVCs — each pod writes its own transcode segments via StatefulSet volumeClaimTemplates, no conflicts

Deploying PostgreSQL

PostgreSQL runs as a single-replica StatefulSet in the jellyfin namespace. Not a separate namespace — keeping it co-located simplifies NetworkPolicy and DNS resolution.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jellyfin-postgres
  namespace: jellyfin
  labels:
    app.kubernetes.io/name: jellyfin
    app.kubernetes.io/component: database
spec:
  serviceName: jellyfin-postgres
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: jellyfin
      app.kubernetes.io/component: database
  template:
    metadata:
      labels:
        app.kubernetes.io/name: jellyfin
        app.kubernetes.io/component: database
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: jellyfin-postgres-credentials
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jellyfin-postgres-credentials
                  key: password
            - name: POSTGRES_DB
              value: jellyfin
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "jellyfin"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "jellyfin"]
            initialDelaySeconds: 5
            periodSeconds: 5
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi

The Service Selector Trap

This is the #1 recurring bug in the cluster, documented in the cluster’s AI lessons:

When PostgreSQL shares a namespace with the app, the Service selector MUST include app.kubernetes.io/component. Without it, ~50% of requests route to the postgres pod, causing 502 errors.

The Jellyfin Service must select component: web, and the PostgreSQL Service must select component: database:

# Jellyfin web Service
apiVersion: v1
kind: Service
metadata:
  name: jellyfin
spec:
  selector:
    app.kubernetes.io/name: jellyfin
    app.kubernetes.io/component: web    # <- CRITICAL
  ports:
    - port: 8096

# PostgreSQL Service
apiVersion: v1
kind: Service
metadata:
  name: jellyfin-postgres
spec:
  selector:
    app.kubernetes.io/name: jellyfin
    app.kubernetes.io/component: database  # <- CRITICAL
  ports:
    - port: 5432

Without the component label, the Kubernetes Service matches all pods with app.kubernetes.io/name: jellyfin — which includes both the app and the database. Traefik load-balances across both, and half your HTTP requests hit a PostgreSQL socket that responds with incomprehensible binary.
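The matching rule behind this is plain set inclusion: a Service selects every pod whose labels contain all of the selector's key/value pairs. A small Python sketch (illustrative only — not actual kube-proxy code) shows why dropping the component label catches both pods:

```python
def selector_matches(pod_labels: dict, selector: dict) -> bool:
    """A Service selects a pod when every selector key/value pair
    appears in the pod's labels (set inclusion)."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

web_pod = {"app.kubernetes.io/name": "jellyfin",
           "app.kubernetes.io/component": "web"}
db_pod = {"app.kubernetes.io/name": "jellyfin",
          "app.kubernetes.io/component": "database"}

# Selector missing the component label: matches BOTH pods.
loose = {"app.kubernetes.io/name": "jellyfin"}
print(selector_matches(web_pod, loose), selector_matches(db_pod, loose))    # True True

# Selector including the component label: matches only the web pod.
strict = {"app.kubernetes.io/name": "jellyfin",
          "app.kubernetes.io/component": "web"}
print(selector_matches(web_pod, strict), selector_matches(db_pod, strict))  # True False
```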

The NFS Shared Config

The Ugreen DXP4800 NAS serves NFS shares for media over a 5Gbps LACP bond — dual 2.5GbE ports aggregated through a Ubiquiti USW Aggregation 10GbE switch. The Proxmox hosts connect to the same switch via Mellanox ConnectX-3 NICs at full 10GbE, so NFS reads from the NAS to any k3s node saturate the NAS uplink, not the node. I added a new share for Jellyfin config:

/volume1/k3s-jellyfin/config    → NFS RWX mount at /config on both pods
/volume1/media                  → NFS RO mount at /media (existing)

Why NFS for config instead of another option:

| Option       | Problem                                                                                              |
|--------------|------------------------------------------------------------------------------------------------------|
| Longhorn RWX | Longhorn's RWX support requires an additional NFS provisioner sidecar. Adds complexity for a simple file share. |
| CephFS       | Not deployed in this cluster. Would require a completely new storage layer.                           |
| ConfigMap    | Mounted ConfigMaps are read-only inside the container, and Jellyfin rewrites its config files at runtime. |
| NFS          | Already running on the NAS. Zero additional infrastructure. 5Gbps aggregate throughput.               |

What Gets Written to Config at Runtime

Not everything in /config is read-only. I audited which files Jellyfin modifies during operation:

| File                 | When Modified                 | HA Impact                                |
|----------------------|-------------------------------|------------------------------------------|
| system.xml           | Admin changes settings via UI | Both pods read same config — works       |
| logging.default.json | Log level changed via API     | Both pods read same config — works       |
| data/plugins/        | Plugin installed via UI       | Install on one pod, restart both to load |
| data/collections/    | User creates a collection     | NFS handles concurrent writes            |
| metadata/            | Library scan pulls images     | Non-exclusive writes — NFS handles this  |

The risk: two pods writing to the same XML file simultaneously. In practice, admin config changes happen through the UI (which sticky sessions route to one pod), and library scans use file-level locking that NFS supports.
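That file-level locking is ordinary POSIX advisory locking. A minimal Python sketch of the pattern (illustrative — Jellyfin's scan locking is internal to the server, and flock semantics over NFS ultimately depend on the server's lock manager):

```python
import fcntl
import tempfile

# Open the same file twice, as two independent file descriptions --
# loosely analogous to two pods opening the same file on a shared mount.
path = tempfile.NamedTemporaryFile(delete=False).name
f1 = open(path, "w")
f2 = open(path, "w")

# First opener takes an exclusive advisory lock.
fcntl.flock(f1, fcntl.LOCK_EX)

# Second opener's non-blocking attempt fails cleanly
# instead of both writers scribbling over the same file.
second_writer_blocked = False
try:
    fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    second_writer_blocked = True

print(second_writer_blocked)  # True

fcntl.flock(f1, fcntl.LOCK_UN)
f1.close()
f2.close()
```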

The SQLite-to-PostgreSQL Migration Tool

This is the most delicate part of the entire project. Over a year of playback history, user preferences, watched states, and media metadata lives in SQLite. Losing any of it means angry family members.

The migration tool is a standalone .NET console application:

dotnet run --project tools/JellyfinMigrator -- \
  --source "/config/data/jellyfin.db" \
  --target "Host=jellyfin-postgres;Database=jellyfin;Username=jellyfin;Password=..." \
  --batch-size 1000 \
  --dry-run

Migration Strategy

  1. Stop Jellyfin — scale the existing Deployment to 0 replicas (the StatefulSet conversion happens after the data is migrated)
  2. Backup SQLite — copy all .db files to a safe location
  3. Run migrations — create the PostgreSQL schema via EF Core migrations
  4. Run the migrator — read from SQLite, batch-insert into PostgreSQL
  5. Verify counts — compare row counts across all 29 tables
  6. Start Jellyfin — scale back to 1 replica pointing at PostgreSQL
  7. Validate — check that playback positions, favorites, and user data are intact
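Step 5 is mechanical but worth automating. A hedged sketch of the count comparison — the table names and the source of the PostgreSQL counts are placeholders here, not the migrator's actual API:

```python
import sqlite3


def sqlite_counts(db_path: str, tables: list[str]) -> dict[str, int]:
    """Count rows per table in the source SQLite database."""
    con = sqlite3.connect(db_path)
    try:
        return {t: con.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                for t in tables}
    finally:
        con.close()


def diff_counts(source: dict[str, int], target: dict[str, int]) -> dict:
    """Return tables whose row counts differ between source and target.

    `target` would be populated from PostgreSQL the same way
    (e.g. via psycopg); any non-empty result fails the migration.
    """
    return {t: (source[t], target.get(t, 0))
            for t in source if source[t] != target.get(t, 0)}
```

An empty dict from diff_counts across all 29 tables is the green light to proceed to step 6.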

The Batch Insert Pattern

SQLite tables can have millions of rows (a large library with thousands of items generates millions of ItemValues and MediaStreams rows). The migrator reads in batches of 1,000 and uses COPY (PostgreSQL’s bulk insert) for performance:

await using var writer = await connection.BeginBinaryImportAsync(
    "COPY \"BaseItems\" (\"Id\", \"Type\", \"Name\", ...) FROM STDIN (FORMAT BINARY)");

foreach (var item in batch)
{
    await writer.StartRowAsync();
    await writer.WriteAsync(item.Id, NpgsqlDbType.Uuid);
    await writer.WriteAsync(item.Type, NpgsqlDbType.Text);
    await writer.WriteAsync(item.Name, NpgsqlDbType.Text);
    // ... 30+ columns
}

await writer.CompleteAsync();

COPY is 10-100x faster than individual INSERT statements for bulk operations. The entire migration (a moderate-sized library with ~5,000 items, ~50,000 item values, and ~100,000 media streams) completes in under 60 seconds.
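The read side of that loop is a plain batching iterator: pull rows from SQLite, hand them to COPY in bounded chunks. A sketch of the pattern (names are illustrative, not the migrator's actual code):

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable, size: int = 1000) -> Iterator[list]:
    """Yield lists of up to `size` rows, so each COPY call stays
    bounded in memory no matter how many rows the table holds."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# 2,500 source rows -> three COPY calls: 1000, 1000, 500
sizes = [len(b) for b in batched(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]
```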

Data Type Conversions

The migrator handles several SQLite-to-PostgreSQL type mappings:

| SQLite Type     | PostgreSQL Type  | Conversion                     |
|-----------------|------------------|--------------------------------|
| TEXT GUID       | UUID             | Guid.Parse()                   |
| INTEGER boolean | BOOLEAN          | value != 0                     |
| TEXT ISO 8601   | TIMESTAMPTZ      | DateTimeOffset.Parse() with UTC |
| REAL double     | DOUBLE PRECISION | Direct                         |
| BLOB            | BYTEA            | Direct                         |
| TEXT JSON       | JSONB            | Direct (preserves structure)   |

The most common failure mode in testing: SQLite stores NULL timestamps as empty strings, not actual NULL values. The migrator explicitly handles this: string.IsNullOrEmpty(value) ? null : DateTimeOffset.Parse(value).
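The same guard, mirrored in Python for illustration (assuming ISO 8601 inputs with explicit offsets, which is what the migrator's C# expression handles):

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """SQLite 'NULL' timestamps often arrive as empty strings;
    map them to real NULLs instead of letting the parser throw."""
    if not value:  # None or ""
        return None
    return datetime.fromisoformat(value).astimezone(timezone.utc)


print(parse_timestamp(""))                           # None
print(parse_timestamp("2024-06-01T12:00:00+00:00"))  # 2024-06-01 12:00:00+00:00
```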

The StatefulSet Conversion

With PostgreSQL running and data migrated, the Jellyfin Deployment becomes a StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jellyfin
  namespace: jellyfin
spec:
  serviceName: jellyfin
  replicas: 1  # Start with 1, scale to 2 in Phase 4
  selector:
    matchLabels:
      app.kubernetes.io/name: jellyfin
      app.kubernetes.io/component: web
  template:
    spec:
      containers:
        - name: jellyfin
          image: 855878721457.dkr.ecr.us-east-1.amazonaws.com/k3s-homelab/jellyfin-ha:latest
          env:
            - name: JELLYFIN_DatabaseProvider
              value: "Jellyfin-PostgreSQL"
            - name: JELLYFIN_ConnectionStrings__Jellyfin-PostgreSQL
              valueFrom:
                secretKeyRef:
                  name: jellyfin-postgres-credentials
                  key: connection-string
          volumeMounts:
            - name: config
              mountPath: /config
            - name: media
              mountPath: /media
              readOnly: true
            - name: transcode
              mountPath: /data/transcode
            - name: cache
              mountPath: /data/cache
      volumes:
        - name: config
          nfs:
            server: 192.168.1.100
            path: /volume1/k3s-jellyfin/config
        - name: media
          nfs:
            server: 192.168.1.100
            path: /volume1/media
  volumeClaimTemplates:
    - metadata:
        name: transcode
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 50Gi
    - metadata:
        name: cache
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi

Key differences from a Deployment:

  • volumeClaimTemplates creates unique PVCs per pod: transcode-jellyfin-0, transcode-jellyfin-1
  • serviceName points at a headless Service that gives each pod stable DNS: jellyfin-0.jellyfin.jellyfin.svc.cluster.local
  • Pod identity is stable across rescheduling — jellyfin-0 keeps its PVCs even when moved to a different node
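The per-pod PVC names follow a fixed convention: `<template-name>-<statefulset-name>-<ordinal>`. A tiny sketch of what the controller produces for the two templates above:

```python
def pvc_names(templates: list[str], statefulset: str, replicas: int) -> list[str]:
    """The StatefulSet controller derives one PVC per (template, ordinal)."""
    return [f"{t}-{statefulset}-{i}"
            for t in templates
            for i in range(replicas)]


print(pvc_names(["transcode", "cache"], "jellyfin", 2))
# ['transcode-jellyfin-0', 'transcode-jellyfin-1', 'cache-jellyfin-0', 'cache-jellyfin-1']
```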

S3 Backup CronJob

PostgreSQL data is too important for Longhorn replication alone. A CronJob dumps the database to S3 nightly:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: jellyfin-postgres-backup
  namespace: jellyfin
spec:
  schedule: "0 3 * * *"  # 3 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:16-alpine
              command:
                - /bin/sh
                - -c
                - |
                  pg_dump -h jellyfin-postgres -U jellyfin jellyfin | \
                  gzip > /tmp/jellyfin-$(date +%Y%m%d).sql.gz && \
                  aws s3 cp /tmp/jellyfin-*.sql.gz \
                    s3://k3s-homelab-backups/jellyfin/
          restartPolicy: OnFailure

The backup runs at 3 AM, dumps the entire database, compresses it, and uploads to S3. Retention is managed by S3 lifecycle rules: 30 days of daily backups, then monthly snapshots for 1 year.


Coming Up Next

Tomorrow: state externalization and the sticky session compromise — what we did with those 11 ConcurrentDictionary caches and why we chose pragmatism over perfection.

Browse the code: The full Jellyfin fork — including the PostgreSQL provider, Dockerfile, and CI pipeline — is public at github.com/zolty-mat/jellyfin. The Kubernetes manifests will follow once the infrastructure repo is cleaned up.

Cloud alternative: Instead of running PostgreSQL yourself, DigitalOcean Managed Databases handles backups, failover, and upgrades automatically.