Jellyfin

Securing Jellyfin when it's exposed to the internet

TL;DR Someone asked me on Reddit for a comprehensive guide to securing a public-facing Jellyfin instance, so here it is. The short answer I gave was: fail2ban, automate patching, implement OAuth, and download an IP block list. This post expands all four into actionable steps and adds a fifth option — IP whitelisting with a DDNS-aware Python cron job — plus the honest answer that a VPN eliminates most of this complexity entirely. ...
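Here is a minimal sketch of the DDNS-aware whitelist idea. The assumptions are mine, not the post's: the allowlist is a set of DDNS hostnames plus static CIDRs, and the script only computes the resulting CIDR list. Applying it is left to some other step, such as a firewall rule or reverse-proxy config. The resolver is injectable so the logic can be checked without network access.

```python
"""Hypothetical DDNS-aware allowlist refresher, intended to be run from cron.

Assumption: hostnames are DDNS names for trusted clients; static_cidrs are
always allowed. Output is a stable, sorted list of CIDR strings.
"""
import ipaddress
import socket
from typing import Callable, Iterable


def resolve_all(hostname: str) -> set[str]:
    """Return every IPv4/IPv6 address the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # sockaddr[0] is the address; strip any IPv6 scope suffix like "%eth0".
    return {info[4][0].split("%")[0] for info in infos}


def build_allowlist(
    hostnames: Iterable[str],
    static_cidrs: Iterable[str],
    resolver: Callable[[str], set[str]] = resolve_all,
) -> list[str]:
    networks = {ipaddress.ip_network(c, strict=False) for c in static_cidrs}
    for host in hostnames:
        try:
            addrs = resolver(host)
        except OSError:  # socket.gaierror is an OSError subclass
            # Skip a host that fails to resolve instead of aborting the run.
            continue
        for addr in addrs:
            networks.add(ipaddress.ip_network(addr))  # single host: /32 or /128
    # IPv4 before IPv6, then by address, so output only changes when IPs do.
    return [str(n) for n in sorted(networks, key=lambda n: (n.version, n))]
```

Because a host that fails to resolve is skipped rather than aborting the run, one broken DDNS record does not stop the rest of the list from being refreshed.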

Stress Testing GPU Transcoding in Kubernetes with JF_hw_stress

TL;DR JF_hw_stress is a headless transcoding stress tester that answers one question: how many concurrent transcode streams can your GPU actually handle before quality degrades? It runs escalating FFmpeg transcodes against real media files using VAAPI hardware acceleration, measures FPS ratios, and outputs a JSON report. I run it as a Kubernetes Job on the same k3s cluster from Cluster Genesis, scheduled exclusively on the GPU node (Intel UHD 630). The job auto-deletes after 10 minutes so it does not accumulate stale pods. ...
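To make the "FPS ratio" idea concrete, here is a hypothetical sketch of the analysis step only. It is not JF_hw_stress's code. It assumes three things: the ratio is achieved FPS divided by the source file's FPS, a stream keeps up when that ratio is at least 1.0 (real time), and the answer is the highest concurrency level at which every stream kept up. The JSON field names are invented for illustration.

```python
"""Hypothetical analysis step for an escalating transcode stress test.

runs maps a concurrency level to the FPS each concurrent stream achieved.
"""
import json


def fps_ratios(stream_fps: list[float], source_fps: float) -> list[float]:
    """Achieved FPS relative to the source; 1.0 means exactly real time."""
    return [fps / source_fps for fps in stream_fps]


def max_sustainable_streams(
    runs: dict[int, list[float]], source_fps: float, threshold: float = 1.0
) -> int:
    """Highest concurrency at which every stream met the threshold (0 if none)."""
    best = 0
    for concurrency in sorted(runs):
        if min(fps_ratios(runs[concurrency], source_fps)) >= threshold:
            best = concurrency
        else:
            break  # stop escalating at the first level that falls behind
    return best


def report(runs: dict[int, list[float]], source_fps: float, threshold: float = 1.0) -> str:
    levels = [
        {"streams": n, "min_fps_ratio": round(min(fps_ratios(runs[n], source_fps)), 3)}
        for n in sorted(runs)
    ]
    return json.dumps(
        {
            "source_fps": source_fps,
            "threshold": threshold,
            "max_streams": max_sustainable_streams(runs, source_fps, threshold),
            "levels": levels,
        },
        indent=2,
    )
```

For example, take a 24 fps source and per-stream results of `{1: [96.0], 2: [50.0, 48.0], 3: [30.0, 26.0, 25.0], 4: [23.0, 24.5, 24.0, 22.0]}`. At four streams the slowest one runs at about 0.92x real time, so `max_sustainable_streams` returns 3.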

Jellyfin HA on Kubernetes: Redis-Backed Transcode Session Failover

TL;DR Jellyfin dies mid-stream when a Kubernetes pod restarts because all transcode state is held in memory. I forked it, added a Redis-backed ITranscodeSessionStore, and wired in atomic lease-based pod takeover. The fork is at github.com/ZoltyMat/jellyfin-ha, and I also published a repo-level diff document at docs/FORK-DIFF.md showing exactly what changed versus upstream Jellyfin. Single-instance deployments need zero config changes because the fork transparently falls back to a no-op store.

The Problem

Jellyfin is great. It’s also built with the assumption that exactly one server instance is running at a time. Transcode state (which pods are running FFmpeg, what segments have been written, who owns a given play session) lives entirely in memory. When the process dies, that state is gone. ...
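Here is a minimal, in-memory illustration of lease-based ownership with takeover. It is written in Python rather than the fork's C#, and the class and method names are hypothetical, not the fork's ITranscodeSessionStore API. `acquire` mirrors the semantics of Redis `SET key owner NX PX ttl`: it succeeds only if no live lease exists. `renew` extends a lease only for its current holder. A surviving pod takes over a dead pod's session by acquiring it once the dead pod has stopped renewing and the lease has expired.

```python
"""Hypothetical in-memory model of lease-based session ownership."""
import threading
import time
from typing import Callable, Optional


class LeaseStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # One lock per store makes each method atomic, like a single Redis command.
        self._lock = threading.Lock()
        self._leases: dict[str, tuple[str, float]] = {}  # key -> (owner, expires_at)

    def _live_owner(self, key: str) -> Optional[str]:
        """Owner of an unexpired lease, or None if absent or expired."""
        entry = self._leases.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        return None

    def acquire(self, key: str, owner: str, ttl: float) -> bool:
        """Take the lease only if nobody holds a live one (SET ... NX PX)."""
        with self._lock:
            if self._live_owner(key) is not None:
                return False
            self._leases[key] = (owner, self._clock() + ttl)
            return True

    def renew(self, key: str, owner: str, ttl: float) -> bool:
        """Extend the lease, but only if the caller still holds it."""
        with self._lock:
            if self._live_owner(key) != owner:
                return False
            self._leases[key] = (owner, self._clock() + ttl)
            return True

    def owner(self, key: str) -> Optional[str]:
        with self._lock:
            return self._live_owner(key)
```

Against a real Redis, the check-then-extend in `renew` is not atomic if issued as two separate commands. It is usually done with a small Lua script (or WATCH/MULTI), so that a pod cannot renew a lease another pod has just taken over.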

What's Still Broken and What Comes Next

TL;DR Over the last six posts, I’ve documented converting Jellyfin from a single-process media server into a two-replica, PostgreSQL-backed, sticky-session-coordinated deployment on k3s. Five of six failover tests passed cleanly. The key result: zero-downtime failover — killing a pod doesn’t take down the service. Users on the surviving replica see no interruption; displaced users reconnect in seconds. Node maintenance no longer kills Jellyfin for the household. But this project isn’t finished, and some problems can’t be solved with this architecture. This final post is an honest inventory of what’s still broken, what was deferred, and what the path forward looks like. ...

Scaling to Two Replicas and Failover Testing

TL;DR This is the moment everything was built for. Three phases of preparation (the PostgreSQL provider on Day 3, the storage migration on Day 4, and state externalization on Day 5) all lead to a single kubectl scale command. This post covers Phase 4: scaling the Jellyfin StatefulSet to 2 replicas, configuring anti-affinity to spread pods across nodes, running six structured failover tests (one of which only partially passed), and building Prometheus alerts. The headline result: killing a pod causes zero service downtime. Users on the surviving replica experience no interruption at all, and displaced users reconnect within seconds. ...
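One way to put a number on "zero downtime" is to poll the service while deleting a pod and measure the longest stretch of failed requests. The sketch below is hypothetical and is not the series' test harness. The health URL, timeout, and polling interval are placeholders, and an outage is measured from the first failed probe in a run to the next successful one.

```python
"""Hypothetical probe loop for measuring downtime during a pod kill."""
import http.client
import time
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses; HTTPException covers malformed replies.
        return False


def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Seconds from the first failure in a run to the next success (or last sample)."""
    worst, start = 0.0, None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            worst = max(worst, ts - start)
            start = None
    if start is not None:  # still failing when sampling stopped
        worst = max(worst, samples[-1][0] - start)
    return worst


def watch(url: str, duration: float, interval: float = 0.5) -> list[tuple[float, bool]]:
    """Probe repeatedly for `duration` seconds, recording (timestamp, ok)."""
    samples: list[tuple[float, bool]] = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(url)))
        time.sleep(interval)
    return samples
```

You could run `watch(url, duration=60)` while running `kubectl delete pod` against one replica, then pass the samples to `longest_outage`. The result is the worst gap that client saw; 0.0 means no probe failed.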

State Externalization and the Sticky Session Compromise

TL;DR Phase 3 is where the rubber meets the road. We have PostgreSQL for persistent data (Day 4) and NFS for shared config. But Jellyfin still holds critical runtime state (sessions, users, devices, tasks) in 11 ConcurrentDictionary instances scattered across singleton managers. Two pods with independent memory spaces mean two independent views of reality. This post covers the state externalization decision: what got moved to Redis, what got solved by sticky sessions, what got disabled entirely, and why pragmatism beat perfection for a homelab media server. ...

Storage Refactoring and the SQLite-to-PostgreSQL Migration

TL;DR Phase 2 is the scariest phase. It’s where we take a running Jellyfin instance with years of playback history, user preferences, and media metadata — then swap the database from SQLite to PostgreSQL and restructure every volume. One wrong move and the family discovers their “Continue Watching” list is gone. This post covers deploying PostgreSQL as a k3s StatefulSet, restructuring Jellyfin’s volume layout from a monolithic RWO PVC to NFS shared config + Longhorn per-pod storage, and building a SQLite-to-PostgreSQL migration tool. ...
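Here is a hypothetical sketch of the core copy loop such a migration tool needs. It is not the tool from the post. It streams one table out of SQLite in fixed-size batches and yields a parameterized INSERT using `%s` placeholders, the parameter style psycopg uses. Each batch can then be passed to `cursor.executemany` inside a single PostgreSQL transaction. Identifiers are double-quoted because PostgreSQL folds unquoted names to lower case. Type conversions (for example SQLite 0/1 integers to booleans, or text GUIDs to `uuid`) are deliberately left out.

```python
"""Hypothetical batch reader for copying a SQLite table into PostgreSQL."""
import sqlite3
from typing import Iterator


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes (valid in both databases)."""
    return '"' + name.replace('"', '""') + '"'


def copy_batches(
    src: sqlite3.Connection, table: str, batch_size: int = 1000
) -> Iterator[tuple[str, list[tuple]]]:
    """Yield (insert_sql, rows) pairs, reading at most batch_size rows at a time."""
    cur = src.execute(f"SELECT * FROM {quote_ident(table)}")
    columns = [d[0] for d in cur.description]
    placeholders = ", ".join(["%s"] * len(columns))
    column_list = ", ".join(quote_ident(c) for c in columns)
    sql = f"INSERT INTO {quote_ident(table)} ({column_list}) VALUES ({placeholders})"
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield sql, rows
```

Batching keeps memory flat on large tables such as playback history. Running every table's batches inside a single PostgreSQL transaction means a failed run leaves the target empty rather than half-migrated.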

Forking Jellyfin: A PostgreSQL Database Provider in .NET 10

TL;DR Jellyfin stores everything in SQLite. Metadata, users, activity logs, authentication — all of it lives in .db files that lock under concurrent access. To run multiple replicas, we need a real network-accessible database. This post covers Phase 1 of the HA conversion: forking Jellyfin, designing a pluggable database provider interface, implementing it for PostgreSQL with Npgsql, generating EF Core migrations, writing integration tests with Testcontainers, and building a custom Docker image. ...

Multi-Model Planning: The Same Pattern That Shipped dnd-multi

TL;DR The Jellyfin HA conversion touches a .NET 10 codebase, Entity Framework Core migrations, Kubernetes manifests, Terraform infrastructure, PostgreSQL operations, and FFmpeg transcoding pipelines. No single AI model understands all of this equally well. So I used four of them — the same multi-model planning pattern that shipped dnd-multi in a single day and that I documented in the LLM GitHub PR workflow. This post covers how I adapted that pattern for infrastructure work, what each model caught, and why planning is where all the human time should go. ...

Why Jellyfin Can't Scale (And What We're Going to Do About It)

TL;DR Jellyfin is a fantastic open-source media server. It is also, architecturally, a single-process application that assumes it’s the only instance running. SQLite as the database. Eleven ConcurrentDictionary caches holding sessions, users, devices, and task queues in memory. A file-based config directory that gets written to at runtime. None of this survives a second pod. This is the first post in a seven-part series documenting how I converted Jellyfin into a highly available, multi-replica deployment on my home k3s cluster. The project spans two repositories, four phases, ~20 GitHub Issues executed by AI agents, and a live failover demo where I killed a pod and the service continued with zero downtime — users on the surviving replica never saw an interruption. ...