What's Still Broken and What Comes Next
TL;DR: Over the last six posts, I’ve documented converting Jellyfin from a single-process media server into a two-replica, PostgreSQL-backed, sticky-session-coordinated deployment on k3s. Five of six failover tests passed cleanly. The key result: zero-downtime failover. Killing a pod doesn’t take down the service; users on the surviving replica see no interruption, and displaced users reconnect in seconds. Node maintenance no longer kills Jellyfin for the household. But this project isn’t finished, and some problems can’t be solved with this architecture. This final post is an honest inventory of what’s still broken, what was deferred, and what the path forward looks like. ...
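For readers landing on this post first, here is a minimal sketch of the shape of that deployment: two Jellyfin replicas behind a Service, pointed at a shared PostgreSQL backend, with something keeping each client pinned to one pod. This sketch assumes Service-level ClientIP session affinity; the series may well have handled stickiness at the ingress layer (Traefik on k3s) instead, and the resource names, image tag, and database connection variable below are placeholders, not the actual manifests from the earlier posts.

```yaml
# Sketch only: two-replica Jellyfin with sticky sessions on k3s.
# Names, image tag, and the DB connection variable are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jellyfin
spec:
  replicas: 2                  # two pods, so killing one doesn't take down the service
  selector:
    matchLabels:
      app: jellyfin
  template:
    metadata:
      labels:
        app: jellyfin
    spec:
      containers:
        - name: jellyfin
          image: jellyfin/jellyfin:latest   # pin a specific tag in practice
          ports:
            - containerPort: 8096           # Jellyfin's default HTTP port
          env:
            - name: JELLYFIN_DB_CONNECTION  # hypothetical; stands in for the PostgreSQL wiring
              valueFrom:
                secretKeyRef:
                  name: jellyfin-db
                  key: connection-string
---
apiVersion: v1
kind: Service
metadata:
  name: jellyfin
spec:
  selector:
    app: jellyfin
  ports:
    - port: 8096
      targetPort: 8096
  sessionAffinity: ClientIP         # sticky sessions: a client keeps hitting the same pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
```

However the stickiness is implemented, the point is the same: more than one replica, shared state in PostgreSQL, and each client consistently routed to a single pod so that only the users on a killed pod have to reconnect.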