Traefik

Planning Authentik: Centralized Identity for a Homelab

TL;DR I am deploying Authentik as a centralized identity provider for my k3s cluster. It replaces the current OAuth2 Proxy setup with proper SSO, federates Google as a social login source, and introduces group-based RBAC (admins, writers, readers) across all services. The migration is phased – public services first via Traefik forwardAuth, then internal services via native OIDC, then proxy-protected apps that have no OIDC support. OAuth2 Proxy stays in git for instant rollback. This post covers the architecture, the user model, the edge security design, and the gotchas I expect to hit. ...

Scaling to Two Replicas and Failover Testing

TL;DR This is the moment everything was built for. Three phases of preparation — PostgreSQL provider (Day 3), storage migration (Day 4), state externalization (Day 5) — all leading to a single kubectl scale command. This post covers Phase 4: scaling the Jellyfin StatefulSet to 2 replicas, configuring anti-affinity to spread pods across nodes, running six structured failover tests, building Prometheus alerts, and one test that only partially passed. The headline result: killing a pod causes zero service downtime — users on the surviving replica experience no interruption at all, and displaced users reconnect within seconds. ...

Jellyfin state externalization architecture

State Externalization and the Sticky Session Compromise

TL;DR Phase 3 is where the rubber meets the road. We have PostgreSQL for persistent data (Day 4) and NFS for shared config. But Jellyfin still holds critical runtime state — sessions, users, devices, tasks — in 11 ConcurrentDictionary instances scattered across singleton managers. Two pods with independent memory spaces means two independent views of reality. This post covers the state externalization decision: what got moved to Redis, what got solved by sticky sessions, what got disabled entirely, and why pragmatism beat perfection for a homelab media server. ...

Upgrading k3s Across Five Minor Versions: v1.29 to v1.34 on a Homelab Cluster

TL;DR Upgraded a production k3s cluster from v1.29.0+k3s1 to v1.34.4+k3s1 across 8 nodes — 3 control plane servers, 4 amd64 worker agents, and 1 arm64 Lima VM agent. The upgrade stepped through every minor version (v1.29 → v1.30 → v1.31 → v1.32 → v1.33 → v1.34) with etcd snapshots between each step. Longhorn was upgraded from v1.6.0 to v1.8.2 in two stages (v1.7.3 as an intermediate step). SSH was broken to all cluster nodes, so the entire upgrade was done via Proxmox QEMU Guest Agent (qm guest exec) and Lima CLI (limactl shell). Discovered that k3s intentionally pins Traefik to v2.11.24 even when bundling Helm chart v27 — Traefik v3 migration is a separate effort. ...