
Two Months of K3s Stability Improvements

TL;DR Over the past two months, I have made a series of stability improvements to my k3s homelab cluster. The biggest wins: migrating from AWS ECR to self-hosted Harbor (eliminating 12-hour token expiry), fixing recurring Grafana crashes caused by SQLite corruption on Longhorn, recovering pve4 after a failed LXC experiment, hardening NetworkPolicies to close gaps in pod-to-host traffic rules, and patching multiple CVEs across the media stack. The cluster now runs 7/7 nodes on k3s v1.34.4, all services monitored, all images pulled from Harbor with static credentials that never expire. ...
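The NetworkPolicy hardening mentioned above might look something like this minimal sketch — a default-deny policy with an explicit DNS carve-out. The namespace and label names here are illustrative assumptions, not taken from the actual cluster:

```yaml
# Hypothetical default-deny policy with a DNS egress allowance;
# namespace and labels are placeholders, not the cluster's real ones.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-with-dns
  namespace: media
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  egress:
    - to:                  # allow DNS lookups to kube-dns only
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
```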

March 27, 2026 · 8 min · zolty

Building a VPN Mesh for a Tech Collective

TL;DR I am designing a WireGuard VPN mesh to connect a small tech collective – a group of friends who each run their own infrastructure. The topology is hub-and-spoke with my k3s cluster as the hub, connecting 4+ remote sites over encrypted tunnels. Shared services include Jellyfin media federation, distributed CI/CD runners, LAN gaming, and centralized monitoring. The logging pipeline is privacy-first: all log filtering and anonymization happens at the edge (spoke side) before anything ships to the hub. This post covers the network design, the three-layer firewall architecture, the privacy model, and the phased rollout plan. ...
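A hub-and-spoke WireGuard topology like the one described boils down to one `[Peer]` block per spoke on the hub, with `AllowedIPs` pinning each spoke to its overlay address plus any site subnet it advertises. Keys, addresses, and the overlay subnet below are placeholder assumptions:

```ini
# Hypothetical hub-side wg0.conf; keys, IPs, and the 10.100.0.0/24
# overlay subnet are illustrative, not the real mesh addressing.
[Interface]
Address = 10.100.0.1/24
ListenPort = 51820
PrivateKey = <hub-private-key>

[Peer]
# Spoke 1: one /32 overlay address plus its advertised LAN subnet
PublicKey = <spoke1-public-key>
AllowedIPs = 10.100.0.2/32, 192.168.10.0/24
PersistentKeepalive = 25
```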

March 27, 2026 · 8 min · zolty

Planning Authentik: Centralized Identity for a Homelab

TL;DR I am deploying Authentik as a centralized identity provider for my k3s cluster. It replaces the current OAuth2 Proxy setup with proper SSO, federates Google as a social login source, and introduces group-based RBAC (admins, writers, readers) across all services. The migration is phased – public services first via Traefik forwardAuth, then internal services via native OIDC, then proxy-protected apps that have no OIDC support. OAuth2 Proxy stays in git for instant rollback. This post covers the architecture, the user model, the edge security design, and the gotchas I expect to hit. ...
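The first migration phase — public services behind Traefik forwardAuth — would hinge on a Middleware resource along these lines. The service name, namespace, and port are assumptions; the outpost path matches Authentik's documented Traefik integration:

```yaml
# Sketch of a Traefik forwardAuth middleware pointing at the Authentik
# embedded outpost; service address is an illustrative placeholder.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: authentik-forward-auth
  namespace: auth
spec:
  forwardAuth:
    address: http://authentik-server.auth.svc.cluster.local:9000/outpost.goauthentik.io/auth/traefik
    trustForwardHeader: true
    authResponseHeaders:
      - X-authentik-username
      - X-authentik-groups
```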

March 27, 2026 · 7 min · zolty

This Blog Deploys Itself: Self-Hosted CI/CD on k3s with GitHub ARC

TL;DR The blog is deployed by GitHub Actions runners running inside the same k3s cluster it’s talking about. A push to main with content under hugo/ triggers a build, a two-pass S3 sync, and a CloudFront invalidation. A daily 06:00 UTC cron handles future-dated posts so I can commit a backlog and let them drip out on schedule. After every successful deploy, a Playwright job kicks off and scans the live site for broken links, visual regressions, and security header compliance. The whole thing runs on eight self-hosted amd64 runners managed by GitHub’s Actions Runner Controller (ARC) in the cluster. Not a single managed CI minute gets billed. ...
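The trigger side of a workflow like that could be sketched as follows — the 06:00 UTC cron and `hugo/` path filter come from the description above, while the job layout and labels are assumptions:

```yaml
# Illustrative trigger section; job name and runner labels are assumed.
on:
  push:
    branches: [main]
    paths:
      - "hugo/**"
  schedule:
    - cron: "0 6 * * *"   # daily publish pass for future-dated posts

jobs:
  build-deploy:
    runs-on: [self-hosted, linux, x64]   # ARC-managed runners in-cluster
    steps:
      - uses: actions/checkout@v4
      # Hugo build, two-pass S3 sync, CloudFront invalidation follow here
```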

March 26, 2026 · 7 min · zolty

Linkerd Service Mesh: Why I'm Not Deploying It Yet (But Have a Plan Ready)

TL;DR I spent time evaluating Linkerd — the CNCF-graduated service mesh — for my homelab k3s cluster. The conclusion: it’s an impressive piece of engineering with genuinely useful features like automatic mTLS, post-quantum cryptography, and per-service observability. But for a cluster with ~20 workloads and a single operator, the operational overhead outweighs the benefits today. I’ve written a complete deployment plan so I can adopt it quickly when the cluster grows to the point where it makes sense. ...

March 24, 2026 · 8 min · zolty

Ditching AWS ECR for Self-Hosted Harbor: Why and How

TL;DR AWS ECR tokens expire every 12 hours. Every time the cron job that refreshes the pull secret fails, image pulls break cluster-wide. Docker Hub’s anonymous rate limit (100 pulls/6 hours) started hitting during CI builds that pull nginx:alpine and python:3.12-slim. I replaced both with self-hosted Harbor for container images and Gitea for package registries (PyPI, npm), backed by NFS on the NAS, deployed via Ansible and Helm, with Trivy vulnerability scanning on push. Thirteen CI workflows were updated in a single commit. Pull secrets never expire. Images never rate-limit. Monthly ECR cost drops to zero. ...
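The "pull secrets never expire" part comes down to a long-lived robot-account credential stored once as a `dockerconfigjson` Secret, instead of a cron-refreshed ECR token. A sketch, with the registry host, namespace, and robot name as placeholders:

```yaml
# Static pull secret for a Harbor robot account; unlike ECR's 12-hour
# tokens this never rotates. All names here are illustrative.
apiVersion: v1
kind: Secret
metadata:
  name: harbor-pull
  namespace: media
type: kubernetes.io/dockerconfigjson
data:
  # base64 of {"auths":{"harbor.example.lan":{"auth":"<robot-token>"}}}
  .dockerconfigjson: <base64-encoded-docker-config>
```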

March 21, 2026 · 5 min · zolty

PETG Filament Settings: Why the Advertised Temperatures Are Wrong

TL;DR PETG is the go-to filament for functional homelab parts — heat resistant, mechanically strong, and chemically stable. The advertised temperature ranges on most PETG spools (230-250C nozzle) are too low. After systematic testing, the settings that actually produce strong, well-bonded prints on the Bambu Lab P1S are 265C nozzle and 80C bed, with first layer at 270C/85C. These temperatures are 15-35C hotter than what the manufacturer label suggests. Amazon came through with a 4 kg bulk delivery that arrived the same day the previous spool ran out, which was cutting it closer than I would like. ...

March 20, 2026 · 9 min · zolty

The Bambu Lab P1S: Why Every Homelab Needs a 3D Printer

TL;DR I added a Bambu Lab P1S to the homelab and it has become one of the highest-value additions to the setup. Print quality out of the box is near injection-mold level for functional parts. I have already printed ventilated node enclosures, SFP+ cable routing brackets, custom rack shelves, and equipment mounts. Setup took under 30 minutes from unboxing to the first print. The ability to prototype custom hardware solutions in hours instead of waiting days for shipped parts changes how you approach infrastructure problems. ...

March 19, 2026 · 8 min · zolty

Scaling to Two Replicas and Failover Testing

TL;DR This is the moment everything was built for. Three phases of preparation — PostgreSQL provider (Day 3), storage migration (Day 4), state externalization (Day 5) — all leading to a single kubectl scale command. This post covers Phase 4: scaling the Jellyfin StatefulSet to 2 replicas, configuring anti-affinity to spread pods across nodes, running six structured failover tests, building Prometheus alerts, and one test that only partially passed. The headline result: killing a pod causes zero service downtime — users on the surviving replica experience no interruption at all, and displaced users reconnect within seconds. ...
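The anti-affinity piece of that setup could look like this fragment of the StatefulSet's pod spec — the `app: jellyfin` label is an assumption about the manifest:

```yaml
# Illustrative pod anti-affinity: requiredDuringScheduling forces the
# two replicas onto different nodes, so losing one node always leaves
# a surviving replica. Label values are assumed.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: jellyfin
        topologyKey: kubernetes.io/hostname
```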

March 11, 2026 · 10 min · zolty

Storage Refactoring and the SQLite-to-PostgreSQL Migration

TL;DR Phase 2 is the scariest phase. It’s where we take a running Jellyfin instance with years of playback history, user preferences, and media metadata — then swap the database from SQLite to PostgreSQL and restructure every volume. One wrong move and the family discovers their “Continue Watching” list is gone. This post covers deploying PostgreSQL as a k3s StatefulSet, restructuring Jellyfin’s volume layout from a monolithic RWO PVC to NFS shared config + Longhorn per-pod storage, and building a SQLite-to-PostgreSQL migration tool. ...
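The core of a SQLite-to-PostgreSQL migration tool is mechanical: read each table out of SQLite and replay the rows as parameterized INSERTs on the Postgres side. This is a minimal stdlib sketch of that idea — not the actual migration code, and the toy `users` table is an assumption standing in for Jellyfin's real schema:

```python
import sqlite3

def sqlite_table_to_inserts(conn, table):
    """Dump one SQLite table as a PostgreSQL-style parameterized INSERT.

    Returns (sql, rows): an INSERT statement with %s placeholders
    (psycopg style) and the list of row tuples to execute it with.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    placeholders = ", ".join(["%s"] * len(cols))
    sql = f'INSERT INTO {table} ({", ".join(cols)}) VALUES ({placeholders})'
    return sql, cur.fetchall()

if __name__ == "__main__":
    # Toy stand-in for the real database; a migration tool would loop
    # over sqlite_master and feed each statement to a Postgres cursor.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    sql, rows = sqlite_table_to_inserts(conn, "users")
    print(sql)   # INSERT INTO users (id, name) VALUES (%s, %s)
    print(rows)  # [(1, 'alice'), (2, 'bob')]
```

The real tool also has to handle type mapping (SQLite's loose affinity vs. Postgres's strict types) and sequence resets, which is where most of the risk described above lives.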

March 9, 2026 · 8 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.