TL;DR

Busy week. Three CVE patches shipped on the same day. OpenClaw stabilized with OpenRouter support and a cost exporter. The Wiki.js fork with Mermaid 11 went live after clearing a Trivy scan. PiKey (a Raspberry Pi that pretends to be a Bluetooth keyboard) shipped as a side project. A self-hosted GitHub Actions cache server cut CI restore times from minutes to seconds. And a Reddit comment defending “I use Claude to manage my infrastructure” turned into three new blog posts and a documentation sprint.

Security: Three Patches in One Day

The week opened with a cluster of CVEs that all landed on the same day:

  • oauth2-proxy v7.14.3 — auth bypass fix. This one was the most critical: the previous version had a flaw that could allow unauthenticated access to protected routes under specific header conditions. Patched immediately.
  • Sonarr 4.0.17 — CVE-2026-30976. Sonarr runs in the media namespace with internal-only exposure, but CVEs in media stack services still get patched.
  • Jellyfin PostgreSQL 16.13-alpine — CVE-2026-2005. Stateful database containers get extra attention because a compromise there is a data loss event, not just a service disruption.

All three shipped in a single commit block and rolled out cleanly. The oauth2-proxy patch also triggered a round of testing on the Home Assistant companion app auth path — the HA fix from earlier in the week (bypassing Google OAuth for /api/ and /auth/ paths) needed to survive the oauth2-proxy upgrade. It did.
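The bypass behavior that had to survive the upgrade can be pinned down as a small regression check. This is a minimal sketch, assuming the bypass is expressed as path-prefix regexes; the patterns and the function name are illustrative and not copied from the actual proxy config.

```python
import re

# Assumed bypass patterns: requests under /api/ and /auth/ skip Google OAuth.
# Illustrative only; the real patterns live in the proxy configuration.
BYPASS_PATTERNS = [re.compile(r"^/api/"), re.compile(r"^/auth/")]


def skips_oauth(path: str) -> bool:
    """Return True if the request path should bypass the OAuth login flow."""
    return any(pattern.match(path) for pattern in BYPASS_PATTERNS)


if __name__ == "__main__":
    for path in ["/api/states", "/auth/token", "/lovelace", "/apix/states"]:
        print(f"{path}: {'bypass' if skips_oauth(path) else 'oauth'}")
```

Anchoring the patterns at the start of the path matters: a lookalike such as /apix/ or a nested /x/api/ should still go through the login flow.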

The Harbor CVE-2026-4404 was also marked resolved this week after the upgrade completed.

OpenClaw: Stabilization and OpenRouter

OpenClaw got several improvements this week:

  • OpenRouter models added — the model list now includes OR-hosted options alongside Anthropic direct and Bedrock. This matters for cost optimization: some workloads (long context, batch summarization) are cheaper on OpenRouter than Bedrock.
  • Default switched to Qwen Flash — faster and cheaper for the interactive chat use case. Claude Sonnet stays available but is no longer the default.
  • Cost exporter added — a Prometheus exporter scrapes per-model token usage and cost from the OpenClaw API, feeding a Grafana dashboard that shows spend by user and model over time.
  • CI deploy migrated from ECR to Harbor — OpenClaw was one of the last services still pushing to ECR. It’s now fully on Harbor, matching the rest of the cluster.
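To make the cost exporter concrete, here is a rough sketch of the aggregation step: turning per-request usage records into Prometheus text exposition lines labeled by user and model. The record shape, metric names, and model identifiers are assumptions for illustration; the actual OpenClaw API response and exporter may look different, and a real exporter would more likely use a client library than format lines by hand.

```python
from collections import defaultdict


def render_metrics(records):
    """Aggregate usage records into Prometheus text exposition format.

    Each record is assumed to be a dict with the hypothetical keys
    "user", "model", "tokens", and "cost_usd". Label values are assumed
    to contain no quotes or backslashes (no escaping is done here).
    """
    tokens = defaultdict(int)
    cost = defaultdict(float)
    for record in records:
        key = (record["user"], record["model"])
        tokens[key] += record["tokens"]
        cost[key] += record["cost_usd"]

    lines = ["# TYPE openclaw_tokens_total counter"]
    for (user, model), value in sorted(tokens.items()):
        lines.append(
            f'openclaw_tokens_total{{user="{user}",model="{model}"}} {value}'
        )
    lines.append("# TYPE openclaw_cost_usd_total counter")
    for (user, model), value in sorted(cost.items()):
        lines.append(
            f'openclaw_cost_usd_total{{user="{user}",model="{model}"}} {value:.6f}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"user": "alice", "model": "qwen-flash", "tokens": 1000, "cost_usd": 0.01},
        {"user": "alice", "model": "qwen-flash", "tokens": 500, "cost_usd": 0.02},
        {"user": "bob", "model": "claude-sonnet", "tokens": 200, "cost_usd": 0.05},
    ]
    print(render_metrics(sample), end="")
```

Keeping user and model as labels is what lets a dashboard slice spend either way without a separate metric per model.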

Honest reflection: after running OpenClaw for a week and watching the usage metrics, the Claude desktop app handles 90% of what I actually use it for. OpenClaw’s value is multi-user access and the Slack/Telegram channel integrations — not the web UI itself.

Wiki.js Fork: Mermaid 11 Ships

The Wiki.js fork is the most technically interesting thing that shipped this week. Upstream Wiki.js 2.x ships Mermaid 8.8.2 from 2020 and defers the upgrade to v3 with no ETA. I forked at v2.5.312, upgraded to Mermaid 11.13.0, patched 8 CVEs including a SAML auth bypass, and ran a 22-test Selenium regression suite across 10 diagram types before deploying.

The fork now runs at wiki.k3s.internal.zolty.systems with:

  • All 10 Mermaid diagram types rendering correctly (50 SVGs, 0 errors)
  • Critical vulnerabilities reduced from 8 to 3
  • Security headers via Traefik middleware
  • An MCP server that lets Claude Code write pages directly from terminal sessions

The auto-documenting wiki post explains the operational model — the AI writes, I read.

New Projects Shipped

PiKey — A Raspberry Pi Zero 2W that spoofs a Logitech K380 Bluetooth keyboard, jiggles the mouse, and auto-types LLM-generated text. Built in three implementations: Python (primary), Rust, and C. The use case is simulating human activity on machines that need to stay “active” without physical input. Full write-up here.
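For readers unfamiliar with how a keyboard "types", the core of the auto-typing piece is translating text into HID keyboard reports. The sketch below uses the standard 8-byte boot keyboard layout (modifier byte, reserved byte, six key slots) and standard HID usage IDs. It is not taken from the PiKey source: the function names are hypothetical, only a small character set is covered, and the Bluetooth transport framing is left out entirely.

```python
LEFT_SHIFT = 0x02  # bit 1 of the modifier byte in a boot keyboard report


def keycode_for(ch):
    """Map a character to (modifier, HID usage ID). Small subset only."""
    if "a" <= ch <= "z":
        return 0, 0x04 + ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return LEFT_SHIFT, 0x04 + ord(ch) - ord("A")
    if "1" <= ch <= "9":
        return 0, 0x1E + ord(ch) - ord("1")
    if ch == "0":
        return 0, 0x27
    if ch == "\n":
        return 0, 0x28  # Enter
    if ch == " ":
        return 0, 0x2C  # Space
    raise ValueError(f"unsupported character: {ch!r}")


def reports_for(text):
    """Yield a key-press report followed by an all-zero release per character."""
    release = bytes(8)
    for ch in text:
        modifier, key = keycode_for(ch)
        # Layout: [modifier, reserved, key1..key6]; only key1 is used here.
        yield bytes([modifier, 0, key, 0, 0, 0, 0, 0])
        yield release


if __name__ == "__main__":
    for report in reports_for("Hi 1"):
        print(report.hex(" "))
```

Sending a release report after every press is what keeps the host from seeing a held key, and it also makes repeated letters register as separate keystrokes.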

Jellyfin GPU stress tester — A headless Kubernetes Job that hammers the Intel UHD 630 VAAPI transcoder with escalating concurrent streams and outputs a JSON report. Useful for validating GPU passthrough stability after node changes. Write-up here.
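The escalation loop behind a tester like this is simple enough to sketch. The version below ramps the number of concurrent streams one at a time, stops at the first level with a failure, and emits a JSON report. Everything here is an assumption about shape rather than the actual Job: the function names and report fields are made up, and fake_transcode stands in for whatever command really drives the VAAPI transcoder.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def _attempt(transcode, stream_index):
    """Run one transcode and report success as a bool instead of raising."""
    try:
        transcode(stream_index)
        return True
    except Exception:
        return False


def run_level(transcode, streams):
    """Run `streams` transcodes concurrently and count successes."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        outcomes = list(pool.map(lambda i: _attempt(transcode, i), range(streams)))
    ok = sum(outcomes)
    return {"streams": streams, "ok": ok, "failed": streams - ok}


def stress(transcode, max_streams):
    """Escalate concurrency until a level fails or max_streams is reached."""
    levels = []
    for streams in range(1, max_streams + 1):
        level = run_level(transcode, streams)
        levels.append(level)
        if level["failed"]:
            break  # stop escalating at the first unstable level
    stable = [lvl["streams"] for lvl in levels if lvl["failed"] == 0]
    return {"levels": levels, "max_stable_streams": max(stable, default=0)}


def fake_transcode(stream_index):
    # Stand-in for a real transcode; pretends the GPU tops out at 3 streams.
    if stream_index >= 3:
        raise RuntimeError("transcoder refused stream")


if __name__ == "__main__":
    print(json.dumps(stress(fake_transcode, max_streams=6), indent=2))
```

Recording every level, not just the final number, makes it easier to compare runs before and after a node change.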

GitHub Actions cache server — Deployed a self-hosted Actions cache server backed by NAS NFS storage. Cache restore on self-hosted ARC runners was hitting GitHub’s CDN, which is slow from the homelab. With the local server, restores dropped from 2-3 minutes to under 10 seconds for a typical Python dependency cache. Write-up here.

GitHub org exporter — A new Prometheus exporter that tracks GitHub org membership metrics. Small thing, but adds visibility into runner pool health and org-level activity alongside the existing GHA Dashboard.

Infrastructure: The Cleanup Round

Some weeks are feature weeks. This was partly a cleanup week:

  • Grafana dashboards — SQLite persistence disabled on the Grafana PVC after a WAL corruption incident. Dashboards are now fully ConfigMap-provisioned, so persistence is irrelevant.
  • Grafana regression test framework — automated tests that validate dashboard JSON against the Grafana API schema before deploying, catching silent rendering failures.
  • CronJob entrypoints — a batch of CronJobs were using bash as the entrypoint on alpine-based images that only ship sh. Fixed across the board.
  • Daily chore wheel workflow — a GitHub Actions workflow that runs a rotating set of cluster health checks, cert expiry scans, PVC usage checks, and dependency audits on a daily schedule.
  • Terraform — per-agent disk size override added to the VM provisioning module, enabling larger disks on nodes that need Longhorn capacity without modifying shared config.
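As an illustration of the dashboard regression idea, here is a much simpler structural lint than full validation against the Grafana API schema. It only checks a few fields that dashboard JSON normally carries (title, uid, and panels with unique ids and a type). The function name and the exact set of checks are assumptions, not the framework described above.

```python
import json


def lint_dashboard(raw: str) -> list[str]:
    """Return a list of problems found in a dashboard JSON string."""
    try:
        dash = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(dash, dict):
        return ["top level must be a JSON object"]

    errors = []
    for key in ("title", "uid"):
        if not dash.get(key):
            errors.append(f"missing {key}")

    panels = dash.get("panels")
    if not isinstance(panels, list):
        errors.append("panels must be a list")
        return errors

    seen_ids = set()
    for index, panel in enumerate(panels):
        if not isinstance(panel, dict):
            errors.append(f"panel {index}: must be an object")
            continue
        panel_id = panel.get("id")
        if panel_id is None:
            errors.append(f"panel {index}: missing id")
        elif panel_id in seen_ids:
            errors.append(f"panel {index}: duplicate id {panel_id}")
        else:
            seen_ids.add(panel_id)
        if not panel.get("type"):
            errors.append(f"panel {index}: missing type")
    return errors


if __name__ == "__main__":
    sample = json.dumps({
        "title": "Cluster", "uid": "cluster",
        "panels": [{"id": 1, "type": "timeseries"}, {"id": 1, "type": "stat"}],
    })
    for problem in lint_dashboard(sample):
        print(problem)
```

A check like this can run in CI against every ConfigMap-provisioned dashboard before deploy, with a non-empty error list failing the build.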

Planning: What’s Next

Two plans moved forward this week without shipping, and a third was deferred:

Authentik: Replacing OAuth2 Proxy with a centralized identity platform. The plan is ready. Phased migration: internal services first, then external. The oauth2-proxy patch this week will be the last time I touch it before ripping it out.

VPN mesh for a tech collective: Designing a WireGuard hub-and-spoke mesh to connect multiple people’s homelab nodes into a shared network for Jellyfin access, CI/CD runners, and game servers. Currently in the design phase.

Linkerd: Decided not yet. Twenty workloads and internal-only traffic don’t justify the operational overhead of a service mesh. The plan document is ready for when the cluster is bigger.
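Since the VPN mesh is still on paper, the following is only a sketch of what a hub-and-spoke layout implies for WireGuard configs: the hub lists every member as a peer with a single /32, while each member has one peer (the hub) that routes the whole mesh subnet. The subnet, endpoint, member names, and key placeholders are all hypothetical.

```python
import ipaddress


def build_mesh(subnet, hub_endpoint, members):
    """Return (hub_config, {member: config}) for a hub-and-spoke mesh.

    The hub takes the first host address in the subnet and members take
    the following ones in order. Keys are left as placeholders.
    """
    net = ipaddress.ip_network(subnet)
    hosts = net.hosts()
    hub_ip = next(hosts)
    spokes = {name: next(hosts) for name in members}

    hub_lines = [
        "[Interface]",
        f"Address = {hub_ip}/{net.prefixlen}",
        "ListenPort = 51820",
        "PrivateKey = <hub-private-key>",
    ]
    for name, ip in spokes.items():
        hub_lines += [
            "",
            f"# {name}",
            "[Peer]",
            f"PublicKey = <{name}-public-key>",
            f"AllowedIPs = {ip}/32",  # the hub accepts only this member's address
        ]

    spoke_configs = {}
    for name, ip in spokes.items():
        spoke_configs[name] = "\n".join([
            "[Interface]",
            f"Address = {ip}/{net.prefixlen}",
            f"PrivateKey = <{name}-private-key>",
            "",
            "[Peer]",
            "PublicKey = <hub-public-key>",
            f"Endpoint = {hub_endpoint}",
            f"AllowedIPs = {net}",  # send all mesh traffic via the hub
            "PersistentKeepalive = 25",
        ])
    return "\n".join(hub_lines), spoke_configs


if __name__ == "__main__":
    hub, spokes = build_mesh("10.99.0.0/24", "hub.example.net:51820", ["alice", "bob"])
    print(hub)
    print()
    print(spokes["alice"])
```

With this shape, member-to-member traffic only works if the hub forwards packets between peers, which is the main operational cost of hub-and-spoke compared with a full mesh.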

The Reddit Thread

A comment on r/Terraform defending “I use Claude to manage infrastructure” turned into a small documentation sprint. The resulting Reddit reply needed receipts, which meant writing blog posts for things that existed but weren’t documented: the Jellyseerr natural language request system, the internal wiki setup, and the Wiki.js fork. Three posts in an afternoon.

The useful forcing function: if you can’t link to it, did you build it? The cluster has a lot of things running that exist only in the wiki and the git history. Writing the posts made me audit what was actually there versus what I thought was there. Two things I assumed were documented weren’t. They are now.

Commit Count

50+ commits to home_k3s_cluster this week across security patches, new features, CI fixes, and planning docs. 6 new blog posts before the documentation sprint. 3 more after. This is week eight since the cluster went from bare Proxmox hosts to production.