Monitoring goes blind — Longhorn storage corruption incident report

When Monitoring Goes Blind: A Longhorn Storage Corruption Incident

TL;DR Grafana went completely dark for about 26 hours on my home k3s cluster. Two things broke simultaneously: Loki entered CrashLoopBackOff, and Prometheus silently stopped ingesting metrics — its pods showed as healthy and 2/2 Running the whole time. The actual cause was Longhorn’s auto-balancer migrating replicas onto a freshly-added cluster node (k3s-agent-4) that had unstable storage during its first 48 hours. The replica I/O errors propagated directly into the workloads, corrupting mid-write files: a Prometheus WAL segment and a Loki TSDB index file. Both required offline surgery via a busybox pod to delete the corrupted files before the services could recover. ...

February 25, 2026 · 8 min · zolty
Monitoring stack

Monitoring Everything: Prometheus, Grafana, and Loki on k3s

TL;DR After running the cluster for nearly two weeks, today I took a step back to document and optimize the monitoring stack. This covers kube-prometheus-stack (Prometheus + Grafana + AlertManager), Loki for log aggregation, custom dashboards for every service, alert tuning to reduce noise, and the cluster-wide performance benchmarks I ran to establish baseline metrics. The Monitoring Architecture ┌──────────────────────────────────────────────────┐ │ Grafana │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ Metrics │ │ Logs │ │ Alerts │ │ │ │ Explorer │ │ Explorer │ │ Rules │ │ │ └──────┬───┘ └──────┬───┘ └──────┬───┘ │ └─────────┼──────────────┼─────────────┼───────────┘ │ │ │ ┌─────┴─────┐ ┌─────┴─────┐ │ │Prometheus │ │ Loki │ │ │ (metrics) │ │ (logs) │ │ └─────┬─────┘ └─────┬─────┘ │ │ │ ┌─────┴──────┐ ┌──────┴──────┐ ┌─────┴────┐ │AlertManager│ │ Exporters │ │Promtail │ │ → Slack │ │ node │ │(log │ └────────────┘ │ kube-state │ │ shipper) │ │ cAdvisor │ └──────────┘ │ custom │ └─────────────┘ kube-prometheus-stack The foundation is kube-prometheus-stack, deployed via Helm. This single chart installs: ...

February 19, 2026 · 6 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.