Langfuse tracing and cost dashboards for autonomous LLM agents

Tracing and budgeting LLM agents with Langfuse

TL;DR I run unattended LLM agents on my homelab — they write code, open MRs, generate content, rotate secrets. The problem: they fail silently and bill silently. Langfuse (a tracing platform) logs every LLM call with input/output tokens, latency, and cost. On top of those traces, I built three background monitors that run weekly: a goal-drift detector that compares an agent’s stated objective to what its commits actually did (via embedding similarity), a cost-spike alert that fires at 80% and 100% of a daily budget cap, and an action audit that exports traces and flags sessions where the tool-call sequence diverged from the plan. Together, these let me sleep while autonomous agents handle repetitive work. ...
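The cost-spike alert described above can be sketched in a few lines. This is a minimal illustration, not the code from the post: the class name `DailyBudgetMonitor`, the `record` method, and the dollar amounts are all hypothetical. It assumes per-call costs have already been pulled from the traces, and it fires each threshold (80% and 100% of the daily cap) at most once per day.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Alert thresholds named in the post: 80% and 100% of the daily cap.
THRESHOLDS = (0.8, 1.0)


@dataclass
class DailyBudgetMonitor:
    """Hypothetical helper: tracks one day's spend against a cap."""
    cap_usd: float
    spent_usd: float = 0.0
    fired: Set[float] = field(default_factory=set)

    def record(self, cost_usd: float) -> List[str]:
        """Add one call's cost and return alerts for newly crossed thresholds."""
        self.spent_usd += cost_usd
        alerts = []
        for threshold in THRESHOLDS:
            crossed = self.spent_usd >= threshold * self.cap_usd
            if crossed and threshold not in self.fired:
                # Remember the threshold so it does not fire again today.
                self.fired.add(threshold)
                alerts.append(
                    f"{int(threshold * 100)}% of daily budget reached "
                    f"(${self.spent_usd:.2f} of ${self.cap_usd:.2f})"
                )
        return alerts
```

Feeding it the cost of each traced call, for example `monitor.record(trace_cost)`, returns an empty list until a threshold is crossed. Resetting the monitor at midnight and delivering the alert strings (chat webhook, email) are left out of the sketch.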

June 13, 2026 · 11 min · zolty
A LiteLLM gateway routing many model providers behind one OpenAI-compatible endpoint

A LiteLLM gateway for the homelab: one endpoint, many models, hard cost caps

TL;DR I put a LiteLLM proxy gateway in front of every LLM I use — local Ollama models for bulk/cheap classification work, OpenRouter for frontier models when I need them, plus cloud vendors if needed. Every app and agent targets one OpenAI-compatible endpoint. Per-key budgets and daily spend alerts make runaway costs impossible. I define model-to-backend mappings in YAML, let LiteLLM handle the routing, and route based on intent: ask for solar-expert when I need a domain-specific Q&A bot backed by a small local model, ask for claude-opus-4-8 when I need real reasoning. The gateway cost? ~50ms latency overhead and one Kubernetes Deployment. The gain? No more vendor SDK sprawl, no more guessing which model is wired into a cron job, and spend visibility that I actually trust. ...

June 12, 2026 · 9 min · zolty
Power meter and heat-flow diagram for a homelab rack

Watts, BTUs, and the real cost of running a homelab 24/7

TL;DR A homelab feels free until you read the meter. After a year of running seven k3s nodes plus a pair of Mac Studios under whatever workload I felt like throwing at them, I sat down with a Kill-a-Watt and worked out what the cluster actually costs to keep on. Idle is genuinely cheap. Sustained LLM inference is not. The honest break-even against cloud inference is workload-shaped, and for my workloads, on-prem wins — but only because I run them often enough to amortize the wattage. The numbers below are mine; substitute your electricity rate to get yours. ...
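To substitute your own electricity rate, the arithmetic is short. The sketch below is an illustration with hypothetical function names. It assumes you have an average wall-power reading in watts (from a Kill-a-Watt or similar) and a flat rate in dollars per kWh; it does not reproduce the post's measurements.

```python
# Average hours in a month: 24 h * 365 days / 12 months = 730.
HOURS_PER_MONTH = 24 * 365 / 12

# Physical constant: 1 watt of electrical load dissipates about 3.412 BTU/h of heat.
BTU_PER_HOUR_PER_WATT = 3.412


def monthly_cost_usd(avg_watts: float, rate_usd_per_kwh: float) -> float:
    """Cost of drawing avg_watts continuously for an average month."""
    kwh_per_month = avg_watts / 1000 * HOURS_PER_MONTH
    return kwh_per_month * rate_usd_per_kwh


def heat_btu_per_hour(avg_watts: float) -> float:
    """Heat the room (or air conditioner) has to absorb, in BTU/h."""
    return avg_watts * BTU_PER_HOUR_PER_WATT
```

Calling `monthly_cost_usd` once with an idle reading and once with a sustained-inference reading gives the two ends of the range. A tiered or time-of-use tariff would need more than one flat rate.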

May 14, 2026 · 7 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.