Devops

Container smoke testing with Goss: stop guessing if your .env works

TL;DR Goss tests the image, not the running container. Use dgoss run --env-file .env to inject your environment and then assert in three layers: does the var exist, is it non-empty, and does it actually authenticate? That layering tells you exactly where the chain breaks instead of just “MySQL connection failed.” The problem I saw someone in a DevOps forum wrestling with this exact thing. They were manually debugging why their .env values weren’t translating properly into MySQL credentials, and had turned to Goss to automate sanity checks. Two questions tripped them up: ...

Running AWS Lens as a Self-Hosted Web App on k3s

TL;DR AWS Lens is an open-source Electron desktop app for managing AWS resources — EC2, S3, Lambda, IAM, Cost Explorer, and more. I wanted it accessible from my browser without running a desktop app. I adapted it to run as a containerized Express server on k3s, fixed a class of runtime crashes from the Electron-to-web adapter, hardened it against three security issues, and deployed it behind Traefik and Let’s Encrypt. The changes are open-source in BoraKostem/AWS-Lens#21. ...

Forking Wiki.js to Get Mermaid 11: When Upstream Won't Move

TL;DR Wiki.js 2.x ships Mermaid 8.8.2, released in 2020. Mermaid 11 — the current stable release — adds timeline diagrams, improved gitGraph, better theming, and fixes years of rendering bugs. The upstream project defers this upgrade to Wiki.js v3, which has no release date. The PR queue has sat idle for over a year. I forked requarks/wiki at tag v2.5.312, upgraded Mermaid in-place, patched 8 CVEs including one Critical SAML authentication bypass, reduced the vulnerability count from 8 Critical / 48 High to 3 Critical / 42 High, and deployed it to the cluster. The fork stays close to upstream — Vue 2 and Webpack 4 are untouched. Only the Mermaid surface and security dependencies are modified. ...

CI/CD pipeline for blog deployment on k3s

This Blog Deploys Itself: Self-Hosted CI/CD on k3s with GitHub ARC

TL;DR The blog is deployed by GitHub Actions runners running inside the same k3s cluster it’s talking about. A push to main with content under hugo/ triggers a build, a two-pass S3 sync, and a CloudFront invalidation. A daily 06:00 UTC cron handles future-dated posts so I can commit a backlog and let them drip out on schedule. After every successful deploy, a Playwright job kicks off and scans the live site for broken links, visual regressions, and security header compliance. The whole thing runs on eight self-hosted amd64 runners managed by GitHub’s Actions Runner Controller (ARC) in the cluster. Not a single managed CI minute gets billed. ...

Five Projects in One Day: What AI Pair Programming Actually Looks Like

TL;DR On March 21, I shipped meaningful work across five repositories in a single day: a 13,674-line stock trading platform from scratch, a Harbor container registry replacing AWS ECR across 13 CI workflows, API key authentication and an HA proxy for digital signage, inventory sell signals for a trading card tracker, and an OpenClaw cost optimization that killed an idle token burn. Every commit was co-authored with Claude. This post breaks down the mechanics of how that actually works – the prompting patterns, the failure modes, the things I would not let the AI do, and the real throughput multiplier. ...

Ditching AWS ECR for Self-Hosted Harbor: Why and How

TL;DR AWS ECR tokens expire every 12 hours. Every time the cron job that refreshes the pull secret fails, image pulls break cluster-wide. Docker Hub’s anonymous rate limit (100 pulls/6 hours) started hitting during CI builds that pull nginx:alpine and python:3.12-slim. I replaced both with self-hosted Harbor for container images and Gitea for package registries (PyPI, npm), backed by NFS on the NAS, deployed via Ansible and Helm, with Trivy vulnerability scanning on push. Thirteen CI workflows were updated in a single commit. Pull secrets never expire. Images never rate-limit. Monthly ECR cost drops to zero. ...

AI coding governance framework for engineering teams

Governing AI Coding Tools Across an Engineering Team

TL;DR AI coding tools are now default behavior for most developers, not an experiment. If you manage a team and you haven’t formalized this, you have ungoverned spend, security exposure, and inconsistent behavior happening right now. The fix isn’t to take the tools away — it’s to pick one, pay for it centrally, encode your policies into the AI itself using instruction files and skills, and govern the control folder rather than individual usage. Here’s the framework I’d implement. ...

When the AI Breaks Production: Failure Patterns, Guardrails, and Measuring What Works

TL;DR AI tools have caused multiple production incidents in this cluster. The AI alert responder agent alone generated 14 documented failure patterns before it became reliable. A security scanner deployed by AI applied restricted PodSecurity labels to every namespace, silently blocking pod creation for half the applications in the cluster. The service selector trap – where AI routes 50% of requests to PostgreSQL instead of the application – appeared in 4 separate incidents before guardrails stopped it. This post catalogs the failure patterns, the five-layer guardrail architecture built to prevent them, and an honest assessment of what still goes wrong. ...

Two AIs managing a GitHub repository via issues and pull requests

Two AIs, One Codebase: Using Local Copilot to Direct GitHub Copilot via Issues and PRs

TL;DR A 109-day project plan. One day of actual work. Eight hours of active pipeline time. The key was treating planning and implementation as two separate AI-driven phases: spend an evening getting the plan right by routing it through multiple models, then let Claude Sonnet 4.6 implement it autonomously overnight via GitHub Copilot’s cloud agent while you sleep. This is the full playbook — planning phase included. The Project This came out of building dnd-multi, a full-stack AI Dungeon Master platform: FastAPI backend, Next.js 15 frontend, a Discord bot, LiveKit voice, and AWS Bedrock integration. Seven feature phases, a plan projected to take until June 19. ...

Reference: k3s Homelab — AI Lessons Learned

Context: This is the live docs/ai-lessons.md from the home k3s cluster repository, referenced extensively across posts on this blog — starting with AI Memory System and GitHub Copilot Setup Guide. Every entry exists because its absence caused a production incident. Personal identifiers and internal domains have been replaced with generic placeholders. Updated: 2026-03-03 Rules discovered through production breakage. Each entry prevents recurrence of a specific failure. Update this file whenever a new non-obvious failure pattern is discovered. ...