TL;DR

On March 21, I shipped meaningful work across five repositories in a single day: a 13,674-line stock trading platform from scratch, a Harbor container registry replacing AWS ECR across 13 CI workflows, API key authentication and an HA proxy for digital signage, inventory sell signals for a trading card tracker, and an OpenClaw cost optimization that killed an idle token burn. Every commit was co-authored with Claude. This post breaks down the mechanics of how that actually works – the prompting patterns, the failure modes, the things I would not let the AI do, and the real throughput multiplier.

The Day’s Output

| Repository | What Shipped | Lines Changed | Commits |
|---|---|---|---|
| stock_automation | Full platform: 5 phases, data layer through paper trading | ~13,674 | 19 |
| home_k3s_cluster | Harbor + Gitea registries, 13 CI workflow updates, monitoring | ~2,000 | 6 |
| digital_signage | HA proxy (5 endpoints), API key auth, 37 tests, CI fixes | ~1,500 | 13 |
| cardboard | Inventory lots, sales tracking, sell signals engine | ~800 | 3 |
| home_k3s_cluster | OpenClaw idle token burn fix, lean context, kill autonomous loop | ~200 | 1 |

Total: roughly 18,000 lines of code across 42 commits in 5 repositories. Every commit has Co-Authored-By: Claude Opus 4.6 in the trailer.

How It Actually Works

AI pair programming is not “type a prompt, get a project.” The workflow that produces reliable output looks more like this:

1. Front-load the Context

Every repository in this cluster has a .github/copilot-instructions.md that describes the architecture, conventions, and known anti-patterns. Claude skills (stored in .claude/skills/) encode domain-specific knowledge: how to deploy to k3s, how to build Docker images for this cluster, how to write Terraform for this AWS account.

The stock_automation project started with 7 Claude skill files covering data science, trading strategies, technical analysis, SEC filings, transcript analysis, sentiment analysis, and backtesting. These skills contained reference implementations, coding patterns, and operating rules (Decimal for money, Pydantic for models, caching mandatory). The skills were written before any business logic. They function as a specification that the AI implements against.
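To make the operating rules concrete, here is a minimal sketch of the kind of model those rules produce, assuming Pydantic v2. The class and field names are illustrative, not the project's actual schema; the point is the rules themselves: Decimal for money, Pydantic for models.

```python
from decimal import Decimal
from pydantic import BaseModel

# Illustrative model following the skills' operating rules:
# money is always Decimal (never float), models are always Pydantic.
class TradeSignal(BaseModel):
    symbol: str
    direction: str          # "long" or "short"
    entry_price: Decimal
    stop_loss: Decimal
    target_price: Decimal

signal = TradeSignal(
    symbol="AAPL",
    direction="long",
    entry_price=Decimal("189.50"),
    stop_loss=Decimal("184.00"),
    target_price=Decimal("201.00"),
)
```

Because the skill file states the rule explicitly, the AI emits Decimal fields by default instead of floats, and the spec does not have to be repeated in every prompt.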

2. Phase the Work

Large projects get broken into phases with explicit deliverables. Stock automation had five phases, each with a planning document, implementation, PR, and code review. The AI generates code for one phase, I review the PR, request fixes for anything that looks wrong, merge, then start the next phase.

This is critical. Asking for “build me a stock trading platform” produces garbage. Asking for “implement Phase 1: data providers, normalizer, cache layer, two strategies, and a backtesting engine – here is the schema, here are the constraints, here are the tests I expect” produces usable code.

3. Review Everything

Every PR gets reviewed. Not skimmed – reviewed. The stock_automation Phase 1 PR review caught and fixed critical- and high-severity issues before merge: bare except clauses were narrowed to specific exceptions, missing type annotations were added, and a signal model validator was missing price-consistency checks (stop_loss must be below entry for long positions). These are exactly the kinds of bugs that AI code introduces – structurally correct but semantically wrong.
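A hedged reconstruction of that last fix, assuming Pydantic v2 (the actual model has more fields; names here are illustrative). The original code validated types correctly but never checked that the prices made sense together:

```python
from decimal import Decimal
from pydantic import BaseModel, model_validator

class Signal(BaseModel):
    direction: str  # "long" or "short"
    entry_price: Decimal
    stop_loss: Decimal

    # The check the review added: structurally valid signals can still
    # be semantically wrong about price relationships.
    @model_validator(mode="after")
    def check_price_consistency(self):
        if self.direction == "long" and self.stop_loss >= self.entry_price:
            raise ValueError("stop_loss must be below entry for long positions")
        if self.direction == "short" and self.stop_loss <= self.entry_price:
            raise ValueError("stop_loss must be above entry for short positions")
        return self
```

Without the validator, a long signal with a stop above entry passes lint, passes type checking, and serializes cleanly – it only fails when real money is at stake.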

The review step is non-negotiable. AI-generated code that passes lint and tests can still be wrong in ways that only a human with domain knowledge will catch.

4. Know What to Delegate

Not everything should be AI-generated. The decision about what to build, the architecture, the data model design, the strategy logic – those are human decisions informed by domain knowledge. The AI excels at:

  • Translating a specification into implementation code
  • Writing test cases from a description of expected behavior
  • CI/CD pipeline configuration (YAML is tedious but deterministic)
  • Boilerplate reduction (Pydantic models, CLI argument parsing, database schemas)
  • Finding and fixing lint/type errors across a codebase

The AI is bad at:

  • Knowing whether a strategy should exist at all
  • Making cost/benefit tradeoffs (it will happily add ML dependencies for marginal improvement)
  • Understanding operational context (it does not know your ECR tokens expire)
  • Security decisions (it will add auth if asked, but it will not notice auth is missing)

Failure Modes

The Service Selector Trap (Again)

This documented anti-pattern appeared again in the digital signage namespace. When the AI writes a Kubernetes Service for an application that shares a namespace with PostgreSQL, it uses app.kubernetes.io/name as the only selector – matching both the app and the database. Despite this bug being documented in four separate instruction files, the AI still produces it occasionally. The fix is always the same: add app.kubernetes.io/component: web to the selector.
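The trap follows directly from Kubernetes selector semantics: a Service routes to any pod whose labels are a superset of the selector. A minimal Python sketch of that matching rule (the label values are illustrative of the signage namespace, not the actual manifests) shows why the name-only selector catches both workloads:

```python
# A Kubernetes Service selects every pod whose labels contain all of the
# selector's key/value pairs. Subset matching is what makes the trap work.
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    return all(pod_labels.get(k) == v for k, v in selector.items())

web_pod = {"app.kubernetes.io/name": "digital-signage",
           "app.kubernetes.io/component": "web"}
db_pod = {"app.kubernetes.io/name": "digital-signage",
          "app.kubernetes.io/component": "database"}

# The AI's selector: matches both pods, so traffic hits PostgreSQL.
bad = {"app.kubernetes.io/name": "digital-signage"}

# The documented fix: add the component label to disambiguate.
good = {"app.kubernetes.io/name": "digital-signage",
        "app.kubernetes.io/component": "web"}
```

Running `selector_matches(bad, db_pod)` returns True, which is the whole bug: the Service load-balances web requests onto the database pod.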

Docker Hub Rate Limits in CI

The AI-generated CI workflows pulled base images from Docker Hub without authentication. With 8 parallel ARC runners, the anonymous rate limit was exhausted within hours. The fix required three iterations: first adding Docker Hub login, then mirroring base images to ECR, then hardcoding the ECR registry in Dockerfiles. The AI generated each fix correctly when prompted, but it did not anticipate the rate limit problem – it had to be told.

Over-Engineering the ML Layer

Stock automation Phase 4 included a scikit-learn classifier for market regime detection. I let the AI build it because the implementation was clean and well-tested. But the simple statistical approach (rolling volatility thresholds) works just as well for daily-bar swing trading and requires zero ML dependencies. This is a recurring pattern: the AI will build the more complex solution if you let it, because it does not have opinions about complexity budgets.
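For comparison, the simple statistical approach fits in a dozen lines. This is a sketch under assumed parameters – the window and threshold here are illustrative, not the project's tuned values:

```python
import statistics

# Classify market regime from the rolling volatility of daily returns.
# No model, no training data, no scikit-learn dependency.
def regime(returns: list[float], window: int = 20,
           threshold: float = 0.02) -> str:
    if len(returns) < window:
        return "unknown"
    vol = statistics.stdev(returns[-window:])
    return "high_volatility" if vol > threshold else "low_volatility"

calm = [0.001, -0.002, 0.0015, -0.001] * 5    # 20 small daily moves
choppy = [0.03, -0.04, 0.05, -0.035] * 5      # 20 large swings
```

For daily-bar swing trading, a threshold rule like this is transparent, trivially testable, and good enough – which is the complexity-budget argument the AI will never make on its own.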

The Throughput Multiplier

The honest throughput multiplier is roughly 3-5x, not 10x. Here is where the time goes:

| Activity | % of Time |
|---|---|
| Writing prompts and specifications | 20% |
| Reviewing generated code | 30% |
| Fixing issues found in review | 15% |
| Debugging CI/CD and deployment | 20% |
| Testing and validation | 15% |

The AI eliminates the typing bottleneck but does not eliminate the thinking bottleneck. The 20% spent writing specifications is the most valuable time in the process – a precise spec produces code that needs minimal fixes. A vague spec produces code that compiles but does the wrong thing.

The 30% review time is irreducible. Every line of AI-generated code needs the same scrutiny as code from a junior developer in their first week. The code is syntactically correct, passes linters, and often passes tests – but it may encode incorrect assumptions about the domain.

What I Would Not Let the AI Do

  • Choose the architecture. The four-layer design for stock_automation, the decision to use parquet instead of a database, the human-in-the-loop constraint – these are human decisions.
  • Write the trading strategies. The AI implemented momentum and mean reversion from my specifications, but the specifications encode my investing thesis. AI does not have an investing thesis.
  • Make security decisions. The API key auth system was my decision after noticing the digital signage endpoints were unauthenticated. The AI implemented it, but it did not flag the missing auth.
  • Push to production without review. Every PR was reviewed, every manifest was read before kubectl apply, every Terraform plan was inspected before apply.

The Meta-Lesson

Five projects in one day is possible because the context infrastructure exists: instruction files, skill files, planning documents, and anti-pattern catalogs. The AI is executing against a well-defined specification, not improvising. The day’s output was 18,000 lines of code, but the specifications, constraints, and conventions that made that output reliable were built over the previous month.

The AI is a force multiplier for an engineer who knows what to build. It is not a substitute for knowing what to build.