TL;DR
I run a growing fleet of autonomous agents — homelab ops, trading research, content generation. Most blow up the first few times they try anything new. I needed a way to decide what an agent is allowed to do without asking me, and what still requires a human checkpoint. The answer is a four-rung trust ladder — supervised, monitored, trusted, full autonomy. Agents earn rungs through track record, not promises. Demotions are possible and routine. The framework turned “should this agent be allowed to do X” from a judgment call I re-made every single time into a policy I can apply consistently.
The problem
Autonomous agents fail in one of two ways: either they’re so locked down that I’m reviewing every output (and might as well do it myself), or they’re loose enough that one bad run causes real damage. The middle is where the value is, and the middle is hard.
Some specific incidents that drove this:
- A blog content agent renamed a published post slug, breaking inbound SEO links. It thought it was making the URL nicer.
- A cluster-patrol agent killed a long-running pod it identified as “hung”. It was actually mid-restore on a Longhorn volume.
- A trading hypothesis agent submitted ten variations of essentially the same hypothesis to the eval queue. Each one looked different at the prompt level; under the hood they were identical.
None of these were catastrophic. All of them were avoidable. The pattern was: I had granted the agent a permission it had no track record to justify.
The four rungs
full autonomy ── act, no notification unless something novel happens
       ▲
trusted ──────── act, exceptions only — silent on routine work
       ▲
monitored ────── act, but tell me what you did
       ▲
supervised ───── propose an action, wait for me to approve
Promotion is one rung at a time, and only after a demonstrable streak. Demotion can skip rungs and is the right call after any incident.
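For my own bookkeeping it helps to see the ladder as data. A minimal sketch in Python (names are illustrative, not lifted from my actual runtime):

```python
from enum import IntEnum

class Rung(IntEnum):
    """The four trust rungs, ordered so higher means more autonomy."""
    SUPERVISED = 0      # propose an action, wait for approval
    MONITORED = 1       # act, report everything
    TRUSTED = 2         # act, report exceptions only
    FULL_AUTONOMY = 3   # act, report only the genuinely novel

def promote(current: Rung) -> Rung:
    """Promotion moves exactly one rung up."""
    return Rung(min(current + 1, Rung.FULL_AUTONOMY))

def demote(current: Rung, rungs: int = 1) -> Rung:
    """Demotion can skip rungs, but never goes below supervised."""
    return Rung(max(current - rungs, Rung.SUPERVISED))
```

Everything else in this post is about when to call those two functions.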
Supervised
Agent proposes; human approves; agent executes. Every action is reviewed before it lands. This is where every new agent starts — and where every agent goes back to after an incident.
Use cases that stay supervised forever:
- Anything that spends money above a small threshold.
- Anything that posts publicly under my pseudonym.
- Anything that touches storage I don’t have current backups for.
Monitored
Agent acts; agent reports what it did. I review the reports asynchronously — could be that day, could be the next morning. If I notice something wrong I can roll back, but the agent didn’t pause to ask first.
This is the right rung for most “interesting” agents. Slack/Mattermost summaries, daily digests, batch jobs that produce inspectable artifacts.
Trusted
Agent acts; agent only reports exceptions. Routine, successful work is invisible. I find out about a deployment because there’s a Slack post saying “deployment failed at step 3” — not a play-by-play of every step that succeeded.
The rule for trusted agents: alerts should be actionable or silenced. There is no third category. A “FYI everything is fine” notification is just noise that trains me to ignore the channel, which means I’ll miss the real alert when it comes.
Full autonomy
Agent acts; agent reports nothing unless something genuinely novel happens. A weekly cert renewal that has succeeded 200 times in a row produces zero output. The first time it fails, that is the notification.
Things that earn full autonomy in my homelab:
- Cert-manager renewals.
- Daily backups (success path silent; failure pages me).
- Routine dependency bumps in projects with full test coverage.
- Image proxy-cache garbage collection.
Notably, no agent that produces user-facing content has full autonomy. Pseudonym integrity matters too much.
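The mechanics behind “silent on success, loud on failure” are simple. A sketch of the wrapper I have in mind, assuming a notify() that posts to the alert channel (both names are placeholders):

```python
import traceback
from typing import Callable

def run_exception_only(task: Callable[[], None], name: str,
                       notify: Callable[[str], None]) -> None:
    """Run a task silently on success; post one actionable alert on failure.

    There is deliberately no "FYI, everything is fine" branch.
    """
    try:
        task()  # the success path produces zero output
    except Exception:
        # The alert is the notification: which agent failed, and how.
        notify(f"{name} failed:\n{traceback.format_exc()}")
        raise
```

A cert renewal that has succeeded 200 times in a row never posts; the first run that throws is the notification.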
How agents move up the ladder
Promotion criteria, in roughly the order I check them:
- Volume. At least 30 successful runs at the current rung. Fewer and I don’t have enough data.
- Failure modes are understood. I’ve seen this agent fail at least once and the failure was either caught by guardrails or recoverable. I know what bad looks like.
- The blast radius is bounded. If this agent goes haywire at the next rung, what’s the worst plausible outcome? If I can’t answer in one sentence, no promotion.
- The current rung’s notifications have stopped being useful. If the notification stream has been “noise I ignore” for the last month, the next rung is appropriate. If I’m still reading every report carefully, stay where I am.
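Most of those criteria are judgment calls, but the bookkeeping around them can be mechanical. A sketch, with field names invented for illustration; the two booleans are still a human answering honestly:

```python
from dataclasses import dataclass

MIN_RUNS = 30  # the volume threshold from the list above

@dataclass
class TrackRecord:
    successful_runs: int              # runs completed at the current rung
    failure_seen_and_contained: bool  # it failed at least once; guardrails or rollback held
    blast_radius: str                 # worst plausible outcome at the next rung, one sentence
    reports_are_noise: bool           # current rung's notifications have stopped being useful

def eligible_for_promotion(record: TrackRecord) -> bool:
    """All four criteria must hold; any miss blocks the promotion."""
    return (
        record.successful_runs >= MIN_RUNS
        and record.failure_seen_and_contained
        and bool(record.blast_radius.strip())  # no recorded answer means I couldn't bound it
        and record.reports_are_noise
    )
```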
A worked example: the OpenClaw morning briefing agent.
- Started supervised. I reviewed every brief before it went to Mattermost.
- Promoted to monitored after ~2 weeks. It posted; I read it within a few hours, sometimes flagged a hallucinated stat.
- Promoted to trusted after ~3 months. It posts daily; I read it as I would a newsletter. If something is wrong, I notice and ping it back to monitored for a few days.
- Currently sitting at trusted. It hasn’t earned full autonomy because the consequence of a wrong fact in a daily brief is “I act on bad info” — bounded but not nothing.
How agents move down the ladder
Demotion is a small ceremony. It happens after any of:
- A real-world incident traceable to the agent’s action.
- A change in the agent’s prompt, model, or tooling that I think materially shifts behavior.
- A long quiet period (>30 days) where the agent ran but didn’t do anything substantive — the track record went stale.
Demotion goes back at least one rung, sometimes more. The agent doesn’t get a shortcut to recover quickly; it earns the rung back the same way it did the first time.
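The triggers are mechanical enough to sketch the same way (again, the field names are invented):

```python
from dataclasses import dataclass

STALE_AFTER_DAYS = 30  # a quiet month and the track record has gone stale

@dataclass
class AgentStatus:
    caused_incident: bool              # real-world incident traced to the agent's action
    materially_changed: bool           # prompt, model, or tooling changed
    days_since_substantive_run: int

def should_demote(status: AgentStatus) -> bool:
    """Any single trigger is enough; how many rungs to drop stays a judgment call."""
    return (
        status.caused_incident
        or status.materially_changed
        or status.days_since_substantive_run > STALE_AFTER_DAYS
    )
```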
Two things this framework changed
It removed me from the loop on routine work. Before the ladder, I was reviewing cert renewal notifications, backup confirmations, dependency bumps — output that was identical every day and never required action. Once I could justify “full autonomy” as a real rung that things could climb to, I let them climb, and my notification volume dropped by maybe 80%.
It gave me permission to demote without guilt. Before, demoting an agent felt like an admission that I’d promoted prematurely. Now demotion is a tool — used routinely after model changes or prompt rewrites — not a verdict.
Lessons
- Default to supervised, not monitored. The cost of a few weeks of supervision is low. The cost of an unsupervised mistake is high.
- Reproducibility caps the ladder. An agent that touches state I can rebuild from code can earn higher rungs faster than one that touches data I can’t regenerate. The “30 minutes to rebuild” test applies to agents too.
- Don’t grant org-wide autonomy because one agent earned it. Each agent’s track record is its own. A trusted summarizer agent and a brand-new deployment agent are not on the same rung just because they share a model.
What’s next
The framework is currently a written policy I apply by hand. The natural next step is to encode each agent’s current rung as a config flag the agent’s runtime checks at startup — so an agent that’s been demoted is incapable of acting at its old rung, not just supposed to be more careful. That’s a small refactor for a meaningful safety property.
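A sketch of what that could look like: each agent reads a small config stanza at startup and refuses any action above its recorded rung. The file path, format, and function names here are hypothetical:

```python
import json
from pathlib import Path

# Hypothetical per-agent config, edited by hand when I promote or demote:
#   {"agent": "morning-briefing", "rung": "trusted"}
RUNG_ORDER = ["supervised", "monitored", "trusted", "full_autonomy"]

def load_rung(config_path: str) -> int:
    """Read the agent's current rung from its config file."""
    cfg = json.loads(Path(config_path).read_text())
    return RUNG_ORDER.index(cfg["rung"])

def require_rung(current: int, needed: str) -> None:
    """Refuse outright if the agent's rung is below what the action needs.

    A demoted agent becomes incapable of acting at its old rung,
    not merely instructed to be more careful.
    """
    if current < RUNG_ORDER.index(needed):
        raise PermissionError(
            f"action requires '{needed}', agent is at '{RUNG_ORDER[current]}'"
        )

# At startup, before the agent takes any unattended action:
#   rung = load_rung("/etc/agents/morning-briefing.json")
#   require_rung(rung, "monitored")
```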