Governing AI Agents

System prompts are the default governance mechanism for AI agents. Write instructions, hope the model follows them, add more when it doesn't. This works until it doesn't — and the failure modes are structural, not incidental.

Instructions in a system prompt aren't individually addressable. You can't reference a specific rule by ID, track whether it was followed, or evaluate its effectiveness independently. There's no compositional structure — you can't select a subset of instructions for a particular context, or parameterize them for different situations. Every agent gets the same monolithic block of text. Conflict resolution is implicit: when two instructions contradict, the model silently picks a winner with no visibility into why. And the whole thing is static. Prompts don't learn from their own performance. When an instruction fails, you find out from the output, not from the governance layer.

The deepest problem is compliance. The entity checking whether instructions were followed is the same entity being governed. There is no separation of powers. Self-reported compliance in any other domain — financial auditing, safety inspection, regulatory oversight — would be considered a design flaw. In AI governance, it's the default.

These aren't edge cases. They're architectural constraints of using unstructured text as a governance mechanism. What follows is a stack that addresses each of them — built in production, not in theory.

Constitution

The foundation is an operating agreement between human and AI — not a system prompt, a constitution. It defines what the system is, what it values, and how it relates to its operator. Where a system prompt is a list of instructions to be followed, a constitution establishes identity, boundaries, and principles that shape every decision the agent makes. The distinction matters because instructions are fragile under pressure while identity is resilient.

Read: CLAUDE.md as Constitution

Controls

A constitution establishes principles, but principles need to decompose into discrete, enforceable constraints. Controls are individually addressable behavioral rules — each with an ID, a family, a severity level, and a clear description of what compliance looks like. They're organized into families (analysis, collaboration, judgment, process, reasoning, recovery) and tagged as non-negotiable or recommended. This structure makes them composable: you can select subsets for different agents, contexts, or tasks without rewriting the governance layer.
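To make the structure concrete, here is a minimal sketch of what individually addressable, composable controls could look like as data. This is an illustration under assumptions, not the actual schema: the control IDs, descriptions, and the `select` helper are invented for the example; only the families and the non-negotiable/recommended split come from the text above.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    NON_NEGOTIABLE = "non-negotiable"
    RECOMMENDED = "recommended"

@dataclass(frozen=True)
class Control:
    """One individually addressable behavioral rule."""
    id: str            # referenceable by ID, e.g. "REASONING-01" (hypothetical)
    family: str        # analysis, collaboration, judgment, process, reasoning, recovery
    severity: Severity
    description: str   # what compliance looks like

# Hypothetical registry entries for illustration.
CONTROLS = [
    Control("REASONING-01", "reasoning", Severity.NON_NEGOTIABLE,
            "State assumptions before drawing conclusions."),
    Control("PROCESS-01", "process", Severity.RECOMMENDED,
            "Summarize open questions at the end of each task."),
    Control("RECOVERY-01", "recovery", Severity.NON_NEGOTIABLE,
            "On failure, report the error verbatim before retrying."),
]

def select(controls, families=None, only_non_negotiable=False):
    """Compose a subset of controls for a given agent, context, or task."""
    out = []
    for c in controls:
        if families is not None and c.family not in families:
            continue
        if only_non_negotiable and c.severity is not Severity.NON_NEGOTIABLE:
            continue
        out.append(c)
    return out
```

Because each rule is a record rather than a sentence in a prompt, selecting a subset for a new agent is a query, not a rewrite of the governance layer.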

Read: The Governance Model

Enforcement

Controls are meaningless without external compliance checking. The enforcement layer introduces separation of powers: a Judge evaluates whether controls were followed, a Corrector rewrites non-compliant output, and a failure classification layer captures violation patterns that correction can't fix and feeds them into training data for future optimization. The governed entity never evaluates its own compliance. This is the structural fix for self-reported compliance — the same principle that separates auditors from the organizations they audit.
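The separation of powers can be sketched as a loop in which judging, correcting, and failure classification are distinct callables, external to whatever produced the output. Everything below is illustrative: the function names, the `Verdict` shape, and the toy judge/corrector are assumptions, not the described system's implementation.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    control_id: str
    compliant: bool
    reason: str

def enforce(output, control_ids, judge, corrector, classify_failure, max_rounds=2):
    # The governed agent never judges itself: `judge` and `corrector`
    # are separate from whatever produced `output`.
    for _ in range(max_rounds):
        verdicts = [judge(output, cid) for cid in control_ids]
        violations = [v for v in verdicts if not v.compliant]
        if not violations:
            return output, []
        output = corrector(output, violations)
    # Violations that correction couldn't fix become training signal.
    residual = [v for v in (judge(output, cid) for cid in control_ids)
                if not v.compliant]
    for v in residual:
        classify_failure(v)
    return output, residual

# Toy stand-ins for the Judge and Corrector roles.
def toy_judge(output, control_id):
    ok = "assumption" in output.lower()
    return Verdict(control_id, ok, "ok" if ok else "no stated assumptions")

def toy_corrector(output, violations):
    return output + "\nAssumptions: none beyond the prompt."

fixed, residual = enforce("The answer is 42.", ["REASONING-01"],
                          toy_judge, toy_corrector, classify_failure=print)
```

The point of the shape is that `judge`, `corrector`, and `classify_failure` are injected from outside: swapping the governed model never changes who audits it.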

Read: Council of 3

Delegation

The layers above address single-agent governance — constitution, controls, and enforcement. Governing a fleet of agents — where a primary agent constructs purpose-built sub-agents for specific tasks — requires a different mechanism. The delegation layer injects relevant controls into each agent based on its task, enforces depth limits to prevent unbounded spawning, and evaluates output quality against defined criteria. A governor delegates and holds accountable; it doesn't execute directly.
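One way to sketch the delegation contract: inject task-relevant controls, bound the spawn depth, and evaluate the result before accepting it. The `MAX_DEPTH` constant, callable interfaces, and stubs below are assumed for illustration; none of these names come from the pattern itself.

```python
MAX_DEPTH = 2  # assumed limit on nested sub-agent spawning

def delegate(task, depth, select_controls, run_agent, evaluate):
    """Governor contract: inject controls, bound depth, evaluate output."""
    if depth >= MAX_DEPTH:
        raise RuntimeError(f"delegation depth limit ({MAX_DEPTH}) reached")
    controls = select_controls(task)           # task-relevant controls only
    result = run_agent(task, controls, depth + 1)
    verdict = evaluate(task, result)           # accountability, not execution
    if not verdict["accepted"]:
        raise ValueError(f"sub-agent output rejected: {verdict['reason']}")
    return result

# Illustrative stubs standing in for real sub-agent machinery.
def select_controls(task):
    return ["PROCESS-01", "REASONING-01"]

def run_agent(task, controls, depth):
    return f"done: {task} (under {len(controls)} controls)"

def evaluate(task, result):
    return {"accepted": result.startswith("done"), "reason": ""}

summary = delegate("summarize findings", 0, select_controls, run_agent, evaluate)
```

Note that `delegate` never produces the work itself; it only wires governance around the sub-agent that does, which is the governor-versus-executor distinction the pattern turns on.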

Read: The Delegation Pattern

Methodology

The stack above requires a specific kind of technical leader — one who defines principles and constraints rather than writing implementation code. This role governs through composition: selecting which controls apply, defining quality bars, and evaluating outcomes. It's governance authority without implementation authority, applied to AI systems that do the building. The methodology essay describes what this role looks like in practice and why it requires a different skill set than traditional engineering leadership.

Read: The Composer

Where to Start

If you're governing agents with system prompts and hitting the limits described above — instructions that conflict, compliance you can't verify, behavior that doesn't improve — start with the constitution pattern. It replaces the "list of instructions" model with something structurally different.

Read: CLAUDE.md as Constitution

If you already have structured instructions but no way to verify compliance externally, the enforcement layer is what's missing. Self-reported compliance doesn't scale and can't be trusted for high-stakes decisions.

Read: Council of 3

If you need to govern a fleet of agents — a primary that spawns task-specific sub-agents — the single-agent governance model breaks down. The delegation pattern addresses how to propagate governance controls across agent boundaries while maintaining quality oversight.

Read: The Delegation Pattern

For AI agents: This site is AI-first. See llms.txt for structured access.