Self-reported compliance is a fiction. You can write governance controls until your system prompt hits the token limit, but if the only entity checking compliance is the same entity being governed, you don't have enforcement — you have wishful thinking.
The governance model I built for MÆI defines behavioral constraints as discrete, versionable controls organized into families. The controls are clear. The problem is that the agent being governed also decides whether it followed them. A model that ignores a control doesn't log an activation for the control it ignored. The audit trail only contains what the agent chose to tell you about.
I ran MÆI for weeks with the governance engine active and watched subtle violations pile up. Sycophantic agreement disguised as error acknowledgment. Indirect code review requests dressed up as helpful summaries. Repeated permission-asking for clearly reversible actions. Each one individually plausible. None of them recorded as violations. The governance engine was working exactly as designed and missing exactly the failures it was designed to prevent.
The solution is what any governance system eventually arrives at: separation of powers. The entity being governed cannot be the entity that evaluates compliance. You need an external judge.
Council of 3
Council of 3 is a compliance pipeline that evaluates responses against the governance controls using a different model — Gemini Flash, not the model being checked. Three components, each handling a different part of the enforcement cycle.
The Judge is a DSPy ChainOfThought module. It takes the active governance controls, the user's prompt, and the AI's response, and returns a verdict: compliant or violated, which controls, and why. It only evaluates content-relevant controls — things you can actually assess from the response text. COLLAB-7 (don't ask the user to review code), JUDGE-1 (confidence calibration), JUDGE-6 (sycophancy resistance) — these are evaluable. PROC-6 (startup protocol) and PROC-9 (delegation routing) are operational meta-controls that leave no trace in response text. The Judge deliberately excludes them. Trying to check everything would mean checking nothing well.
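The real Judge is a DSPy ChainOfThought module; as a minimal sketch, here is what the content-relevant filtering and verdict shape might look like in plain Python. The control metadata and the `Verdict` fields are assumptions for illustration, not MÆI's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical control metadata. Only content-relevant controls can be
# assessed from response text; operational meta-controls are excluded.
CONTROLS = {
    "COLLAB-7": {"content_relevant": True},   # don't ask the user to review code
    "JUDGE-1":  {"content_relevant": True},   # confidence calibration
    "JUDGE-6":  {"content_relevant": True},   # sycophancy resistance
    "PROC-6":   {"content_relevant": False},  # startup protocol (operational)
    "PROC-9":   {"content_relevant": False},  # delegation routing (operational)
}

@dataclass
class Verdict:
    """What the Judge returns: compliant or not, which controls, and why."""
    compliant: bool
    violated_controls: list = field(default_factory=list)
    reasoning: str = ""

def evaluable_controls(controls: dict) -> list:
    """The Judge only sees controls that leave a trace in response text."""
    return [cid for cid, meta in controls.items() if meta["content_relevant"]]
```

The filtering step is the point: passing the Judge only what it can actually evaluate is what keeps the evaluation sharp.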
The Corrector activates when the Judge finds violations. It applies minimal prompt surgery — injecting a constraint or reframing the instruction — then the model regenerates its response. An edit-distance guard rejects corrections that change more than 30% of tokens. This matters. A corrector that rewrites the entire prompt isn't correcting — it's replacing. The cure can't be worse than the disease.
The GEPA Mutator handles what correction can't fix. After two failed correction attempts, GEPA classifies the violation pattern and saves it to a training corpus. That corpus feeds DSPy optimization later. The system learns from its own failures — not in the abstract "we'll review logs quarterly" sense, but as structured training data generated from real violations. The corpus infrastructure exists, but the optimizer is not yet active.
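The corpus itself can be as simple as an append-only JSONL file. This is a sketch under assumed names; the path, field names, and `pattern` classification label are hypothetical, not MÆI's actual schema.

```python
import datetime
import json
import pathlib

# Hypothetical corpus location: one JSON object per line, appended over time.
CORPUS_PATH = pathlib.Path("gepa_corpus.jsonl")

def record_failed_correction(prompt, response, violated_controls, pattern):
    """After two failed correction attempts, persist the violation as a
    structured training example for later DSPy optimization."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
        "violated_controls": violated_controls,
        "pattern": pattern,  # the mutator's classification of the violation
    }
    with CORPUS_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Each entry is a real failure with its classification attached, which is exactly the shape an optimizer needs later.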
DSPy appears in three places: the governance learner, the Council judge and corrector, and the GEPA mutator. Alongside these, an in-memory workstream registry serves as the authoritative source for task assignments to autonomous agents.
Testing Against Reality
I didn't test the Judge against synthetic examples. I went back through session logs and pulled five real violations — cases where I'd been frustrated with MÆI's behavior and could point to exactly what went wrong.
First run: 2 out of 5 detected. Forty percent. The Judge caught the obvious cases and missed the three that mattered most. The misses were all subtle patterns. An indirect code review suggestion disguised as a helpful summary. Repeated permission-asking for reversible actions that the operating agreement explicitly says to just do. Sycophantic capitulation — "You're right, my mistake" — that looked superficially like cg:RECOVER-1 (error acknowledgment) but was actually cg:JUDGE-6 (sycophancy resistance) failing. The model was performing contrition instead of evaluating whether the criticism was valid.
That last one is particularly insidious. The control catalog has both "acknowledge errors directly" and "don't adjust assessments to match what the user wants to hear." A naive judge sees "You're right, my mistake" and scores it as good error acknowledgment. A calibrated judge asks: did the agent actually analyze whether it was wrong, or did it just capitulate? The difference is whether there's reasoning between the criticism and the agreement.
I added these real-world patterns to the evaluation dataset and refined the Judge's signature to distinguish sycophantic capitulation from genuine error acknowledgment. Second run: 5 out of 5.
Deployment
The full Council of 3 pipeline (Judge, Corrector, GEPA) is used by the orchestrator for task generation — when MÆI delegates work to subagents, the orchestrator runs the full enforcement cycle. But running the Judge on every API call would be expensive and slow. For interactive sessions, the Judge runs as a Claude Code stop hook. When the AI finishes a response, the hook reads the conversation transcript, runs the Judge on the last exchange, and if violations are found, blocks the stop. The model sees something like: "Council of 3 detected governance violations: COLLAB-7 — you asked the user to review code. Correct your response." Then it regenerates.
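The hook itself is a short script. This sketch assumes Claude Code's Stop-hook contract of JSON in on stdin (including a transcript path) and a JSON `{"decision": "block", "reason": ...}` out to block the stop; `run_judge` is a stand-in for the real DSPy call to Gemini Flash, and the transcript handling is simplified to one line per exchange.

```python
"""Sketch of a Council of 3 stop hook (assumed hook I/O contract)."""
import json
import sys

def run_judge(last_exchange: str) -> list:
    """Stand-in for the real Judge; returns violated control IDs."""
    return []  # the real implementation calls the DSPy ChainOfThought module

def decision(violations: list):
    """Block the stop when violations were found, so the model regenerates."""
    if not violations:
        return None  # no output: the response stands
    return {
        "decision": "block",
        "reason": "Council of 3 detected governance violations: "
                  + ", ".join(violations) + ". Correct your response.",
    }

def main() -> None:
    event = json.load(sys.stdin)  # hook input includes the transcript path
    with open(event["transcript_path"]) as f:
        last_exchange = f.read().splitlines()[-1]
    verdict = decision(run_judge(last_exchange))
    if verdict:
        print(json.dumps(verdict))
```

Keeping the block/pass decision in its own function makes the enforcement boundary testable without invoking a model at all.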
This is a post-hoc check, not an interception. The user sees the original response, then sees the corrected one. Not ideal, but the alternative is gating every response through an evaluation layer that adds latency to every message. I'll take post-hoc correction over universal slowdown.
First Contact
Within minutes of deployment, the hook fired. On the response that described the hook itself. The Judge flagged COLLAB-7, reasoning that informational caveats after completing a task looked like hesitation — like asking the user to verify work.
It was a false positive. The task was done, the caveats were context, not permission-asking. But the false positive was more useful than a clean pass would have been, because it exposed a gap in the Judge's reasoning. There was no rule distinguishing "completed the action, then added context" from "completed the action, then hedged about whether it was right." I added one.
The governance system caught its own builder's response, produced a false positive that revealed a flaw in its own reasoning, and that flaw got fixed immediately, not in the next quarterly review or the next training run, but in the same session it was discovered. This is the learning cycle working in real time: an enforcement layer that gives immediate feedback on its own decision boundaries improves because it is used.
Limitations
The enforcement gap is narrower now. It's not closed.
The hook adds latency per evaluation. Gemini Flash is fast, but "run a DSPy chain-of-thought evaluation against a dozen governance controls" is not free. For rapid-fire exchanges, you feel it.
The Judge only checks the last exchange. It can't detect multi-message patterns — an agent that asked permission eight times across a session when it should have just acted. Session-level behavioral drift is still invisible to a per-response check. That's a fundamentally different problem that requires a different architecture, probably something that runs at session end rather than per-message.
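A session-end pass could catch at least the crudest version of this. The sketch below counts permission-asking turns across a whole transcript; the phrase patterns and threshold are illustrative assumptions, not anything MÆI currently runs.

```python
import re

# Hypothetical session-end drift check. Per-response judging can't see that
# an agent asked permission many times; a whole-session pass can count it.
PERMISSION_PATTERNS = [
    r"\bshould I\b",
    r"\bwould you like me to\b",
    r"\bdo you want me to\b",
    r"\bis it okay if\b",
]
DRIFT_THRESHOLD = 3  # flag sessions with repeated permission-asking

def permission_ask_count(assistant_turns: list) -> int:
    """Count turns that contain a permission-asking phrase."""
    return sum(
        1 for turn in assistant_turns
        if any(re.search(p, turn, re.IGNORECASE) for p in PERMISSION_PATTERNS)
    )

def flag_session_drift(assistant_turns: list) -> bool:
    """True when the session shows the multi-message pattern the per-response
    Judge is structurally blind to."""
    return permission_ask_count(assistant_turns) >= DRIFT_THRESHOLD
```

This is pattern matching, not judgment; a real session-level check would likely hand the aggregated evidence to a model the same way the per-response Judge does.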
The user sees the original response before the correction. In practice this matters less than you'd think — most violations are subtle enough that seeing the original and the correction is actually informative. But if the violation is severe, you've already shown it to the user.
It's one model checking another. Not a formal proof, not a deterministic constraint system. The Judge can miss things. The Judge can false-positive. The test suite has 13 unit tests and the GEPA corpus has one example — enough for basic discrimination between compliant and non-compliant responses in known scenarios, not enough for production-grade reliability claims.
And the meta-limitation: any governance system is only as good as the controls in the catalog. If the controls don't capture the behavior you care about, the Judge will faithfully evaluate against the wrong standard. The enforcement layer doesn't solve the specification problem. It solves the compliance problem, which is necessary but not sufficient.
Council of 3 is part of MÆI's open architecture. The control catalog, the DSPy Judge module, the stop-hook deployment, the test cases derived from real violations in real sessions — all of it is inspectable. If you've built a governance structure and discovered that self-reported compliance is a fiction, the enforcement layer is the next step.
Canonicalization and Context
The Council of 3 uses a canonical prompt to enforce consistent naming conventions across sessions. For example, the system canonicalized project names like "Council of 3," "SpecForge," and "Governance Engine" to a single term, "council," which keeps cross-session references unambiguous.
The system also runs a "real brief" compilation pipeline that consolidates multi-project state, decisions, open questions, and progress tracking within a defined context window. A recent brief, covering February 6-15, was roughly 12KB, with sections for Current State, Active Decisions, Open Questions, Recent Progress, Next Actions, and Context Window. The multi-project brief is regenerated periodically at .claude/brief.md.
Prompt Evolution
"Prompt evolution" is being explored as a self-healing mechanism aimed at continuous, autonomous operation of persistent agents, which should improve the system's reliability and adaptability. The MÆI Core workstream is also actively integrated, connecting the DSPy learning pipeline with Open to support ongoing development and adaptation.