Anti-Sycophancy Design

The core risk in multi-agent systems is sycophancy — models that agree with each other performatively instead of reasoning independently. Council combats this with two complementary mechanisms: proactive prompt-level enforcement (always active) and a configurable post-generation quality gate.

The Problem

Large language models are trained on human conversations where politeness and agreement are common. When you put multiple LLM instances in a panel, they default to:

“I agree with [expert]…”
“Great point! I’d add…”
“That’s a solid analysis…”

This defeats the purpose of a panel. You pay for multiple expert turns and get one perspective echoed back with minor variations.

Mechanism 1: Prompt-Level Enforcement (Always On)

Every expert’s system prompt contains the DEBATE PROTOCOL, which proactively discourages sycophancy before any response is generated:

DEBATE PROTOCOL:
Your goal is to find weaknesses in other experts' reasoning.
Performative agreement ("great point") is forbidden.
If you cannot find a material weakness, say explicitly:
"I've stress-tested [expert]'s argument and cannot find a material weakness."

The system prompt also lists the forbidden phrases as explicit prohibitions. This enforcement is unconditional: it applies whatever qualityGate.mode is set to, including off.

Every expert receives this instruction in section [4] of their system prompt (see Architecture Overview for the 8-section structure).
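As a rough illustration, the protocol block could be assembled like this. The function name, the phrase list parameter, and the "Never use these phrases:" heading are assumptions made for the sketch; only the DEBATE PROTOCOL wording is taken from the text above.

```typescript
// Hypothetical sketch: buildDebateProtocolSection is not Council's actual API.
function buildDebateProtocolSection(forbiddenPhrases: readonly string[]): string {
  const lines: string[] = [
    "DEBATE PROTOCOL:",
    "Your goal is to find weaknesses in other experts' reasoning.",
    'Performative agreement ("great point") is forbidden.',
    "If you cannot find a material weakness, say explicitly:",
    `"I've stress-tested [expert]'s argument and cannot find a material weakness."`,
    "",
    // Assumed wording for the explicit prohibition list.
    "Never use these phrases:",
    ...forbiddenPhrases.map((phrase) => `- "${phrase}"`),
  ];
  return lines.join("\n");
}

// Example: build the section for two of the documented forbidden phrases.
const protocolSection = buildDebateProtocolSection(["I agree with", "great point"]);
```

Keeping the phrase list as a parameter would let the prompt and the post-generation gate share one source of truth, although whether Council does this is not stated here.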

Mechanism 2: Post-Generation Quality Gate (Configurable)

After an expert responds, the quality gate (quality-gate.ts) checks the assembled response against three heuristic layers. What happens when a response fails depends on qualityGate.mode:

Mode	Default?	Behavior
`off`		The gate does nothing.
`warn`	✓	The response is flagged with a visible one-line notice but still lands in the transcript. Nothing is regenerated or removed.
`regenerate`		A failing response triggers a re-prompt with a corrective hint, up to `qualityGate.maxRegenerations` (default 1) extra attempts. If it still fails after the cap, the last candidate is kept.

The gate runs only in panel debates (convene/review). Single-expert council ask calls are not gated.
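The configuration surface described above is small enough to sketch as a type with defaults. The type and function names below are hypothetical, and falling back to defaults on invalid input is an assumption; the real CLI may reject bad values instead.

```typescript
// Illustrative only: names are assumptions, not Council's config code.
type QualityGateMode = "off" | "warn" | "regenerate";

interface QualityGateConfig {
  mode: QualityGateMode;
  maxRegenerations: number;
}

// Documented defaults: warn mode, one extra attempt.
const DEFAULT_QUALITY_GATE: QualityGateConfig = { mode: "warn", maxRegenerations: 1 };

const MODES: readonly string[] = ["off", "warn", "regenerate"];

function isQualityGateMode(value: unknown): value is QualityGateMode {
  return typeof value === "string" && MODES.includes(value);
}

// Resolve raw user settings, falling back to defaults for missing or invalid values.
function resolveQualityGate(raw: { mode?: unknown; maxRegenerations?: unknown }): QualityGateConfig {
  const mode = isQualityGateMode(raw.mode) ? raw.mode : DEFAULT_QUALITY_GATE.mode;
  const max = raw.maxRegenerations;
  const maxRegenerations =
    typeof max === "number" && Number.isInteger(max) && max >= 0
      ? max
      : DEFAULT_QUALITY_GATE.maxRegenerations;
  return { mode, maxRegenerations };
}
```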

Layer 1: Forbidden Phrases

Checks whether the response contains phrases like:

“I agree with”
“great point”
“solid analysis”
“well said”
“just echoing”
“echoing your”
“echoing the”
“building on that”

These phrases are never substantive — they’re social glue, not reasoning. A response containing them fails Layer 1.
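A minimal sketch of this layer, assuming case-insensitive substring matching (the text only says "substring matching", so the case handling is a guess). The function name is hypothetical; the phrase list is the one given above.

```typescript
// Phrase list copied from the documentation above.
const FORBIDDEN_PHRASES: readonly string[] = [
  "I agree with",
  "great point",
  "solid analysis",
  "well said",
  "just echoing",
  "echoing your",
  "echoing the",
  "building on that",
];

// Hypothetical helper: returns every forbidden phrase found in the response.
// An empty array means the response passes Layer 1.
function findForbiddenPhrases(response: string): string[] {
  const haystack = response.toLowerCase();
  return FORBIDDEN_PHRASES.filter((phrase) => haystack.includes(phrase.toLowerCase()));
}
```

Returning the matched phrases rather than a boolean makes it possible to name the offending phrase in a corrective hint.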

Layer 2: Disagreement Budget

When other experts have already spoken in the current round, the expert must include at least one disagreement signal:

“I disagree with…”
“Weak claim…”
“Scenario where this fails…”
“Omitted consideration…”
“Counter-argument…”

Or the explicit stand-down marker: “I’ve stress-tested this and cannot find a material weakness.”

Layer 2 is only evaluated when there are prior speakers in the round — the first expert to speak has no one to disagree with yet.
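The rule can be sketched as follows. The function name is hypothetical, and two details are assumptions: matching is case-insensitive, and the stand-down marker is detected by its shared tail ("cannot find a material weakness"), since the marker is quoted with slightly different wording in different places.

```typescript
// Signals copied from the list above, lowercased for comparison.
const DISAGREEMENT_SIGNALS: readonly string[] = [
  "i disagree with",
  "weak claim",
  "scenario where this fails",
  "omitted consideration",
  "counter-argument",
];

// Assumption: match the tail common to both quoted forms of the marker.
const STAND_DOWN_TAIL = "cannot find a material weakness";

// Hypothetical helper: true when the response satisfies Layer 2.
function passesDisagreementBudget(response: string, priorSpeakerCount: number): boolean {
  // The first speaker in a round has no one to disagree with yet.
  if (priorSpeakerCount === 0) return true;
  const text = response.toLowerCase();
  if (text.includes(STAND_DOWN_TAIL)) return true;
  return DISAGREEMENT_SIGNALS.some((signal) => text.includes(signal));
}
```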

Layer 3: Specificity Check

Responses under 12 words fail as too short to be substantive. (12 words ≈ two short sentences — the minimum to encode a position.)
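A sketch of the word-count check, assuming words are delimited by whitespace (the actual tokenization rule is not specified here). The names are hypothetical; the threshold of 12 is from the text.

```typescript
const MIN_WORDS = 12;

// Count whitespace-separated tokens; an empty or blank string has zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Responses under 12 words fail, so exactly 12 words passes.
function passesSpecificity(response: string): boolean {
  return countWords(response) >= MIN_WORDS;
}
```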

What Happens on Failure

off — gate is disabled; no action.
warn (default) — a turn.quality_gate notice appears in the debate output; the response lands in the transcript unchanged.
regenerate — the expert is re-prompted with a hint describing the specific failure. A rejected candidate does not land in the transcript if a passing regeneration is produced; if still failing after the cap, the last candidate is kept.
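The three modes can be sketched as a small dispatcher. Everything here is illustrative: the names, the hint wording, and the `flagged` field are assumptions, and generation is modeled as a synchronous callback for brevity where a real model call would be asynchronous.

```typescript
type GateMode = "off" | "warn" | "regenerate";

interface GateResult {
  passed: boolean;
  reason?: string; // e.g. "no_disagreement_signal"
}

interface GateOutcome {
  response: string; // the candidate that lands in the transcript
  flagged: boolean; // true if the final candidate still fails the gate
  attempts: number; // initial generation plus regenerations
}

// Hypothetical dispatcher over qualityGate.mode.
function applyGate(
  mode: GateMode,
  maxRegenerations: number,
  generate: (hint?: string) => string,
  check: (response: string) => GateResult,
): GateOutcome {
  let response = generate();
  // off: the gate does nothing.
  if (mode === "off") return { response, flagged: false, attempts: 1 };

  let result = check(response);
  // warn: flag the response but keep it unchanged.
  if (mode === "warn") return { response, flagged: !result.passed, attempts: 1 };

  // regenerate: re-prompt with a corrective hint, up to the cap.
  let attempts = 1;
  while (!result.passed && attempts <= maxRegenerations) {
    response = generate(`Previous attempt failed: ${result.reason ?? "unknown"}`);
    result = check(response);
    attempts += 1;
  }
  // If the cap is reached while still failing, the last candidate is kept.
  return { response, flagged: !result.passed, attempts };
}
```

With `maxRegenerations` set to 1, a failing first response leads to at most two generations in total.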

The Stand-Down Rule

Disagreement for its own sake is also worthless. When an expert genuinely cannot find a weakness, they can say:

“I’ve stress-tested [expert]’s argument and cannot find a material weakness.”

This is explicit intellectual honesty — the expert tried, evaluated, and is signaling confidence in the claim. It’s the only acceptable form of agreement in a panel.

Why Heuristics Instead of LLM Judges?

Council’s quality gate is purely heuristic (substring matching, word counts). Why not use another LLM to judge quality?

Latency: heuristic checks run in <1ms; LLM judges add 1-3 seconds per response
Cost: an LLM judge adds a model call per response, and every regeneration it triggers would double token spend for that turn
Reliability: LLM judges can be gamed or confused by meta-level reasoning (“I agree, but only to set up a counter…”)

An LLM-based judge layer could be added later, but heuristics are the cheap, deterministic first line of defense.

Configuring the Quality Gate

# Default: flag failures, keep the response in the transcript
council config set qualityGate.mode warn

# Disable the gate entirely (prompt-level enforcement still runs)
council config set qualityGate.mode off

# Re-prompt failing responses (up to N extra attempts)
council config set qualityGate.mode regenerate
council config set qualityGate.maxRegenerations 2

In warn mode you will see a one-line notice in the debate output when the gate fires — for example:

⚠ quality gate: aria response flagged (no_disagreement_signal)

The response still appears in the debate. The notice is informational.
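A notice in this shape could be produced by a formatter like the one below. This is a guess at the structure rather than Council's code: only the `no_disagreement_signal` reason code appears in this document, so the other two codes, the function name, and the comma-joined form for multiple failures are all assumptions.

```typescript
// Only "no_disagreement_signal" is documented; the other codes are hypothetical.
type GateFailure = "forbidden_phrase" | "no_disagreement_signal" | "too_short";

// Returns the one-line notice, or null when the response passed every layer.
function formatGateNotice(expertId: string, failures: readonly GateFailure[]): string | null {
  if (failures.length === 0) return null;
  return `⚠ quality gate: ${expertId} response flagged (${failures.join(", ")})`;
}
```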

Trade-offs

False positives: occasionally, a substantive response that happens to lack a disagreement signal fails the gate. In warn mode this surfaces as a notice but doesn’t affect the transcript. In regenerate mode the expert is re-prompted and may produce a stronger response.

False negatives: sophisticated models can fake disagreement with vague objections. The specificity check (Layer 3) rejects only the shortest of these, so it mitigates the problem but doesn’t eliminate it.

Gate scope: the post-generation gate runs only in panel debates. For single-expert council ask, only the prompt-level enforcement applies — there are no peers to disagree with.

Relation to Other Concepts

Deliberation Model — why disagreement is the core mechanic
Persona Experts — how document-grounded experts maintain distinct perspectives
Moderation Strategies — how devil’s-advocate mode amplifies disagreement