Context Window Management
Long debates hit a wall: model token limits. A 10-round debate with 5 experts can easily exceed 100,000 tokens of prior context. Council offers three levers to keep prompts within budget while preserving the most important context.
The Problem: Context Explosion
Each expert’s prompt includes:
- System prompt (identity, expertise, rules) — ~2,000 tokens
- Topic — ~50–200 tokens
- Prior turns from earlier rounds — unbounded
- Current round turns so far — unbounded
A 5-expert, 10-round debate where each turn is 500 words = ~125,000 tokens of prior context alone. Most models cap at 128k–200k total (input + output).
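To see how quickly prompts grow when every prior turn is visible, here is a rough estimator. It is a sketch, not Council's accounting: the function names are hypothetical, `tokens_per_turn` is an input you would measure from your own transcripts, and the system and topic defaults simply reuse the approximate figures listed above.

```python
def prompt_tokens_at_turn(turn_index, tokens_per_turn, system_tokens=2000, topic_tokens=200):
    """Prompt size for the turn at zero-based turn_index when all prior turns are visible."""
    # Every earlier turn is re-sent verbatim, so the prompt grows linearly per turn.
    return system_tokens + topic_tokens + turn_index * tokens_per_turn


def debate_input_tokens(experts, rounds, tokens_per_turn, system_tokens=2000, topic_tokens=200):
    """Return (largest single prompt, cumulative input tokens) for a whole debate."""
    total_turns = experts * rounds
    sizes = [
        prompt_tokens_at_turn(i, tokens_per_turn, system_tokens, topic_tokens)
        for i in range(total_turns)
    ]
    # The largest prompt decides whether the model's limit is hit;
    # the sum is what drives input-token cost.
    return max(sizes), sum(sizes)
```

The largest prompt grows linearly with the number of turns, while the cumulative input grows roughly quadratically, which is why cost rises faster than debate length.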
Without context management, debates either:
- Fail with a token limit error
- Cost $10+ per debate in input tokens
- Run on models with huge context windows (expensive, slower)
Lever 1: Visibility Scoping
Control which prior turns each expert sees.
all (default, no scoping)
Every expert sees every prior turn from every round.
Cost: highest
Quality: best for cross-expert reasoning
Use when: short debates (<5 rounds), model supports large context
same-round
Each expert only sees turns from their current round. Prior rounds are hidden.
Cost: lowest (~80% reduction vs. all)
Quality: experts miss earlier arguments but reason from fresh prompts each round
Use when: token budget is tight, or you want experts to avoid anchoring on round 1
recent
Each expert sees the most recent N turns (configurable, default N=10).
Cost: medium
Quality: balances recency with cross-expert context
Use when: debates longer than 5 rounds, you want some history but not everything
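The three scopes can be thought of as filters over the chronological turn list. The sketch below is illustrative only: `visible_turns` and the `(round_no, expert, text)` tuple layout are assumptions made for this example, not Council's internal API.

```python
def visible_turns(turns, scope, current_round, recent_n=10):
    """Filter a chronological list of (round_no, expert, text) tuples by scope."""
    if scope == "all":
        # No scoping: every prior turn is visible.
        return list(turns)
    if scope == "same-round":
        # Only turns already made in the current round are visible.
        return [t for t in turns if t[0] == current_round]
    if scope == "recent":
        # Only the last recent_n turns, regardless of which round they belong to.
        return list(turns[-recent_n:]) if recent_n > 0 else []
    raise ValueError(f"unknown scope: {scope!r}")
```

Because each scope is a pure filter, it is deterministic and adds no LLM cost.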
    council convene "Long topic" --context-scope recent --max-rounds 10
    council convene "Tight budget" --context-scope same-round

Lever 2: Rolling Summaries
After round N, replace old verbatim turns with an LLM-generated summary.
LLM-based summarization (default)
Council sends the full transcript to an LLM with this prompt:
“Summarize the key claims, disagreements, and convergences so far. Be concise but preserve the crux of each expert’s position.”
The summary (~500–1000 tokens) replaces 10,000+ tokens of verbatim turns.
Cost: ~$0.01–0.03 per summary
Quality: high — captures nuance, references experts by name
Use when: debates longer than 3–5 rounds
    council convene "Long debate" --summarize-after 3

After round 3, experts see:
- Summary of rounds 1–3 (~500 tokens)
- Verbatim turns from rounds 4+ (~2,000 tokens per round)
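The split between summarized and verbatim history can be sketched as follows. This is a minimal illustration under stated assumptions: `build_history` is a hypothetical name, turns are `(round_no, expert, text)` tuples, and the summarizer is passed in as a callable so the example runs without an LLM. Whether Council regenerates the summary in later rounds is not covered here.

```python
def build_history(turns, current_round, summarize_after, summarize):
    """Return (summary_or_None, verbatim_turns) for the next expert prompt.

    summarize is any callable that maps a list of turns to a string; in practice
    it would wrap an LLM call, here it is injected to keep the sketch offline.
    """
    if current_round <= summarize_after:
        # Still inside the verbatim window: nothing is summarized yet.
        return None, list(turns)
    old = [t for t in turns if t[0] <= summarize_after]
    recent = [t for t in turns if t[0] > summarize_after]
    return summarize(old), recent
```

With `summarize_after=3`, a prompt built in round 5 would carry one summary of rounds 1 to 3 plus the verbatim turns from round 4 onward.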
Heuristic summarization (--heuristic-summaries)
Council uses a simple rule: show the first and last turn from each expert in the summarized range.
Cost: zero (no LLM call)
Quality: lower — misses mid-debate shifts, no cross-expert synthesis
Use when: token budget is critical and summary quality is secondary
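The first-and-last rule is simple enough to sketch directly. The function name and the `(round_no, expert, text)` tuple layout are assumptions for illustration; how Council formats the kept turns in the prompt is not shown.

```python
def heuristic_summary(turns):
    """Keep each expert's first and last turn, preserving chronological order."""
    first, last = {}, {}
    for index, (_, expert, _) in enumerate(turns):
        first.setdefault(expert, index)  # remembers only the earliest index
        last[expert] = index             # overwritten until the latest index
    # An expert with a single turn contributes that turn once.
    keep = sorted(set(first.values()) | set(last.values()))
    return [turns[i] for i in keep]
```

Everything between an expert's first and last turn is dropped, which is why mid-debate shifts are lost.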
    council convene "Budget debate" --summarize-after 3 --heuristic-summaries

Lever 3: Hard Cap (maxPromptChars)
When visibility scoping and summaries aren’t enough, apply a hard character limit on verbatim prior turns.
    council convene "Mega-debate" --max-prompt-chars 50000

Council:
- Applies visibility scoping and summaries first
- Measures the remaining verbatim turn content (chars)
- If over maxPromptChars, truncates from the oldest turns first (oldest-first eviction, so the newest turns are kept)
The topic, system prompt, and rolling summary are never truncated — only verbatim prior turns.
Use when: you’ve hit token limits even with scoping + summaries.
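A character cap with oldest-first eviction might look like the sketch below. It assumes whole turns are dropped rather than cut mid-text, which the description above does not specify; `apply_hard_cap` is a hypothetical name, and the topic, system prompt, and summary are deliberately left out because they are never truncated.

```python
def apply_hard_cap(verbatim_texts, max_prompt_chars):
    """Drop the oldest turns until the remaining verbatim text fits the cap."""
    kept = []
    used = 0
    # Walk from newest to oldest so the most recent turns claim the budget first.
    for text in reversed(verbatim_texts):
        if used + len(text) > max_prompt_chars:
            break  # this turn and everything older is evicted
        kept.append(text)
        used += len(text)
    kept.reverse()  # restore chronological order
    return kept
```

Stopping at the first turn that does not fit keeps the surviving history contiguous instead of leaving gaps between older and newer turns.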
Combining Levers
All three levers are orthogonal; you can use any combination:
    council convene "Complex debate" \
      --context-scope recent \
      --summarize-after 3 \
      --max-prompt-chars 40000 \
      --max-rounds 10

This debate:
- Shows each expert only the 10 most recent turns (not all 50+ from 10 rounds)
- Replaces rounds 1–3 with an LLM summary after round 3
- If the remaining verbatim turns still exceed 40,000 characters, truncates the oldest ones first
Cost: ~$1–2 (vs. $10+ with no context management).
Trade-Offs
| Lever | Pros | Cons |
|---|---|---|
| Visibility scoping | Fast, deterministic, no extra cost | Experts miss earlier context, less cross-expert reasoning |
| Rolling summaries (LLM) | High-quality compression, preserves key insights | Adds ~$0.01–0.03 per summary |
| Rolling summaries (heuristic) | Zero cost | Loses mid-debate nuance, no synthesis |
| Hard cap (maxPromptChars) | Guarantees prompt fits | Arbitrary truncation can cut important context |
Default Behavior
With no flags, Council uses:
- Visibility: all (every expert sees everything)
- Summaries: none (no summarization)
- Hard cap: none (no truncation)
This works for short debates (1–3 rounds, 3–4 experts) but hits limits beyond that.
Monitoring Context Size
Council emits a cost.update event after each round showing:
- Total tokens in (input)
- Total tokens out (output)
- Estimated cost ($USD)
The CLI also shows an estimated premium request count while a debate runs. A premium request is one AI expert turn, so the estimate is roughly experts × rounds in freeform debates or experts × phases in structured debates. It is not a hard limit or stop condition; actual usage can differ when retries, early exits, or future debate mechanics change the number of AI turns.
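The estimate described above reduces to a single multiplication. The helper below is a hypothetical illustration, not the CLI's own code, and it ignores retries and early exits just as the estimate does.

```python
def estimate_premium_requests(experts, steps):
    """Approximate AI expert turns: experts x rounds (freeform) or experts x phases (structured)."""
    if experts < 0 or steps < 0:
        raise ValueError("experts and steps must be non-negative")
    return experts * steps
```

For example, halving either the panel size or the number of rounds halves the estimate.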
To reduce premium-request usage, lower the panel size or round count:
    council convene "My topic" --max-rounds 2 # fewer rounds → fewer turns
    council convene "My topic" --max-experts 2 # smaller panel → fewer concurrent turns per round
    council convene "My topic" --panel budget # purpose-built panel with fewer experts

Watch for input token counts approaching your model’s limit (e.g., 200k for Claude Sonnet, 128k for GPT-4 Turbo). If you see >80k input tokens, consider enabling context management.
Relation to Other Concepts
- Deliberation Model: how debate modes interact with context limits
- Memory Model: how persistent memory across debates contributes to context
- Moderation Strategies: how moderators inject rolling summaries into prompts
- Document RAG: how retrieved document snippets count against context budgets