Context Window Management

Long debates hit a wall: model token limits. A 10-round debate with 5 experts can easily exceed 100,000 tokens of prior context. Council offers three levers to keep prompts within budget while preserving the most important context.

Each expert’s prompt includes:

  • System prompt (identity, expertise, rules) — ~2,000 tokens
  • Topic — ~50–200 tokens
  • Prior turns from earlier rounds — unbounded
  • Current round turns so far — unbounded

A 5-expert, 10-round debate where each turn is 500 words = ~125,000 tokens of prior context alone. Most models cap at 128k–200k total (input + output).

Without context management, long debates end in one of three ways:

  1. They fail with a token limit error
  2. They cost $10+ per debate in input tokens
  3. They must run on models with huge context windows (expensive, slower)

The first lever, visibility scoping (--context-scope), controls which prior turns each expert sees.

Scope all (the default): every expert sees every prior turn from every round.

Cost: highest
Quality: best for cross-expert reasoning
Use when: short debates (<5 rounds), model supports large context

Scope same-round: each expert sees only turns from the current round. Prior rounds are hidden.

Cost: lowest (~80% reduction vs. all)
Quality: experts miss earlier arguments but reason from fresh prompts each round
Use when: token budget is tight, or you want experts to avoid anchoring on round 1

Scope recent: each expert sees the N most recent turns (configurable, default N=10).

Cost: medium
Quality: balances recency with cross-expert context
Use when: debates longer than 5 rounds, you want some history but not everything

council convene "Long topic" --context-scope recent --max-rounds 10
council convene "Tight budget" --context-scope same-round
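
The three scopes can be sketched as a filter over the turn history. This is a minimal illustration, not Council's actual implementation: the `Turn` type and the `visible_turns` function are hypothetical names, and the sketch assumes turns are stored in chronological order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    round: int   # 1-based round number
    expert: str
    text: str


def visible_turns(turns, scope, current_round, recent_n=10):
    """Return the turns an expert would see under a given scope.

    Hypothetical helper; `turns` is assumed to be in chronological order.
    """
    if scope == "all":
        # Every turn from every round.
        return list(turns)
    if scope == "same-round":
        # Only turns already made in the current round.
        return [t for t in turns if t.round == current_round]
    if scope == "recent":
        # The N most recent turns, regardless of round.
        return list(turns[-recent_n:]) if recent_n > 0 else []
    raise ValueError(f"unknown scope: {scope!r}")


history = [Turn(r, e, f"{e}-r{r}") for r in (1, 2) for e in ("A", "B")]
history.append(Turn(3, "A", "A-r3"))

print(len(visible_turns(history, "all", current_round=3)))
print([t.text for t in visible_turns(history, "same-round", current_round=3)])
print([t.text for t in visible_turns(history, "recent", current_round=3, recent_n=2)])
```

With five turns in the history, the all scope returns all five, same-round returns only the single round-3 turn, and recent with N=2 returns the last two turns.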

The second lever is rolling summaries (--summarize-after N): after round N, Council replaces older verbatim turns with an LLM-generated summary.

Council sends the full transcript to an LLM with this prompt:

“Summarize the key claims, disagreements, and convergences so far. Be concise but preserve the crux of each expert’s position.”

The summary (~500–1,000 tokens) replaces 10,000+ tokens of verbatim turns.

Cost: ~$0.01–0.03 per summary
Quality: high — captures nuance, references experts by name
Use when: debates longer than 3–5 rounds

council convene "Long debate" --summarize-after 3

After round 3, experts see:

  • Summary of rounds 1–3 (~500 tokens)
  • Verbatim turns from rounds 4+ (~2,000 tokens per round)
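
The split between summarized and verbatim rounds can be sketched as follows. This is an illustration under stated assumptions rather than Council's code: `Turn`, `format_turns`, and `build_history` are hypothetical names, the `summarize` callable stands in for the real LLM call, and the sketch assumes the summary covers rounds 1 through N.

```python
from typing import Callable, NamedTuple


class Turn(NamedTuple):
    round: int   # 1-based round number
    expert: str
    text: str


# Prompt text quoted from the description above.
SUMMARY_PROMPT = (
    "Summarize the key claims, disagreements, and convergences so far. "
    "Be concise but preserve the crux of each expert’s position."
)


def format_turns(turns):
    """Render turns as plain text, one per line."""
    return "\n".join(f"[round {t.round}] {t.expert}: {t.text}" for t in turns)


def build_history(turns, current_round, summarize_after,
                  summarize: Callable[[str], str]):
    """Return the history text an expert would see in `current_round`."""
    # Until round N has finished, every turn stays verbatim.
    if current_round <= summarize_after:
        return format_turns(turns)

    old = [t for t in turns if t.round <= summarize_after]
    new = [t for t in turns if t.round > summarize_after]

    # `summarize` is a placeholder for the LLM call (assumption).
    summary = summarize(SUMMARY_PROMPT + "\n\n" + format_turns(old))
    header = f"Summary of rounds 1-{summarize_after}:\n{summary}"
    return header + ("\n\n" + format_turns(new) if new else "")
```

Passing the summarizer in as a callable keeps the sketch testable without a model: a stub that returns a fixed string is enough to check which rounds end up summarized and which stay verbatim.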

Heuristic summarization (--heuristic-summaries)

Council uses a simple rule: show the first and last turn from each expert in the summarized range.

Cost: zero (no LLM call)
Quality: lower — misses mid-debate shifts, no cross-expert synthesis
Use when: token budget is critical and summary quality is secondary

council convene "Budget debate" --summarize-after 3 --heuristic-summaries
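
The first-and-last rule is simple enough to sketch directly. The names below (`Turn`, `heuristic_summary`) are hypothetical, and the sketch assumes the turns in the summarized range are in chronological order; how Council formats the kept turns is not shown.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    round: int   # 1-based round number
    expert: str
    text: str


def heuristic_summary(turns):
    """Keep each expert's first and last turn, in chronological order."""
    first_idx, last_idx = {}, {}
    for i, turn in enumerate(turns):
        first_idx.setdefault(turn.expert, i)  # set only on first sighting
        last_idx[turn.expert] = i             # overwritten until the last one
    keep = set(first_idx.values()) | set(last_idx.values())
    return [turn for i, turn in enumerate(turns) if i in keep]


rounds_1_to_3 = [Turn(r, e, f"{e}{r}") for r in (1, 2, 3) for e in ("A", "B")]
print([t.text for t in heuristic_summary(rounds_1_to_3)])
```

For two experts over three rounds, the round-2 turns are dropped and the four round-1 and round-3 turns remain, which is why mid-debate shifts are lost.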

The third lever is a hard cap (--max-prompt-chars): when visibility scoping and summaries aren’t enough, apply a hard character limit on verbatim prior turns.

council convene "Mega-debate" --max-prompt-chars 50000

Council:

  1. Applies visibility scoping and summaries first
  2. Measures the remaining verbatim turn content (chars)
  3. If over maxPromptChars, truncates from the oldest turns first (oldest-first eviction)

The topic, system prompt, and rolling summary are never truncated — only verbatim prior turns.

Use when: you’ve hit token limits even with scoping + summaries.
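
The eviction step can be sketched as a loop that drops turns from the front of the list. This is a hedged illustration: `apply_hard_cap` is a hypothetical name, and the sketch assumes whole turns are dropped rather than cut mid-text, which the description above does not specify.

```python
def apply_hard_cap(verbatim_turns, max_prompt_chars):
    """Drop the oldest verbatim turns until the rest fit the character cap.

    `verbatim_turns` is a chronological list of turn texts. The topic,
    system prompt, and rolling summary are not part of this list, so they
    are never touched.
    """
    total = sum(len(text) for text in verbatim_turns)
    start = 0
    # Evict from the oldest end while the remaining turns are over budget.
    while start < len(verbatim_turns) and total > max_prompt_chars:
        total -= len(verbatim_turns[start])
        start += 1
    return list(verbatim_turns[start:])


turns = ["a" * 30, "b" * 30, "c" * 30]
print(len(apply_hard_cap(turns, 70)))   # oldest turn evicted, two remain
print(len(apply_hard_cap(turns, 100)))  # under the cap, nothing evicted
```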

The three levers are independent, so you can use any combination:

council convene "Complex debate" \
  --context-scope recent \
  --summarize-after 3 \
  --max-prompt-chars 40000 \
  --max-rounds 10

This debate:

  1. Shows each expert only the 10 most recent turns (not all 50+ from 10 rounds)
  2. Replaces rounds 1–3 with an LLM summary after round 3
  3. If the prompt still exceeds 40k chars, truncates the oldest verbatim turns

Cost: ~$1–2 (vs. $10+ with no context management).

| Lever | Pros | Cons |
| --- | --- | --- |
| Visibility scoping | Fast, deterministic, no extra cost | Experts miss earlier context, less cross-expert reasoning |
| Rolling summaries (LLM) | High-quality compression, preserves key insights | Adds ~$0.01–0.03 per summary |
| Rolling summaries (heuristic) | Zero cost | Loses mid-debate nuance, no synthesis |
| Hard cap (maxPromptChars) | Guarantees prompt fits | Arbitrary truncation can cut important context |

With no flags, Council uses:

  • Visibility: all (every expert sees everything)
  • Summaries: none (no summarization)
  • Hard cap: none (no truncation)

This works for short debates (1–3 rounds, 3–4 experts) but hits limits beyond that.

Council emits a cost.update event after each round showing:

  • Total tokens in (input)
  • Total tokens out (output)
  • Estimated cost ($USD)

The CLI also shows an estimated premium request count while a debate runs. A premium request is one AI expert turn, so the estimate is roughly experts × rounds in freeform debates or experts × phases in structured debates. It is not a hard limit or stop condition; actual usage can differ when retries, early exits, or future debate mechanics change the number of AI turns.
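
As a rough illustration of that estimate, the arithmetic is a single multiplication. The function name is hypothetical, and the result is only the baseline figure before retries or early exits change the real count.

```python
def estimate_premium_requests(experts: int, steps: int) -> int:
    """Baseline count of AI expert turns.

    `steps` is the round count in a freeform debate or the phase count in a
    structured one. Retries and early exits are not modeled.
    """
    if experts < 0 or steps < 0:
        raise ValueError("experts and steps must be non-negative")
    return experts * steps


print(estimate_premium_requests(5, 10))  # 50
print(estimate_premium_requests(2, 2))   # 4
```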

To reduce premium-request usage, lower the panel size or round count:

council convene "My topic" --max-rounds 2 # fewer rounds → fewer turns
council convene "My topic" --max-experts 2 # smaller panel → fewer concurrent turns per round
council convene "My topic" --panel budget # purpose-built panel with fewer experts

Watch for input token counts approaching your model’s limit (e.g., 200k for Claude Sonnet, 128k for GPT-4 Turbo). If you see >80k input tokens, consider enabling context management.