
Document Formats

Council can ingest documents to train persona experts and enrich panel context. Support comes in three tiers: native (parsed as text), rich (converted to text), and AI-assisted (experimental).

View the complete, up-to-date list from your current installation:

```shell
council docs formats
```

Native formats are parsed directly, with no conversion overhead.

| Extension | Description |
|---|---|
| .md, .markdown | Markdown |
| .txt | Plain text |
| .html, .htm | HTML |

Rich document formats are converted to plain text by format-specific extractors.

| Extension | Description | Notes |
|---|---|---|
| .pdf | PDF | Text extraction via native parser |
| .docx | Word document | OpenXML format |
| .pptx | PowerPoint presentation | Slide text + speaker notes |
| .xlsx | Excel spreadsheet | Sheet names, cell values |
| .xls | Legacy Excel | Re-save as .xlsx recommended |
| .csv | Comma-separated values | Column headers + rows |
| .tsv | Tab-separated values | Column headers + rows |
| .rtf | Rich Text Format | Text + basic formatting |
| .odt | OpenDocument Text | LibreOffice/OpenOffice |
| .ods | OpenDocument Spreadsheet | LibreOffice/OpenOffice |
| .odp | OpenDocument Presentation | LibreOffice/OpenOffice |
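As a rough illustration of the two tiers, the sketch below maps a file name to a tier using the extensions listed in the tables above. The names `classifyFormat`, `extensionOf`, and `FormatTier` are hypothetical, and case-insensitive matching is an assumption; Council's actual lookup may work differently.

```typescript
// Hypothetical sketch: extension sets mirror the native and rich tables above.
type FormatTier = "native" | "rich" | "unsupported";

const NATIVE_EXTENSIONS = new Set([".md", ".markdown", ".txt", ".html", ".htm"]);
const RICH_EXTENSIONS = new Set([
  ".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".csv", ".tsv",
  ".rtf", ".odt", ".ods", ".odp",
]);

// Returns the lowercase extension including the dot, or "" if there is none.
function extensionOf(filename: string): string {
  // Look only at the last path segment so dots in directory names are ignored.
  const base = filename.split(/[/\\]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

function classifyFormat(filename: string): FormatTier {
  const ext = extensionOf(filename);
  if (NATIVE_EXTENSIONS.has(ext)) return "native";
  if (RICH_EXTENSIONS.has(ext)) return "rich";
  return "unsupported";
}

console.log(classifyFormat("notes/Meeting.MD")); // → native
console.log(classifyFormat("q3-report.pdf"));    // → rich
console.log(classifyFormat("diagram.png"));      // → unsupported
```

Anything classified as "unsupported" here is what the AI extraction fallback described below would apply to.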

File size limit: 50 MB per file by default.

Files exceeding this limit are rejected with an oversize-file error before any processing.

Configure:

```shell
council config set documents.maxFileSizeMB 100
```

Range: 1–500 MB
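The limit rules above (a 50 MB default, an integer setting between 1 and 500, and rejection before any processing) can be sketched as two small functions. This is a hypothetical illustration, not Council's code: `resolveMaxFileSizeMB` and `checkFileSize` are invented names, and treating "MB" as 2^20 bytes is an assumption.

```typescript
// Hypothetical sketch of the documented file size limit; not Council's implementation.
const DEFAULT_MAX_FILE_SIZE_MB = 50; // documented default
const MIN_LIMIT_MB = 1;              // documented lower bound
const MAX_LIMIT_MB = 500;            // documented upper bound
const BYTES_PER_MB = 1024 * 1024;    // assumption: "MB" means 2^20 bytes

// Returns the effective limit, rejecting values outside the documented range.
function resolveMaxFileSizeMB(configured?: number): number {
  if (configured === undefined) return DEFAULT_MAX_FILE_SIZE_MB;
  if (
    !Number.isInteger(configured) ||
    configured < MIN_LIMIT_MB ||
    configured > MAX_LIMIT_MB
  ) {
    throw new RangeError(
      `maxFileSizeMB must be an integer between ${MIN_LIMIT_MB} and ${MAX_LIMIT_MB}, got ${configured}`,
    );
  }
  return configured;
}

interface SizeCheck {
  ok: boolean;
  reason?: string;
}

// Compares a file's size to the limit; meant to run before any extraction work.
function checkFileSize(sizeBytes: number, maxMB: number): SizeCheck {
  const limitBytes = maxMB * BYTES_PER_MB;
  if (sizeBytes <= limitBytes) return { ok: true };
  const sizeMB = (sizeBytes / BYTES_PER_MB).toFixed(1);
  return { ok: false, reason: `file is ${sizeMB} MB; limit is ${maxMB} MB` };
}

const limit = resolveMaxFileSizeMB(100);
console.log(checkFileSize(120 * BYTES_PER_MB, limit).ok); // → false
```

Checking size first keeps an oversized file from reaching any extractor, which matches the "rejected before any processing" behavior described above.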

For formats without a built-in extractor, Council can optionally use AI-based extraction to produce a structured text description. Council does not send files to a separate extraction service; AI extraction is handled by the configured engine (GitHub Copilot).

| Mode | Behavior |
|---|---|
| off | (Default) Reject unknown formats. No AI extraction. |
| ask | Prompt user for approval before extracting each file. |
| auto | Automatically extract unknown formats without prompting. |
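The three modes amount to a small decision rule per unsupported file. The sketch below is a hypothetical illustration: `shouldAiExtract` and `ApprovalPrompt` are invented names, and `askUser` stands in for whatever approval prompt the CLI actually shows in `ask` mode.

```typescript
// Hypothetical sketch of how the three documented modes could gate AI extraction.
type AiExtractionMode = "off" | "ask" | "auto";

// Stand-in for the per-file approval prompt used in "ask" mode.
type ApprovalPrompt = (filename: string) => Promise<boolean>;

async function shouldAiExtract(
  mode: AiExtractionMode,
  filename: string,
  askUser: ApprovalPrompt,
): Promise<boolean> {
  switch (mode) {
    case "off":
      return false; // unknown formats are rejected outright
    case "ask":
      return askUser(filename); // one approval per file
    case "auto":
      return true; // extract without prompting
    default: {
      // Exhaustiveness check: fails to compile if a mode is added but not handled.
      const unhandled: never = mode;
      throw new Error(`unknown aiExtraction mode: ${String(unhandled)}`);
    }
  }
}

// Usage: in "auto" mode the prompt is never consulted.
const neverApprove: ApprovalPrompt = async () => false;
shouldAiExtract("auto", "scan.png", neverApprove).then((result) => console.log(result)); // → true
```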

Configure:

```shell
council config set documents.aiExtraction ask
```

When to Enable:

  • Your panels need otherwise-unreadable files (e.g., proprietary formats, screenshots with text)
  • You want structured descriptions of images, diagrams, or scanned PDFs (OCR)
  • You are experimenting with new document types

Performance Note: AI extraction is slower and consumes API tokens. Use sparingly for production workflows.

By default, AI extraction applies to every unsupported format. To restrict it to specific extensions:

```shell
council config set documents.aiExtractionAllowedExtensions '[".png", ".jpg", ".bmp"]'
```

An empty array (the default) makes every extension eligible.
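The allowlist rule can be sketched as a single predicate: an empty list admits everything, otherwise the file's extension must appear in the list. `isAiEligible` is a hypothetical name, and case-insensitive matching is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch of documents.aiExtractionAllowedExtensions semantics.
function fileExtension(path: string): string {
  // Look only at the last path segment so dots in directory names are ignored.
  const lastSep = Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\"));
  const base = path.slice(lastSep + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

function isAiEligible(path: string, allowedExtensions: readonly string[]): boolean {
  if (allowedExtensions.length === 0) return true; // empty array = all eligible
  const ext = fileExtension(path);
  // Assumption: matching is case-insensitive.
  return allowedExtensions.some((allowed) => allowed.toLowerCase() === ext);
}

const allowed = [".png", ".jpg", ".bmp"];
console.log(isAiEligible("screens/Login.PNG", allowed)); // → true
console.log(isAiEligible("diagram.svg", allowed));       // → false
console.log(isAiEligible("diagram.svg", []));            // → true
```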

To inspect format support and the current configuration:

```shell
council docs formats
```

Output includes:

  • Native formats
  • Rich document formats
  • AI extraction status and mode
  • File size limit
  • Configuration instructions
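
The supported extensions can also be read programmatically:

```typescript
import { getSupportedExtensions } from "@council-ai/cli/core/documents/extractors";

// Array of dotted file extensions the current installation can ingest
const extensions = getSupportedExtensions();
console.log(extensions); // [".md", ".txt", ".pdf", ...]
```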

After adding documents to a panel’s docs/ folder:

```shell
council docs review <panel>
```

Lists files that could not be processed: unsupported formats, failed extractions, and files eligible for AI extraction.

Exit code: non-zero if any files are pending review, so the command can gate a CI job.
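For repositories with several panels, a CI script could run the review once per panel and report which ones still have pending files. This is a hypothetical Node.js sketch built only on the documented exit-code behavior; `panelsPendingReview` and `Runner` are invented names, and the command runner is injectable so the logic can be tested without Council installed.

```typescript
import { spawnSync } from "node:child_process";

// Runs a command and returns its exit status (null if it could not start or was killed).
type Runner = (cmd: string, args: string[]) => number | null;

const runCommand: Runner = (cmd, args) =>
  spawnSync(cmd, args, { stdio: "inherit" }).status;

// Returns the panels for which `council docs review <panel>` did not exit with 0.
// A missing `council` binary (status null) is also treated as a failure.
function panelsPendingReview(panels: string[], run: Runner = runCommand): string[] {
  return panels.filter((panel) => run("council", ["docs", "review", panel]) !== 0);
}
```

In a CI script you might call `panelsPendingReview(["research", "legal"])` and set `process.exitCode = 1` when the result is non-empty; the panel names are placeholders.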

To check a panel's document health:

```shell
council docs doctor <panel>
```

Shows:

  • Total indexed documents + word count
  • Pending review count
  • Corrupt file count
  • Configured AI extraction mode
  • File size limit

```shell
council docs extract <panel>
```

Re-runs extraction on pending files (e.g., after enabling AI extraction or adding new documents).

| Config key | Type | Default | Description |
|---|---|---|---|
| expert.supportedFormats | string[] | (see above) | File extensions eligible for ingestion. |
| documents.aiExtraction | off / ask / auto | off | AI-based extraction fallback mode. |
| documents.aiExtractionAllowedExtensions | string[] | [] | Allowlist for AI extraction. Empty = all. |
| documents.maxFileSizeMB | integer | 50 | Maximum file size (1–500 MB). |
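The reference table maps naturally onto a typed config with defaults. In the hypothetical sketch below, `DocumentsConfig` and `resolveDocumentsConfig` are invented names, and the `supportedFormats` default is assumed to be the native and rich extensions listed on this page, since the table only says "(see above)".

```typescript
// Hypothetical typed view of the documented keys; not Council's own types.
type AiExtractionMode = "off" | "ask" | "auto";

interface DocumentsConfig {
  supportedFormats: string[];              // expert.supportedFormats
  aiExtraction: AiExtractionMode;          // documents.aiExtraction
  aiExtractionAllowedExtensions: string[]; // documents.aiExtractionAllowedExtensions
  maxFileSizeMB: number;                   // documents.maxFileSizeMB
}

// Defaults from the table; supportedFormats assumed to be the native + rich lists above.
const DEFAULTS: DocumentsConfig = {
  supportedFormats: [
    ".md", ".markdown", ".txt", ".html", ".htm",
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".csv", ".tsv",
    ".rtf", ".odt", ".ods", ".odp",
  ],
  aiExtraction: "off",
  aiExtractionAllowedExtensions: [],
  maxFileSizeMB: 50,
};

// Overlays user overrides on the defaults; unset keys keep their default.
// Arrays are copied so callers cannot mutate DEFAULTS through the result.
function resolveDocumentsConfig(overrides: Partial<DocumentsConfig> = {}): DocumentsConfig {
  return {
    supportedFormats: [...(overrides.supportedFormats ?? DEFAULTS.supportedFormats)],
    aiExtraction: overrides.aiExtraction ?? DEFAULTS.aiExtraction,
    aiExtractionAllowedExtensions: [
      ...(overrides.aiExtractionAllowedExtensions ?? DEFAULTS.aiExtractionAllowedExtensions),
    ],
    maxFileSizeMB: overrides.maxFileSizeMB ?? DEFAULTS.maxFileSizeMB,
  };
}

const cfg = resolveDocumentsConfig({ aiExtraction: "ask", maxFileSizeMB: 100 });
console.log(cfg.aiExtraction, cfg.maxFileSizeMB); // → ask 100
```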