STATEK Documents Machinery

STATEK documents are file-backed reference material that agents can list and inspect during a job.

They are useful for stable instructions, product rules, API notes, policy summaries, and other text that should be available on demand without dumping every document into every prompt.

⚠️

Documents become agent-visible context when loaded. Keep secrets, credentials, unrelated private records, and broad tenant data out of document files unless your application has the right authorization and runtime controls around the job.

Directory Layout

Configure the document root with STATEK_DOCUMENTS_DIR or StatekSettings.documents_dir:

STATEK_DOCUMENTS_DIR=./documents

STATEK loads .txt and .md files recursively from that directory:

documents/
  reports/
    filters.md
    export-rules.md
  support/
    escalation.txt

Documents are grouped by their topic metadata, not by folder name. Folder structure is only an organization choice.
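The loading and grouping rules above can be sketched in plain Python. This is a minimal illustration, not STATEK's loader: collect_document_files, read_topic, and group_by_topic are hypothetical names, and the sketch assumes headers are the leading lines of a file that start with "#".

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical sketch, not STATEK's loader: it only illustrates the rules above.
DOC_SUFFIXES = {".txt", ".md"}


def collect_document_files(root):
    """Return every .txt and .md file under root, searching subfolders too."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix in DOC_SUFFIXES
    )


def read_topic(path):
    """Return the value of the '# topic:' header, or None if it is missing."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.startswith("#"):
            break  # assumption: headers end at the first line not starting with '#'
        key, _, value = line[1:].partition(":")
        if key.strip() == "topic":
            return value.strip()
    return None


def group_by_topic(root):
    """Group document paths by topic header; the parent folder is ignored."""
    groups = defaultdict(list)
    for path in collect_document_files(root):
        topic = read_topic(path)
        if topic is not None:
            groups[topic].append(path)
    return dict(groups)
```

Two files in different folders end up in one group when they share a topic header, which is the point of the rule.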

Document File Format

Each document starts with metadata headers followed by body text:

# ord_no: 0
# topic: Reports
# title: Allowed filters
# audience: report_writer, dispatcher
 
Reports can be filtered by date range, owner, status, and tag.
Do not filter by raw internal IDs unless the request explicitly provides one.

Required headers:

  • ord_no: numeric order within the topic
  • topic: topic name used by listing and lookup
  • title: document title used by listing and lookup

Optional header:

  • audience: comma-separated agent roles allowed to see this document

If audience is missing, the document is visible to all agents. If it is present, only listed agent roles can list or show the document.

The body is stored as lines. Leading and trailing blank lines around the body are stripped.
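A minimal parser for this format might look like the following. parse_document_text is a hypothetical stand-in, not STATEK's parse_document: it assumes headers are the leading "# key: value" lines and returns a plain dict instead of a Document. Whitespace-only lines, such as the one separating headers from body in the example above, are treated as blank.

```python
# Hypothetical sketch of the header format above; not STATEK's parse_document.
REQUIRED_HEADERS = ("ord_no", "topic", "title")


def parse_document_text(text):
    lines = text.splitlines()
    headers = {}
    i = 0
    # Consume the leading "# key: value" header lines.
    while i < len(lines) and lines[i].startswith("#"):
        key, sep, value = lines[i][1:].partition(":")
        if not sep:
            break  # a '#' line without a colon is treated as body text
        headers[key.strip()] = value.strip()
        i += 1
    missing = [h for h in REQUIRED_HEADERS if h not in headers]
    if missing:
        raise ValueError(f"missing required headers: {', '.join(missing)}")
    body = lines[i:]
    # Strip blank (or whitespace-only) lines around the body.
    while body and not body[0].strip():
        body.pop(0)
    while body and not body[-1].strip():
        body.pop()
    audience = headers.get("audience")
    roles = [r.strip() for r in audience.split(",") if r.strip()] if audience else None
    return {
        "ord_no": int(headers["ord_no"]),
        "topic": headers["topic"],
        "title": headers["title"],
        "audience": roles,  # None means visible to every agent
        "body": body,
    }
```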

Listing Topics

Inside an active STATEK job, agents use the wrapper tool:

list_of_documents()

Without a topic, STATEK lists topics that contain at least one document visible to the current agent:

# Topic ID: Topic name (2 total)
0: Reports
1: Support

If no document directory is configured, or no documents are loaded, STATEK prints:

# No documents found
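The visibility rule and the topic listing format can be illustrated together. This sketch is hypothetical: format_topic_listing and is_visible are illustrative names, topics are passed as (name, docs) pairs, topic IDs are assumed to be positions in a fixed topic list, and a role that can see no documents is assumed to get the same "# No documents found" message.

```python
# Hypothetical sketch; topic ID numbering and the empty-result message are assumptions.
def is_visible(doc, role):
    """A document with no audience is visible to everyone."""
    audience = doc.get("audience")
    return audience is None or role in audience


def format_topic_listing(topics, role):
    """topics: list of (topic_name, docs) pairs; docs are dicts with an 'audience' key."""
    visible = [
        (topic_id, name)
        for topic_id, (name, docs) in enumerate(topics)
        if any(is_visible(d, role) for d in docs)
    ]
    if not visible:
        return "# No documents found"
    lines = [f"# Topic ID: Topic name ({len(visible)} total)"]
    lines += [f"{topic_id}: {name}" for topic_id, name in visible]
    return "\n".join(lines)
```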

Listing Documents In A Topic

Pass a topic ID, exact topic name, or name fragment:

list_of_documents(topic=0)
list_of_documents(topic="Reports")
list_of_documents(topic="report")

STATEK prints documents in that topic that match the current agent's audience:

# Document ID: Document name (2 total)
0: Allowed filters
1: Export rules

Pagination is available:

list_of_documents(topic="Reports", start_index=25, limit=25)
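Pagination can be pictured as a slice over the documents visible to the current role. In this hypothetical sketch, the name format_document_listing, the default limit of 25, and reporting the visible total (not the page size) in the header are all assumptions.

```python
# Hypothetical sketch; the default limit and the "total" semantics are assumptions.
def format_document_listing(docs, role, start_index=0, limit=25):
    """docs: dicts with 'title' and 'audience', already ordered by ord_no."""
    visible = [
        (doc_id, d["title"])
        for doc_id, d in enumerate(docs)
        if d.get("audience") is None or role in d["audience"]
    ]
    page = visible[start_index:start_index + limit]  # one page of visible documents
    lines = [f"# Document ID: Document name ({len(visible)} total)"]
    lines += [f"{doc_id}: {title}" for doc_id, title in page]
    return "\n".join(lines)
```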

When a topic is found, STATEK stores its ID as last_topic_id in persistent context, so a later show_document(...) call can omit the topic argument.

If a topic fragment matches multiple accessible topics, lookup raises an ambiguity error listing the matching topics. If no topic matches, STATEK prints a not-found message.

Showing A Document

Use show_document(...) with a document ID, exact title, title fragment, or close title match:

show_document(0, topic="Reports")
show_document("Allowed filters", topic="Reports")
show_document("filters", topic="Reports")

Output starts with the title and line range:

# Allowed filters (lines 0-2/2)
Reports can be filtered by date range, owner, status, and tag.
Do not filter by raw internal IDs unless the request explicitly provides one.

By default, show_document(...) prints the first 50 lines. Use start_from and limit for longer documents:

show_document("Allowed filters", topic="Reports", start_from=50, limit=50)
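The line-range header reads as start-end/total. A minimal sketch, assuming the end value is exclusive (consistent with "lines 0-2/2" for a two-line body) and using the hypothetical name format_document:

```python
# Hypothetical sketch of show_document output; end is exclusive by assumption.
def format_document(title, body_lines, start_from=0, limit=50):
    total = len(body_lines)
    shown = body_lines[start_from:start_from + limit]  # at most `limit` lines
    end = start_from + len(shown)
    header = f"# {title} (lines {start_from}-{end}/{total})"
    return "\n".join([header, *shown])
```

For a 120-line document, start_from=50 with limit=50 would produce a header of "(lines 50-100/120)" under these assumptions.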

If topic is omitted, STATEK uses last_topic_id from the previous successful topic lookup:

list_of_documents(topic="Reports")
show_document("Allowed filters")

If neither topic nor last_topic_id is available, the lower-level implementation raises a value error because the document key is ambiguous without a topic.
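The topic fallback can be sketched as a small resolver. resolve_topic_id is hypothetical, a plain dict stands in for STATEK's persistent context, and the topic lookup is passed in as a callable. Updating last_topic_id on every successful lookup, including from show_document(...), is an assumption here.

```python
# Hypothetical sketch: a dict stands in for STATEK's persistent context.
def resolve_topic_id(topic, context, lookup):
    """Return a topic ID; remember it as last_topic_id, or fall back to it."""
    if topic is not None:
        topic_id = lookup(topic)
        context["last_topic_id"] = topic_id  # remembered for later calls
        return topic_id
    if "last_topic_id" in context:
        return context["last_topic_id"]
    raise ValueError("document key is ambiguous without a topic")
```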

Lookup Behavior

Topic lookup supports:

  • topic ID
  • exact topic name, case-insensitive
  • topic name fragment, case-insensitive

Document lookup supports:

  • document ID
  • exact title, case-insensitive
  • title fragment, case-insensitive
  • fuzzy title match when similarity is at least 90 percent

Exact matches take priority over fragment matches. Fragment matches must resolve to one document; otherwise STATEK raises an ambiguity error listing the matches.

Audience filtering applies before lookup. An inaccessible document is treated as if it is not present for that agent.
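The lookup order can be sketched as follows. find_by_key and AmbiguousMatch are hypothetical names, entries are (ID, name) pairs that have already passed audience filtering, and difflib.SequenceMatcher.ratio() stands in for whatever similarity measure STATEK actually uses. Trying the fuzzy match only after exact and fragment matches fail is also an assumption; pass fuzzy=False to mimic topic lookup, which does not list fuzzy matching.

```python
from difflib import SequenceMatcher

# Hypothetical sketch: names and the similarity measure are assumptions.
FUZZY_THRESHOLD = 0.9


class AmbiguousMatch(ValueError):
    """Raised when a fragment matches more than one accessible entry."""


def find_by_key(key, entries, fuzzy=True):
    """Resolve key against (entry_id, name) pairs; return an entry_id or None."""
    if isinstance(key, int):
        return key if any(entry_id == key for entry_id, _ in entries) else None
    wanted = key.casefold()
    # 1. Exact name, case-insensitive, wins over everything else.
    for entry_id, name in entries:
        if name.casefold() == wanted:
            return entry_id
    # 2. Fragment, case-insensitive; it must identify exactly one entry.
    fragments = [(entry_id, name) for entry_id, name in entries if wanted in name.casefold()]
    if len(fragments) == 1:
        return fragments[0][0]
    if fragments:
        listed = ", ".join(name for _, name in fragments)
        raise AmbiguousMatch(f"{key!r} matches several entries: {listed}")
    # 3. Fuzzy name match, at least 90 percent similar.
    if fuzzy:
        scored = [
            (SequenceMatcher(None, wanted, name.casefold()).ratio(), entry_id)
            for entry_id, name in entries
        ]
        best_ratio, best_id = max(scored, default=(0.0, None))
        if best_ratio >= FUZZY_THRESHOLD:
            return best_id
    return None
```

Because entries are filtered before lookup, a document the agent cannot see simply never appears in the candidate list, so it resolves as not found rather than as a permission error.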

Warmup Integration

Document tools are often used during warmup when a job should start with relevant reference material already in history:

list_of_documents(topic="Reports") #STATEK: as tool
show_document("Allowed filters", topic="Reports") #STATEK: as tool

Use this sparingly. Loading a few precise documents is usually better than printing a full library into the first model turn.

See Warmup Code for details on how #STATEK: as tool lines behave.

Lower-Level Helpers

STATEK exposes lower-level document helpers for tests and application tooling:

from statek.document import load_documents, parse_document
 
topics = load_documents("./documents")

Useful helpers:

  • parse_document(...): parse one document string into a Document
  • load_documents(...): load .txt and .md files recursively and group them into topics
  • find_topic(...): find one accessible topic by ID, exact name, or fragment
  • find_document(...): find one accessible document by ID, exact title, fragment, or fuzzy title match

Most agent workflows should use list_of_documents(...) and show_document(...).

Practical Rules

Use documents for reference material that should be inspectable on demand:

  • tool-specific rules
  • product constraints
  • report templates
  • escalation policies
  • short API notes
  • stable team or workflow preferences

Do not use documents as a permission boundary. The audience header controls which agents can list or show a document through these tools, but application authorization, secrets handling, and runtime isolation still belong in application code and deployment configuration.

Where to Go Next

Read Examples Machinery for code-and-output transcripts, Warmup Code for loading selected documents before model execution, Prompt Definitions for prompt metadata, and Security for context and credential boundaries.