STATEK Harness Policies

STATEK's LLM harness keeps agent jobs bounded.

It is a process-level guardrail applied to each job as the job runs. It checks turn count, approximate token usage, total exceptions, and consecutive exceptions so an agent that loops, retries blindly, or keeps producing failing code can be stopped and inspected.

Warning: Harness policies are not a security boundary. They do not replace sandboxing, authorization, provider quotas, cost alerts, secrets discipline, CPU or memory limits, or application-level side-effect controls.

Configure the Harness

Harness settings are read from environment variables. The defaults are:

STATEK_MAX_TURNS=5
STATEK_MAX_EXCEPTIONS=3
STATEK_MAX_CONSECUTIVE_EXCEPTIONS=1
STATEK_MAX_TOKEN_USAGE=10000
STATEK_LIMIT_EXTENSION_PER_COMPLETION=0

These are loaded into StatekSettings and used by the global LLM_Harness instance returned by get_llm_harness().
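As a rough illustration of how these variables map to typed values, the sketch below parses them with the documented defaults. HarnessLimits and load_limits are hypothetical names used only for this example; they are not StatekSettings or any STATEK API, and the real loader may validate or parse values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the documented environment variables.
DEFAULTS = {
    "STATEK_MAX_TURNS": "5",
    "STATEK_MAX_EXCEPTIONS": "3",
    "STATEK_MAX_CONSECUTIVE_EXCEPTIONS": "1",
    "STATEK_MAX_TOKEN_USAGE": "10000",
    "STATEK_LIMIT_EXTENSION_PER_COMPLETION": "0",
}


@dataclass(frozen=True)
class HarnessLimits:
    """Hypothetical container; a stand-in for illustration, not StatekSettings."""
    max_turns: int
    max_exceptions: int
    max_consecutive_exceptions: int
    max_token_usage: int
    limit_extension_per_completion: float


def load_limits(env: Optional[Mapping[str, str]] = None) -> HarnessLimits:
    """Read harness limits from a mapping, falling back to the documented defaults."""
    source = os.environ if env is None else env

    def get(name: str) -> str:
        return source.get(name, DEFAULTS[name])

    return HarnessLimits(
        max_turns=int(get("STATEK_MAX_TURNS")),
        max_exceptions=int(get("STATEK_MAX_EXCEPTIONS")),
        max_consecutive_exceptions=int(get("STATEK_MAX_CONSECUTIVE_EXCEPTIONS")),
        max_token_usage=int(get("STATEK_MAX_TOKEN_USAGE")),
        # The extension is fractional in the documented presets (for example 0.25).
        limit_extension_per_completion=float(get("STATEK_LIMIT_EXTENSION_PER_COMPLETION")),
    )
```

Passing an explicit mapping instead of reading os.environ directly keeps the sketch easy to test without touching the process environment.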

What the Limits Mean

STATEK_MAX_TURNS limits LLM turns for a job. Warmup blocks do not count as LLM turns.

STATEK_MAX_TOKEN_USAGE limits the job's approximate token usage. STATEK currently estimates this from the total bytes sent and received, at four bytes per token:

approx_token_usage = (total_bytes_sent + total_bytes_received) // 4

Use this as a runaway-job guardrail, not as exact billing. Provider-reported input tokens, output tokens, cached tokens, and cost are tracked separately on job.usage where the provider response includes them. See STATEK Metering for the full distinction.
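To make the estimate concrete, here is a minimal sketch of the byte-based formula. The function name approx_token_usage is chosen for this example and is not a documented STATEK function; the byte counts are made-up inputs.

```python
def approx_token_usage(total_bytes_sent: int, total_bytes_received: int) -> int:
    """Estimate tokens from traffic size, using the documented bytes // 4 rule."""
    return (total_bytes_sent + total_bytes_received) // 4


# Illustrative numbers only: 12,000 bytes sent and 3,000 bytes received.
estimate = approx_token_usage(12_000, 3_000)

# Byte counts depend on encoding: non-ASCII text takes more bytes per character
# in UTF-8, so it pushes the estimate up faster than plain ASCII text does.
ascii_bytes = len("hello".encode("utf-8"))
accented_bytes = len("héllo".encode("utf-8"))
```

Because the estimate tracks bytes rather than tokenizer output, it can diverge from provider-reported counts in either direction, which is why it suits guardrails better than billing.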

STATEK_MAX_EXCEPTIONS limits the total number of exceptions from LLM turns. It counts both Python execution exceptions and tool-call exceptions. Warmup exceptions are handled through the warmup failure path and are not counted in this LLM-turn exception total.

STATEK_MAX_CONSECUTIVE_EXCEPTIONS limits the longest streak of failing LLM turns. Tool-call failures count in this streak. A clean LLM turn resets the streak.

Harness counters are per job. Two jobs using the same process-level harness do not share token usage, turns, or exception counts.
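The difference between the total and consecutive exception counters can be sketched as follows. ExceptionCounters is a hypothetical class for illustration; the source does not show how STATEK stores these counters internally.

```python
from dataclasses import dataclass


@dataclass
class ExceptionCounters:
    """Hypothetical per-job counters; one instance per job, never shared."""
    total: int = 0
    consecutive: int = 0

    def record_turn(self, failed: bool) -> None:
        """Update counters after one LLM turn (warmup blocks are not recorded here)."""
        if failed:
            # Python execution exceptions and tool-call exceptions both land here.
            self.total += 1
            self.consecutive += 1
        else:
            # A clean LLM turn resets the streak but leaves the total untouched.
            self.consecutive = 0


# Two jobs keep separate counters even under one process-level harness.
job_a = ExceptionCounters()
job_b = ExceptionCounters()
for failed in (True, False, True, True):
    job_a.record_turn(failed)
```

After the sequence above, job_a has three total exceptions and a current streak of two, while job_b is still at zero.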

When Checks Run

STATEK checks turn and exception limits before a job step starts. It checks token usage and exception limits again after the step.

When a limit is exceeded, the harness raises LLM_HarnessError. In the normal worker path, STATEK records the error in the job console, sets an exit status, marks the job DONE, and runs critical-error handling. Registered job error handlers are invoked during that critical-error path; see Error Handling.

The result is deliberately inspectable: the job stops, but its Python state, chat log, console output, tool results, usage, and error message remain available for review.
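The ordering of checks described in this section could look roughly like the sketch below. Everything here is illustrative: HarnessLimitExceeded stands in for LLM_HarnessError, Limits and Counters are hypothetical containers, and the sketch assumes that "exceeded" means strictly greater than the limit, which the source does not specify.

```python
from dataclasses import dataclass


class HarnessLimitExceeded(Exception):
    """Hypothetical stand-in for LLM_HarnessError."""


@dataclass
class Limits:
    max_turns: int = 5
    max_exceptions: int = 3
    max_consecutive_exceptions: int = 1
    max_token_usage: int = 10000


@dataclass
class Counters:
    turns: int = 0
    exceptions: int = 0
    consecutive_exceptions: int = 0
    approx_token_usage: int = 0


def _require(value: int, limit: int, name: str) -> None:
    # Assumption: a limit is exceeded only when the counter is strictly greater.
    if value > limit:
        raise HarnessLimitExceeded(f"{name} limit exceeded: {value} > {limit}")


def _check_exceptions(c: Counters, limits: Limits) -> None:
    _require(c.exceptions, limits.max_exceptions, "exception")
    _require(c.consecutive_exceptions, limits.max_consecutive_exceptions,
             "consecutive exception")


def check_before_step(c: Counters, limits: Limits) -> None:
    """Before a step: turn and exception limits."""
    _require(c.turns, limits.max_turns, "turn")
    _check_exceptions(c, limits)


def check_after_step(c: Counters, limits: Limits) -> None:
    """After a step: token usage and exception limits."""
    _require(c.approx_token_usage, limits.max_token_usage, "token usage")
    _check_exceptions(c, limits)
```

Checking exceptions on both sides of a step means a failing turn can stop the job immediately after it runs, without waiting for the next step to begin.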

Completion-Based Extension

Long-lived conversational jobs sometimes need a little more room after they have completed useful turns. STATEK can extend harness limits based on job.num_completions.

The effective limit is:

effective_limit = base_limit + (1.0 + extension) * num_completions

Here, base_limit is the configured limit, extension is the value of STATEK_LIMIT_EXTENSION_PER_COMPLETION, and num_completions is job.num_completions.

For example, with STATEK_MAX_TURNS=5, STATEK_LIMIT_EXTENSION_PER_COMPLETION=0.5, and num_completions=2, the effective turn limit is:

5 + (1.0 + 0.5) * 2  # 8.0

If the completion count is not available, STATEK uses the base limit unchanged. If a limit is unlimited internally, it stays unlimited; completion extension never turns it into a finite limit.
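The formula and its two edge cases can be sketched in a few lines. The function name effective_limit and the use of None to represent both "unlimited" and "completion count not available" are assumptions made for this example, not STATEK's internal representation.

```python
from typing import Optional


def effective_limit(
    base_limit: Optional[float],
    extension: float,
    num_completions: Optional[int],
) -> Optional[float]:
    """Apply the documented completion-based extension to one limit."""
    if base_limit is None:
        # Assumption: None models an unlimited limit, which stays unlimited.
        return None
    if num_completions is None:
        # Completion count not available: fall back to the base limit.
        return base_limit
    return base_limit + (1.0 + extension) * num_completions


# The worked example from the text: 5 turns, extension 0.5, 2 completions.
turn_limit = effective_limit(5, 0.5, 2)
```

Note that, as the formula is written, an extension of 0 still adds one unit per completion, because the multiplier is (1.0 + extension).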

Keep this setting conservative. Completion-based extension is useful for multi-message interactions, but a high extension value can hide prompt loops or tool design problems.

Practical Presets

For local development, give the agent enough room to expose real failures without burning through a large context:

STATEK_MAX_TURNS=8
STATEK_MAX_EXCEPTIONS=5
STATEK_MAX_CONSECUTIVE_EXCEPTIONS=2
STATEK_MAX_TOKEN_USAGE=30000
STATEK_LIMIT_EXTENSION_PER_COMPLETION=0

For an interactive production agent, keep consecutive failures tight and choose a token budget that matches the expected user interaction:

STATEK_MAX_TURNS=5
STATEK_MAX_EXCEPTIONS=3
STATEK_MAX_CONSECUTIVE_EXCEPTIONS=1
STATEK_MAX_TOKEN_USAGE=15000
STATEK_LIMIT_EXTENSION_PER_COMPLETION=0.25

For a background or batch agent, a larger budget can make sense, but exception policies should still stop bad loops quickly:

STATEK_MAX_TURNS=15
STATEK_MAX_EXCEPTIONS=4
STATEK_MAX_CONSECUTIVE_EXCEPTIONS=1
STATEK_MAX_TOKEN_USAGE=60000
STATEK_LIMIT_EXTENSION_PER_COMPLETION=0

Treat these as starting points. Real values depend on model behavior, prompt size, tool latency, provider limits, user expectations, and how expensive the job's side effects can be.

Designing Agents for Harness Limits

Give agents an explicit done path. A prompt that says what successful completion looks like is easier to bound than one that asks the model to keep improving until it decides to stop.

Keep tool outputs compact. Large raw payloads, stack traces, logs, or search results increase request size on later turns and can trip token limits even when the agent is behaving correctly.
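One simple way to keep tool output compact is to cap its length and say how much was dropped. This is a generic sketch, not a STATEK feature; compact_output and its default max_chars value are hypothetical.

```python
def compact_output(text: str, max_chars: int = 2000) -> str:
    """Return text unchanged if short, otherwise a prefix plus a truncation note."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    # The note tells the agent that data is missing, so it can ask for a narrower query.
    return text[:max_chars] + f"\n[truncated {omitted} characters]"


# Example: a long log is cut to its first 10 characters plus the note.
short = compact_output("x" * 50, max_chars=10)
```

For logs and stack traces, keeping the tail instead of the head is often more useful; the right choice depends on where the tool puts the information the agent needs.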

Return recoverable tool errors as structured results when the agent can do something useful next. Reserve thrown exceptions for cases where the tool really failed. Repeated tool exceptions count against harness exception policies.
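The distinction between a recoverable result and a thrown exception might look like this in a tool function. lookup_order, its arguments, and the result keys are all hypothetical; STATEK does not prescribe this shape.

```python
from typing import Any, Dict


def lookup_order(orders: Dict[str, Any], order_id: str) -> Dict[str, Any]:
    """Hypothetical tool: find an order in an in-memory mapping."""
    if not isinstance(orders, dict):
        # The tool itself is broken or misconfigured: a real failure, so raise.
        # This would count against the harness exception policies.
        raise TypeError("orders must be a dict")

    if order_id not in orders:
        # Recoverable: the agent can ask the user for another ID or search instead.
        return {
            "ok": False,
            "error": "order_not_found",
            "hint": "Check the order ID or search by customer.",
        }

    return {"ok": True, "order": orders[order_id]}


found = lookup_order({"A1": {"total": 42}}, "A1")
missing = lookup_order({"A1": {"total": 42}}, "B2")
```

The missing-order case reaches the agent as ordinary data, so it does not add to the exception counters and gives the model something actionable for its next turn.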

Use warmup code for deterministic setup, queue-picking, and context preparation. Do not rely on LLM-turn exception counters to catch warmup failures; warmup failures are handled as job definition or startup errors. If warmup claims external work, read the cleanup guidance in Error Handling.

When a job hits a harness limit, review the persisted history before raising the limit. Common causes are vague prompts, tools that return too much data, missing success criteria, repeated tool-call failures, or an agent trying to repair an error it cannot actually repair.

What the Harness Does Not Do

The harness does not make Python execution safe by itself. Use STATEK sandbox settings and deployment controls, and still design authorization around the application objects, tools, credentials, networks, files, and tenants the agent can reach.

The harness does not guarantee exact cost control. Provider quotas, budget alerts, model selection, pricing metadata, request logging, and operational monitoring remain separate concerns. See STATEK Metering for cost estimates and usage inspection.

The harness does not make external side effects reversible or exactly-once. A stopped job may already have called an API, sent a message, changed a file, or mutated an application object. Side-effecting tools still need idempotency, approval policy, auditability, and recovery rules.
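As one illustration of idempotency for a side-effecting tool, the sketch below skips a send when the same key has already been processed. IdempotentSender is a hypothetical helper; it keeps its record in memory only, so a real deployment would need a durable store that survives a stopped or restarted job.

```python
from typing import Any, Callable, Dict


class IdempotentSender:
    """Hypothetical wrapper that performs a side effect at most once per key."""

    def __init__(self, send: Callable[[Any], Any]) -> None:
        self._send = send
        # Assumption: in-memory record for illustration; not durable across processes.
        self._results: Dict[str, Any] = {}

    def send_once(self, key: str, payload: Any) -> Any:
        """Run the side effect for a new key; return the stored result for a repeat."""
        if key in self._results:
            return self._results[key]
        result = self._send(payload)
        self._results[key] = result
        return result


sent = []
sender = IdempotentSender(lambda payload: sent.append(payload) or len(sent))
first = sender.send_once("invoice-17", "hello")
repeat = sender.send_once("invoice-17", "hello")
```

If an agent retries the same call after a failure elsewhere in the turn, the second call returns the recorded result instead of repeating the side effect.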

Read Configuration for the full settings surface, Operations for worker loops and concurrency, and Security for runtime boundaries.
