STATEK Operations

STATEK runs inside your application process or worker process.

It is not a separate hosted agent service by default. Your application creates agents, jobs, queues, and dbzero-backed objects. A STATEK worker loop then looks at durable job state and decides what should run next.

The operational model is simple:

Jobs are persisted as application state.
Each job owns its Python execution state, chat history, console output, tool log, status, and errors.
A worker loop repeatedly finds jobs that can make progress.
The loop runs only a bounded number of jobs at the same time.
When a job prints, calls tools, changes dbzero objects, waits, finishes, or receives a pushed message, that state is persisted.
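As a mental model only, the per-job state listed above can be pictured as one record per job. The `Job` and `JobStatus` classes below are hypothetical plain-Python stand-ins that reuse the status names from this page. They are not STATEK's dbzero-backed job classes; for the records STATEK actually exposes, see Logging And Inspection.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import UUID, uuid4


class JobStatus(Enum):
    # Hypothetical mirror of the status names used on this page.
    READY = "READY"
    WARMING_UP = "WARMING_UP"
    STARTED = "STARTED"
    SUSPENDED = "SUSPENDED"
    DONE = "DONE"


@dataclass
class Job:
    # Hypothetical in-memory stand-in for one persisted STATEK job.
    agent_name: str
    uuid: UUID = field(default_factory=uuid4)
    status: JobStatus = JobStatus.READY
    py_locals: dict = field(default_factory=dict)  # this job's own Python variables
    chat_log: list = field(default_factory=list)   # user messages and model responses
    console: list = field(default_factory=list)    # output printed by executed Python
    tool_log: list = field(default_factory=list)   # tool results and tool errors
    errors: list = field(default_factory=list)     # captured failure details


job_a = Job("dispatcher", py_locals={"user": "alex"})
job_b = Job("dispatcher", py_locals={"user": "jordan"})
job_a.console.append("hello from job A")  # only job A's console changes
```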

That is the core reason STATEK can coordinate many agent jobs in one process. The jobs share one scheduler, but their Python locals and execution histories are separate.

Worker Loops

A typical STATEK worker or service starts with start_statek.

It prepares the process, applies configured agent definitions, initializes the client API object used by RPC-enabled deployments, and then starts the job processing loop:

Start a Worker Process

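from statek import start_statek

# dispatcher, researcher, and main_queue are agents and a push queue
# defined elsewhere in your application.
start_statek(
    agents=[dispatcher, researcher],
    push_queues=[main_queue],
    max_concurrency=100,  # at most 100 jobs execute at the same time
)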

Use start_statek_async when the hosting service already owns the event loop:

Start an Async Worker

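from statek import start_statek_async

# run_statek_worker is an illustrative name: await it from a coroutine that
# already runs on the hosting service's event loop.
async def run_statek_worker():
    await start_statek_async(
        agents=[dispatcher],
        push_queues=[main_queue],
    )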

Under the hood, startup chooses one of the lower-level agentic loops:

run_agentic_loop for one active agent
run_agentic_fleet for multiple active agents

Both helpers build on run_jobs_loop. On each pass, that lower-level loop does several things:

checks suspended jobs and moves ready ones back to execution
processes messages pushed into existing jobs
processes queued events addressed to supervised agents
optionally starts new jobs from application-specific queue logic
finds READY, WARMING_UP, and STARTED jobs
schedules runnable jobs up to max_concurrency
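The pass above can be sketched with in-memory stand-ins. This is a hypothetical, heavily simplified model: `Job`, `one_pass`, and `run_one_step` are illustrative names, jobs are plain dataclasses rather than persisted dbzero objects, and each runnable job does a single step and finishes. It is not the implementation of run_jobs_loop.

```python
import asyncio
from dataclasses import dataclass, field

RUNNABLE = {"READY", "WARMING_UP", "STARTED"}


@dataclass
class Job:
    # Hypothetical, simplified job record; not STATEK's real class.
    name: str
    status: str = "READY"
    awaited_ready: bool = False  # stands in for "the awaited result is ready"
    inbox: list = field(default_factory=list)


async def run_one_step(job: Job) -> None:
    # Placeholder for one unit of agent work (a model turn, a tool call, ...).
    await asyncio.sleep(0)
    job.status = "DONE"


async def one_pass(jobs, pushed, events, max_concurrency):
    """One simplified pass over in-memory jobs; returns the jobs it ran."""
    # Move suspended jobs whose awaited result is ready back to execution.
    for job in jobs:
        if job.status == "SUSPENDED" and job.awaited_ready:
            job.status = "STARTED"
    # Deliver pushed (target_name, message) pairs into existing jobs.
    by_name = {job.name: job for job in jobs}
    for target, message in pushed:
        if target in by_name:
            by_name[target].inbox.append(message)
    pushed.clear()
    # Create one new job per queued agent event.
    jobs.extend(Job(name=event) for event in events)
    events.clear()
    # (Application-specific queue logic could start extra jobs here.)
    # Find runnable jobs and run at most max_concurrency of them.
    runnable = [job for job in jobs if job.status in RUNNABLE][:max_concurrency]
    await asyncio.gather(*(run_one_step(job) for job in runnable))
    return runnable


jobs = [
    Job("a"),
    Job("b", status="SUSPENDED", awaited_ready=True),
    Job("c", status="SUSPENDED"),
]
ran = asyncio.run(one_pass(jobs, [("c", "hi")], ["event-1"], max_concurrency=2))
print([job.name for job in ran])  # ['a', 'b']
```

The new event job stays READY here because the concurrency cap was already reached; a later pass would pick it up.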

Each runnable job is still bounded by harness policies such as maximum turns, approximate token usage, total exceptions, and consecutive exceptions. See Harness Policies for the job-level limits that apply inside worker execution.
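A job-level policy check of this kind might look like the sketch below. `HarnessLimits`, `JobCounters`, `limit_exceeded`, and the default values are hypothetical; see Harness Policies for STATEK's actual settings. The sketch highlights the difference between total exceptions, which only accumulate, and consecutive exceptions, which a clean turn resets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HarnessLimits:
    # Hypothetical limit values, for illustration only.
    max_turns: int = 50
    max_tokens: int = 200_000
    max_total_exceptions: int = 10
    max_consecutive_exceptions: int = 3


@dataclass
class JobCounters:
    turns: int = 0
    approx_tokens: int = 0
    total_exceptions: int = 0
    consecutive_exceptions: int = 0

    def record_turn(self, tokens: int, raised: bool) -> None:
        self.turns += 1
        self.approx_tokens += tokens
        if raised:
            self.total_exceptions += 1
            self.consecutive_exceptions += 1
        else:
            # A clean turn resets only the consecutive-failure streak.
            self.consecutive_exceptions = 0


def limit_exceeded(c: JobCounters, lim: HarnessLimits) -> Optional[str]:
    """Return the name of the first exceeded limit, or None if the job may continue."""
    if c.turns >= lim.max_turns:
        return "max_turns"
    if c.approx_tokens >= lim.max_tokens:
        return "max_tokens"
    if c.total_exceptions >= lim.max_total_exceptions:
        return "max_total_exceptions"
    if c.consecutive_exceptions >= lim.max_consecutive_exceptions:
        return "max_consecutive_exceptions"
    return None


limits = HarnessLimits(max_consecutive_exceptions=2)
counters = JobCounters()
counters.record_turn(1_000, raised=True)
counters.record_turn(1_000, raised=False)
counters.record_turn(1_000, raised=True)
print(limit_exceeded(counters, limits))  # None: the clean turn reset the streak
counters.record_turn(1_000, raised=True)
print(limit_exceeded(counters, limits))  # max_consecutive_exceptions
```

Checking the limits in a fixed order means a job that trips several limits at once reports the first one; a real harness may record all of them.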

The lowest-level loop is still available for custom workers:

Run the Jobs Loop Directly

await run_jobs_loop(
    max_concurrency=100,
    queue_prefixes=["main"],
)

In production deployments, the hosting service commonly exposes a db0-rpc /rpc endpoint while the worker runs start_statek. Clients can then use the StatekClientAPI methods exposed by that host to create jobs through a push-style mechanism. STATEK provides the API object and remote-decorated methods; the host application is responsible for configuring, exposing, authenticating, and operating the RPC endpoint.

The worker loop helpers are useful when your application has a stream of work, such as incoming user messages, records to analyze, tasks to triage, or events to route.

Many Jobs, Bounded Execution

STATEK can keep many durable jobs alive at once, including many thousands of persisted jobs on a single machine and within a single process, provided the machine, dbzero storage, tools, and model provider can support that load.

That does not mean every job runs at the same time. max_concurrency controls active execution.

For example:

Limit Active Jobs

await run_jobs_loop(
    max_concurrency=50,
    queue_prefixes=["main"],
)

With this shape, the application may have thousands of jobs persisted as READY, SUSPENDED, or DONE, while the loop executes at most 50 jobs at once.
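The idea of many pending jobs with capped active execution can be illustrated with the standard library's `asyncio.Semaphore`. This sketch is not how STATEK schedules jobs internally: `run_bounded` is a hypothetical helper, and `asyncio.sleep` stands in for model and tool latency.

```python
import asyncio


async def run_bounded(job_ids, max_concurrency):
    """Run every job, but never more than max_concurrency at the same time."""
    semaphore = asyncio.Semaphore(max_concurrency)
    active = 0
    peak = 0

    async def run_job(job_id):
        nonlocal active, peak
        async with semaphore:  # waits here while max_concurrency jobs are active
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.001)  # stands in for model calls and tool work
            active -= 1
        return job_id

    results = await asyncio.gather(*(run_job(j) for j in job_ids))
    return results, peak


results, peak = asyncio.run(run_bounded(range(1_000), max_concurrency=50))
print(len(results), peak)  # all 1000 jobs finish; peak never exceeds 50
```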

Each job has its own Python local state:

# Job A
user = alex
message = alex_message
calendar = alex.calendar

# Job B
user = jordan
message = jordan_message
calendar = jordan.calendar

Those variables do not collide. They live in different jobs. The jobs only affect each other through objects or external systems you intentionally share, such as a common dbzero object, a queue, a tool, a file, or a third-party API.
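The same separation can be shown with plain Python: give each job its own namespace dictionary and run code against it with the built-in `exec`. This only illustrates separate locals; plain `exec` is not STATEK restricted mode and is not a sandbox. The `shared` dictionary plays the role of an object you intentionally share between jobs.

```python
# Each hypothetical job keeps its own namespace dictionary.
job_a_env = {"user": "alex"}
job_b_env = {"user": "jordan"}

code = "message = f'hello {user}'\nprint(message)"

exec(code, job_a_env)  # prints: hello alex
exec(code, job_b_env)  # prints: hello jordan

# An object deliberately shared by both jobs: the only place they interact.
shared = {"counter": 0}
job_a_env["shared"] = shared
job_b_env["shared"] = shared
exec("shared['counter'] += 1", job_a_env)
exec("shared['counter'] += 1", job_b_env)
```

After this runs, each namespace holds its own `message`, while `shared["counter"]` is 2 because both jobs mutated the same object.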

Actual capacity is operational. It depends on memory, CPU, dbzero storage, model latency, provider limits, tool latency, queue volume, and how expensive each job's Python work is.

Starting Jobs From Queues

There are two common ways a job gets useful Python objects into its local context.

The first is direct application startup. Your application already knows the task and creates a job with the relevant objects:

Prepare Direct Job Context

user = current_user
message = incoming_message
timestamp = received_at
calendar = user.calendar

The second is dispatcher startup. A worker, coordinator, or agent loop picks work from a queue and starts a new job with the objects that job needs:

Prepare Queued Job Context

task = next_queued_task
 
user = task.user
message = task.message
timestamp = task.created_at
calendar = user.calendar

In both cases, the agent sees normal Python variables. It does not need a special workflow node for "load user", "load calendar", or "remember today" if those objects are already in context.

Once the job starts, the agent can use the variables directly:

Use Queued Context

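print(timestamp)

today = timestamp.date()  # assumes timestamp is a datetime.datetime
open_slots = calendar.find_open_slots(today)
meeting = calendar.find_meeting("planning")
if open_slots:  # only move the meeting when a free slot exists
    meeting.move_to(open_slots[0])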

If calendar, meeting, and related objects are dbzero-backed objects, those changes are durable application state. If the job continues later, its Python state and execution history are still inspectable.

Push Messages

STATEK has a push queue for delivering new input into existing jobs.

Conceptually, an external process can push a message to a job by job UUID:

Push a Message to a Job

queue.push_to_job_console(job_uuid, "Can you move it one hour later?")

The worker loop calls process_push_notifications, finds queued messages, fetches the target job, and calls push_user_message.

If the target job is still active, the message is added to the job's pushed-message state. If the job is already DONE, pushing a user message can move it back to STARTED so the agent can continue from the persisted history.

This is the operational shape for long-lived interactions: a job can finish a turn, persist everything it knows, and later receive another message without losing its prior Python context and history.
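The delivery rules above can be modeled in a few lines. `SimplePushQueue`, its `push` and `deliver_pending` methods, and the `Job` record are hypothetical in-memory stand-ins loosely modeled on queue.push_to_job_console and process_push_notifications; they do not reproduce STATEK's signatures or persistence.

```python
from collections import deque
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Job:
    # Hypothetical, simplified job record.
    status: str = "STARTED"
    uuid: UUID = field(default_factory=uuid4)
    pushed_messages: list = field(default_factory=list)


class SimplePushQueue:
    """Hypothetical in-memory push queue keyed by job UUID."""

    def __init__(self) -> None:
        self._pending = deque()

    def push(self, job_uuid: UUID, text: str) -> None:
        self._pending.append((job_uuid, text))

    def deliver_pending(self, jobs_by_uuid: dict) -> int:
        """Deliver every queued message; return how many reached a job."""
        delivered = 0
        while self._pending:
            job_uuid, text = self._pending.popleft()
            job = jobs_by_uuid.get(job_uuid)
            if job is None:
                continue  # unknown job; a real system would log or dead-letter it
            job.pushed_messages.append(text)
            if job.status == "DONE":
                job.status = "STARTED"  # continue from the persisted history
            delivered += 1
        return delivered


finished = Job(status="DONE")
queue = SimplePushQueue()
queue.push(finished.uuid, "Can you move it one hour later?")
queue.push(uuid4(), "message for a job that does not exist")
print(queue.deliver_pending({finished.uuid: finished}))  # 1
print(finished.status)                                   # STARTED
```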

Agent Event Queues

STATEK also supports queues addressed to supervised agents.

Conceptually:

Queue an Agent Event

queue.push_to_agent_queue(dispatcher_agent, incoming_event)

The worker loop calls process_agent_events. For each queued event, STATEK creates a new job for the target agent and injects the event into the job under the local variable expected by that agent's warmup definition.

This pattern is useful when the agent itself is the worker for a stream of tasks:

Prepare an Agent Event

event = incoming_event
 
user = event.user
message = event.message
timestamp = event.timestamp
calendar = user.calendar

The important point is the same as direct startup: the new job begins with useful Python objects already in local context. The agent can inspect them, print them, call application methods, and mutate durable objects using ordinary Python.
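Creating one job per queued event, with the event injected under the name the agent's warmup expects, can be sketched as follows. `AgentDef`, its `event_var` field, `drain_agent_queue`, and `Job` are hypothetical names for illustration, not STATEK's process_agent_events API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentDef:
    # Hypothetical agent definition: the local name its warmup expects.
    name: str
    event_var: str = "event"


@dataclass
class Job:
    agent: str
    py_locals: dict = field(default_factory=dict)
    status: str = "READY"


def drain_agent_queue(queue: deque, agents: dict) -> list:
    """Create one new job per queued (agent_name, event) pair."""
    created = []
    while queue:
        agent_name, event = queue.popleft()
        agent = agents[agent_name]
        # Inject the event under the variable name the agent's warmup expects.
        created.append(Job(agent=agent.name, py_locals={agent.event_var: event}))
    return created


agents = {"dispatcher": AgentDef("dispatcher")}
pending = deque([
    ("dispatcher", {"user": "alex", "message": "hi"}),
    ("dispatcher", {"user": "jordan", "message": "hello"}),
])
jobs = drain_agent_queue(pending, agents)
print([job.py_locals["event"]["user"] for job in jobs])  # ['alex', 'jordan']
```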

Futures And Suspension

Suspended jobs are persisted jobs that cannot make progress yet.

The worker loop calls unsuspend_jobs, which checks whether each suspended job's awaited result is ready. If it is ready, the job moves back to STARTED and can continue from the stored continuation point.

The current futures implementation uses polling: each pass checks every suspended job, so the work per pass grows with the number of suspended jobs. That is acceptable for bounded waits, but use futures carefully where performance or latency matters.
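The cost profile of polling is easy to see in a sketch. `Awaited`, `Job`, and `poll_suspended` below are hypothetical in-memory stand-ins, not STATEK's unsuspend_jobs; each call walks every job to find suspended ones whose result is ready.

```python
from dataclasses import dataclass, field


@dataclass
class Awaited:
    # Hypothetical stand-in for the result a suspended job is waiting on.
    ready: bool = False
    value: object = None

    def set_result(self, value) -> None:
        self.value = value
        self.ready = True


@dataclass
class Job:
    name: str
    status: str = "SUSPENDED"
    awaited: Awaited = field(default_factory=Awaited)


def poll_suspended(jobs) -> list:
    """One polling pass: resume every suspended job whose result is ready."""
    resumed = []
    for job in jobs:  # cost grows with the number of jobs checked
        if job.status == "SUSPENDED" and job.awaited.ready:
            job.status = "STARTED"
            resumed.append(job.name)
    return resumed


jobs = [Job("a"), Job("b")]
print(poll_suspended(jobs))  # []
jobs[0].awaited.set_result("task-42")
print(poll_suspended(jobs))  # ['a']
print(poll_suspended(jobs))  # [] (a is no longer suspended)
```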

Use callbacks, pushed messages, or agent events by default for broad external interruption and event delivery. Use futures when the wait belongs naturally inside the agent's Python execution and the number of suspended jobs is controlled.

A good futures-shaped case is warmup code that picks work from a queue:

Pick Work During Warmup

task = pick_next_task()
 
user = task.user
message = task.message
calendar = user.calendar

If no task is ready, the job can suspend. This works best when the framework or runner controls how many jobs are allowed to wait this way.

Crash And Restart Expectations

STATEK persists job state through dbzero-backed objects.

After a process restart, a worker can look at persisted jobs and continue processing jobs whose status says they still have work to do. Job history, Python local state, console output, pushed messages, usage, errors, and continuation fields are stored as durable state.

That is not the same as deterministic replay.

If a tool sent an email, updated a calendar, charged a card, wrote a file, or called an external API before the process crashed, STATEK does not make that external side effect magically replayable or exactly-once. Your application should store durable external IDs, check current state before applying changes, and make side-effecting tools idempotent where possible.
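A common way to make a side-effecting tool safe to retry is to key each effect by a durable idempotency key and check stored state before acting. `EmailOutbox` and `_provider_send` are hypothetical; in a real deployment `sent_ids` would live in durable storage such as a dbzero-backed object, and the key would be derived from stable job and step identity.

```python
import uuid


class EmailOutbox:
    """Hypothetical side-effecting tool made idempotent with durable keys."""

    def __init__(self) -> None:
        self.sent_ids = {}  # idempotency key -> provider message id (persist this)
        self.provider_calls = 0

    def _provider_send(self, to: str, body: str) -> str:
        # Stand-in for a real email provider call with an external effect.
        self.provider_calls += 1
        return f"msg-{uuid.uuid4().hex[:8]}"

    def send(self, idempotency_key: str, to: str, body: str) -> str:
        # Check current state before applying the change again.
        if idempotency_key in self.sent_ids:
            return self.sent_ids[idempotency_key]
        message_id = self._provider_send(to, body)
        self.sent_ids[idempotency_key] = message_id  # record the external ID
        return message_id


outbox = EmailOutbox()
key = "job-123:confirm-meeting"  # hypothetical key built from job and step identity
first = outbox.send(key, "alex@example.com", "Meeting moved.")
retry = outbox.send(key, "alex@example.com", "Meeting moved.")  # e.g. after a restart
print(first == retry, outbox.provider_calls)  # True 1
```

A gap remains between the provider call and recording its ID: a crash in that window can still cause a second send on retry. Where the external provider accepts its own idempotency key, passing the same key closes that gap on the provider side.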

⚠️

STATEK durability is persisted state and inspectable history, not a complete security boundary, tenant-isolation guarantee, exactly-once side-effect system, or deterministic replay of external systems. Production code execution still needs permissioning, secrets discipline, process and resource limits, backups, and monitoring around STATEK restricted mode. See Security for the trust boundary and Replay and Recovery for side-effect recovery patterns.

Logging And Inspection

STATEK exposes several operational records per job:

status: where the job is in the lifecycle
chat_log: user messages, model responses, warmup entries, and subtask notifications
py_env.console: output printed by executed Python
tool_log: tool results and tool errors attached to chat items
py_env.push_log: pushed messages associated with execution positions
error and exception fields: failure details captured during execution
usage: accumulated model usage and cost where provider data and pricing are available; see STATEK Metering

Runtime logging is controlled with configuration such as:

STATEK_LOG_LEVEL=INFO
STATEK_LOGS_PATH=./statek-logs

Use development logs to understand how jobs move. In production, route logs through your normal logging pipeline and be careful about raw user data, secrets, provider keys, and privileged object representations appearing in inspectable history.
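One way to keep secrets out of your own log pipeline is a standard-library `logging.Filter` that masks secret-looking values before a handler writes them. This sketch reuses the STATEK_LOG_LEVEL variable name for the host's own `worker` logger as an assumption; it does not configure STATEK's internal logging, and `SECRET_PATTERN` is a hypothetical pattern you would tune to your providers' key formats.

```python
import logging
import os
import re

# Hypothetical pattern for provider-key-like values; tune it for your providers.
SECRET_PATTERN = re.compile(r"\b(?:sk|key|token)-[A-Za-z0-9_-]{8,}")


class RedactSecrets(logging.Filter):
    """Mask secret-looking substrings before a record reaches the handler's output."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message once, redact it, and drop args so it is not re-formatted.
        record.msg = SECRET_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = ()
        return True


level_name = os.environ.get("STATEK_LOG_LEVEL", "INFO")
logger = logging.getLogger("worker")  # hypothetical logger name for the host service
logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
handler = logging.StreamHandler()
handler.addFilter(RedactSecrets())
logger.addHandler(handler)

logger.info("calling provider with %s", "sk-abcdef1234567890")
# when INFO is enabled, stderr shows: calling provider with [REDACTED]
```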

Process Boundaries

A single STATEK job loop is intended to coordinate jobs inside one process.

You can still design larger deployments around multiple processes, dbzero prefixes, queues, RPC, and application-specific routing. Treat that as deployment architecture, not automatic clustering. Be explicit about which prefixes a worker reads, which queues it processes, and which external systems each worker is allowed to touch.

Avoid accidental shared state. If two jobs can mutate the same durable object, design the object methods and business rules to handle concurrency, retries, stale assumptions, and duplicate requests.
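One common design for shared durable objects is optimistic concurrency: each write states the version it was based on, and duplicate request IDs are ignored. `SharedMeeting`, `StaleWrite`, and the request-ID scheme below are hypothetical, and the sketch does not rely on any dbzero transaction semantics; it only shows how object methods can reject stale assumptions and absorb retries.

```python
from dataclasses import dataclass, field


class StaleWrite(Exception):
    """Raised when a job acts on an outdated view of a shared object."""


@dataclass
class SharedMeeting:
    # Hypothetical shared durable object guarded by a version number.
    start_hour: int
    version: int = 0
    applied_requests: set = field(default_factory=set)

    def move_to(self, new_hour: int, expected_version: int, request_id: str) -> None:
        if request_id in self.applied_requests:
            return  # duplicate request: already applied, do nothing
        if expected_version != self.version:
            raise StaleWrite(f"expected v{expected_version}, found v{self.version}")
        self.start_hour = new_hour
        self.version += 1
        self.applied_requests.add(request_id)


meeting = SharedMeeting(start_hour=9)
seen = meeting.version                # both jobs read version 0
meeting.move_to(10, seen, "job-a:1")  # job A's write succeeds
meeting.move_to(10, seen, "job-a:1")  # a retried duplicate is ignored
try:
    meeting.move_to(11, seen, "job-b:1")  # job B's view is now stale
except StaleWrite as exc:
    print("job B must re-read and retry:", exc)  # ... expected v0, found v1
```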

Operational Checklist

For a serious deployment, decide these things before giving agents broad Python access:

what objects and tools each agent can see
where provider keys and application secrets live
which side effects require explicit permission checks
which tools must be idempotent
what max_concurrency each worker should use
how many suspended jobs and queued events are acceptable
how to monitor job counts, queue depth, exceptions, token usage, cost, and worker health; STATEK Metering explains the usage and cost fields
how durable dbzero state is backed up and restored
how long job history, console output, and logs are retained
how code execution is sandboxed and resource-limited
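For the permission-check item, one option is to gate each side-effecting tool behind an explicit permission before it runs. The `requires_permission` decorator, `PermissionDenied`, and the permission strings below are hypothetical, not part of STATEK; they show the shape of a check that refuses by default.

```python
from functools import wraps


class PermissionDenied(Exception):
    pass


def requires_permission(permission: str):
    """Hypothetical decorator: refuse a tool call unless the caller holds `permission`."""
    def decorator(tool):
        @wraps(tool)
        def wrapper(agent_permissions: frozenset, *args, **kwargs):
            if permission not in agent_permissions:
                raise PermissionDenied(f"{tool.__name__} requires {permission!r}")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


@requires_permission("payments:charge")
def charge_card(customer_id: str, cents: int) -> str:
    # Stand-in for a real payment call.
    return f"charged {customer_id} {cents}"


print(charge_card(frozenset({"payments:charge"}), "cust-1", 500))  # charged cust-1 500
try:
    charge_card(frozenset({"calendar:write"}), "cust-1", 500)
except PermissionDenied as exc:
    print(exc)  # charge_card requires 'payments:charge'
```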

The operational goal is not to hide the Python model. The goal is to make it controlled: agents execute ordinary Python against the objects you expose, STATEK persists the job and application state, and your deployment defines the safety and reliability boundaries around that power.

Where to go next

Read Configuration for environment variables and limits, STATEK Metering for usage and cost inspection, Security for sandboxing and credentials boundaries, Replay and Recovery for failure policy, and API Reference for runner helper APIs.
