STATEK Operations
STATEK runs inside your application process or worker process.
It is not a separate hosted agent service by default. Your application creates agents, jobs, queues, and dbzero-backed objects. A STATEK worker loop then looks at durable job state and decides what should run next.
The operational model is simple:
- Jobs are persisted as application state.
- Each job owns its Python execution state, chat history, console output, tool log, status, and errors.
- A worker loop repeatedly finds jobs that can make progress.
- The loop runs only a bounded number of jobs at the same time.
- When a job prints, calls tools, changes dbzero objects, waits, finishes, or receives a pushed message, that state is persisted.
That is the core reason STATEK can coordinate many agent jobs in one process. The jobs share one scheduler, but their Python locals and execution histories are separate.
Worker Loops
A typical STATEK worker or service starts with start_statek.
It prepares the process, applies configured agent definitions, initializes the client API object used by RPC-enabled deployments, and then starts the job processing loop:
from statek import start_statek
start_statek(
    agents=[dispatcher, researcher],
    push_queues=[main_queue],
    max_concurrency=100,
)
Use start_statek_async when the hosting service already owns the event loop:
from statek import start_statek_async
await start_statek_async(
    agents=[dispatcher],
    push_queues=[main_queue],
)
Under the hood, startup chooses one of the lower-level agentic loops:
- run_agentic_loop for one active agent
- run_agentic_fleet for multiple active agents
Both helpers build on run_jobs_loop. On each pass, that lower-level loop does several things:
- checks suspended jobs and moves ready ones back to execution
- processes messages pushed into existing jobs
- processes queued events addressed to supervised agents
- optionally starts new jobs from application-specific queue logic
- finds READY, WARMING_UP, and STARTED jobs
- schedules runnable jobs up to max_concurrency
Each runnable job is still bounded by harness policies such as maximum turns, approximate token usage, total exceptions, and consecutive exceptions. See Harness Policies for the job-level limits that apply inside worker execution.
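The kind of per-job limit check described above can be pictured as follows. This is a minimal sketch, not the STATEK harness: HarnessLimits, JobCounters, and limit_exceeded are hypothetical names, and the numeric defaults are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HarnessLimits:
    # Hypothetical limits; real policy names and defaults may differ.
    max_turns: int = 20
    max_tokens: int = 100_000
    max_total_exceptions: int = 10
    max_consecutive_exceptions: int = 3


@dataclass
class JobCounters:
    turns: int = 0
    tokens: int = 0
    total_exceptions: int = 0
    consecutive_exceptions: int = 0

    def record_turn(self, tokens_used: int, failed: bool) -> None:
        """Update the counters after one turn of execution."""
        self.turns += 1
        self.tokens += tokens_used
        if failed:
            self.total_exceptions += 1
            self.consecutive_exceptions += 1
        else:
            # A successful turn resets the consecutive-failure streak.
            self.consecutive_exceptions = 0


def limit_exceeded(c: JobCounters, limits: HarnessLimits) -> Optional[str]:
    """Return the name of the first limit reached, or None if the job may continue."""
    if c.turns >= limits.max_turns:
        return "max_turns"
    if c.tokens >= limits.max_tokens:
        return "max_tokens"
    if c.total_exceptions >= limits.max_total_exceptions:
        return "max_total_exceptions"
    if c.consecutive_exceptions >= limits.max_consecutive_exceptions:
        return "max_consecutive_exceptions"
    return None
```

Keeping total and consecutive exceptions as separate counters lets a job recover from an occasional failure while still stopping one that fails repeatedly in a row.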
The lowest-level loop is still available for custom workers:
await run_jobs_loop(
    max_concurrency=100,
    queue_prefixes=["main"],
)
In production deployments, the hosting service commonly exposes a db0-rpc /rpc endpoint while the worker runs start_statek. Clients can then use the StatekClientAPI methods exposed by that host to create jobs through a push-style mechanism. STATEK provides the API object and remote-decorated methods; the host application is responsible for configuring, exposing, authenticating, and operating the RPC endpoint.
These helpers are useful when your application has a stream of work, such as incoming user messages, records to analyze, tasks to triage, or events to route.
Many Jobs, Bounded Execution
STATEK can keep many durable jobs alive at once, including many thousands of persisted jobs on a single machine and within a single process when the machine, dbzero storage, tools, and model provider can support it.
That does not mean every job runs at the same time. max_concurrency controls active execution.
For example:
await run_jobs_loop(
    max_concurrency=50,
    queue_prefixes=["main"],
)
With this shape, the application may have thousands of jobs persisted as READY, SUSPENDED, or DONE, while the loop runs at most about 50 active job workers at once.
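The difference between "many jobs exist" and "few jobs run" can be shown with a plain asyncio semaphore. This is a sketch of the general bounding technique only; run_bounded is a hypothetical helper, and it does not describe how run_jobs_loop is implemented.

```python
import asyncio


async def run_bounded(job_ids, max_concurrency):
    """Run one simulated step per job with at most max_concurrency active at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    active = 0
    peak = 0
    finished = []

    async def run_one(job_id):
        nonlocal active, peak
        async with semaphore:  # jobs beyond the limit wait here
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # stand-in for model calls and tool work
            active -= 1
            finished.append(job_id)

    await asyncio.gather(*(run_one(job_id) for job_id in job_ids))
    return peak, finished
```

Calling asyncio.run(run_bounded(range(1000), max_concurrency=50)) completes all 1000 simulated jobs while the recorded peak never exceeds 50.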
Each job has its own Python local state:
# Job A
user = alex
message = alex_message
calendar = alex.calendar
# Job B
user = jordan
message = jordan_message
calendar = jordan.calendar
Those variables do not collide. They live in different jobs. The jobs only affect each other through objects or external systems you intentionally share, such as a common dbzero object, a queue, a tool, a file, or a third-party API.
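One way to picture separate job locals is one namespace dictionary per job. The sketch below uses plain exec purely as an illustration; run_in_job is a hypothetical helper, and nothing here reflects how STATEK stores, restricts, or persists execution state.

```python
def run_in_job(namespace: dict, source: str) -> dict:
    """Execute source using namespace as that job's own variable scope."""
    exec(source, namespace)
    return namespace


# The same step runs in two jobs, each with its own variables.
step = "greeting = 'hello ' + user"
job_a = run_in_job({"user": "alex"}, step)
job_b = run_in_job({"user": "jordan"}, step)

print(job_a["greeting"])
print(job_b["greeting"])
```

Both jobs assign the same variable name, yet each assignment lands in its own dictionary, so neither job can see or overwrite the other's value.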
Actual capacity is operational. It depends on memory, CPU, dbzero storage, model latency, provider limits, tool latency, queue volume, and how expensive each job's Python work is.
Starting Jobs From Queues
There are two common ways a job gets useful Python objects in context.
The first is direct application startup. Your application already knows the task and creates a job with the relevant objects:
user = current_user
message = incoming_message
timestamp = received_at
calendar = user.calendar
The second is dispatcher startup. A worker, coordinator, or agent loop picks work from a queue and starts a new job with the objects that job needs:
task = next_queued_task
user = task.user
message = task.message
timestamp = task.created_at
calendar = user.calendar
In both cases, the agent sees normal Python variables. It does not need a special workflow node for "load user", "load calendar", or "remember today" if those objects are already in context.
Once the job starts, the agent can use the variables directly:
print(timestamp)
open_slots = calendar.find_open_slots(today)
meeting = calendar.find_meeting("planning")
meeting.move_to(open_slots[0])
If calendar, meeting, and related objects are dbzero-backed objects, those changes are durable application state. If the job continues later, its Python state and execution history are still inspectable.
Push Messages
STATEK has a push queue for delivering new input into existing jobs.
Conceptually, an external process can push a message to a job by job UUID:
queue.push_to_job_console(job_uuid, "Can you move it one hour later?")
The worker loop calls process_push_notifications, finds queued messages, fetches the target job, and calls push_user_message.
If the target job is still active, the message is added to the job's pushed-message state. If the job is already DONE, pushing a user message can move it back to STARTED so the agent can continue from the persisted history.
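The delivery behavior described above can be modeled in memory. This is a toy sketch under stated assumptions: Job, Status, deliver_pushed_message, and drain_push_queue are hypothetical stand-ins, not the STATEK types or functions, and dropping messages for unknown jobs is a choice made only for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    STARTED = "STARTED"
    DONE = "DONE"


@dataclass
class Job:
    uuid: str
    status: Status = Status.STARTED
    pushed_messages: list = field(default_factory=list)


def deliver_pushed_message(job: Job, text: str) -> None:
    """Attach a pushed message and reopen the job if it had already finished."""
    job.pushed_messages.append(text)
    if job.status is Status.DONE:
        job.status = Status.STARTED


def drain_push_queue(queue: list, jobs: dict) -> int:
    """Deliver queued (job_uuid, text) pairs; return how many were delivered."""
    delivered = 0
    while queue:
        job_uuid, text = queue.pop(0)
        job = jobs.get(job_uuid)
        if job is None:
            continue  # unknown target: this toy simply drops the message
        deliver_pushed_message(job, text)
        delivered += 1
    return delivered
```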
This is the operational shape for long-lived interactions: a job can finish a turn, persist everything it knows, and later receive another message without losing its prior Python context and history.
Agent Event Queues
STATEK also supports queues addressed to supervised agents.
Conceptually:
queue.push_to_agent_queue(dispatcher_agent, incoming_event)
The worker loop calls process_agent_events. For each queued event, STATEK creates a new job for the target agent and injects the event into the job under the local variable expected by that agent's warmup definition.
This pattern is useful when the agent itself is the worker for a stream of tasks:
event = incoming_event
user = event.user
message = event.message
timestamp = event.timestamp
calendar = user.calendar
The important point is the same as direct startup: the new job begins with useful Python objects already in local context. The agent can inspect them, print them, call application methods, and mutate durable objects using ordinary Python.
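The "one new job per event, with the event injected under an expected name" pattern can be sketched without any framework. ToyAgent, ToyJob, and start_jobs_for_events are hypothetical names for illustration; the real warmup definition and job objects are not shown in this document.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class ToyAgent:
    name: str
    event_variable: str = "event"  # local name the agent's warmup code expects


@dataclass
class ToyJob:
    job_id: int
    agent_name: str
    local_vars: dict = field(default_factory=dict)


def start_jobs_for_events(agent_queue: list) -> list:
    """Drain (agent, event) pairs, creating one new job per event."""
    ids = count(1)
    jobs = []
    while agent_queue:
        agent, event = agent_queue.pop(0)
        # Inject the event under the variable name this agent expects.
        jobs.append(ToyJob(next(ids), agent.name, {agent.event_variable: event}))
    return jobs
```

Each job receives its own local_vars dictionary, so two events for the same agent never share a variable scope.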
Futures And Suspension
Suspended jobs are persisted jobs that cannot make progress yet.
The worker loop calls unsuspend_jobs, which checks whether each suspended job's awaited result is ready. If it is ready, the job moves back to STARTED and can continue from the stored continuation point.
The current futures implementation uses polling. That is useful for bounded waits, but it should be used carefully where performance or latency matters.
Use callbacks, pushed messages, or agent events by default for broad external interruption and event delivery. Use futures when the wait belongs naturally inside the agent's Python execution and the number of suspended jobs is controlled.
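The cost of polling is easiest to see in a toy version of one pass. This sketch assumes a hypothetical SuspendedJob record with an is_ready probe; unsuspend_ready is an illustrative name, not the STATEK unsuspend_jobs function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SuspendedJob:
    job_id: str
    is_ready: Callable[[], bool]  # hypothetical readiness probe for the awaited result
    status: str = "SUSPENDED"


def unsuspend_ready(jobs) -> int:
    """One polling pass: probe each suspended job once and resume the ready ones."""
    resumed = 0
    for job in jobs:
        if job.status == "SUSPENDED" and job.is_ready():
            job.status = "STARTED"
            resumed += 1
    return resumed
```

Every pass probes every job that is still suspended, so the work per pass grows with the number of suspended jobs. That is why the count of jobs waiting this way is worth keeping under control.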
A good futures-shaped case is warmup code that picks work from a queue:
task = pick_next_task()
user = task.user
message = task.message
calendar = user.calendar
If no task is ready, the job can suspend. This works best when the framework or runner controls how many jobs are allowed to wait this way.
Crash And Restart Expectations
STATEK persists job state through dbzero-backed objects.
After a process restart, a worker can look at persisted jobs and continue processing jobs whose status says they still have work to do. Job history, Python local state, console output, pushed messages, usage, errors, and continuation fields are stored as durable state.
That is not the same as deterministic replay.
If a tool sent an email, updated a calendar, charged a card, wrote a file, or called an external API before the process crashed, STATEK does not make that external side effect magically replayable or exactly-once. Your application should store durable external IDs, check current state before applying changes, and make side-effecting tools idempotent where possible.
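A side-effecting tool can be made idempotent by recording an external ID under a caller-supplied key and checking it before acting. The sketch below is an assumption-laden toy: EmailTool, its send signature, and the sent_log dictionary (standing in for durable storage) are all hypothetical.

```python
class EmailTool:
    """Toy side-effecting tool keyed by a caller-supplied idempotency key."""

    def __init__(self, sent_log: dict):
        # sent_log stands in for durable storage that survives a restart.
        self.sent_log = sent_log
        self.deliveries = 0

    def send(self, idempotency_key: str, to: str, body: str) -> str:
        # Check recorded state before applying the side effect again.
        if idempotency_key in self.sent_log:
            return self.sent_log[idempotency_key]
        self.deliveries += 1  # stand-in for the real external call
        external_id = f"msg-{idempotency_key}"  # stand-in for a provider-issued ID
        self.sent_log[idempotency_key] = external_id
        return external_id
```

A crash between the external call and the write to sent_log can still produce a duplicate, so this narrows the window rather than closing it. Passing the same key to a provider that deduplicates on its side is the usual way to cover that gap.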
STATEK durability is persisted state and inspectable history, not a complete security boundary, tenant-isolation guarantee, exactly-once side-effect system, or deterministic replay of external systems. Production code execution still needs permissioning, secrets discipline, process and resource limits, backups, and monitoring around STATEK restricted mode. See Security for the trust boundary and Replay and Recovery for side-effect recovery patterns.
Logging And Inspection
STATEK exposes several operational records per job:
- status: where the job is in the lifecycle
- chat_log: user messages, model responses, warmup entries, and subtask notifications
- py_env.console: output printed by executed Python
- tool_log: tool results and tool errors attached to chat items
- py_env.push_log: pushed messages associated with execution positions
- error and exception fields: failure details captured during execution
- usage: accumulated model usage and cost where provider data and pricing are available; see STATEK Metering
Runtime logging is controlled with configuration such as:
STATEK_LOG_LEVEL=INFO
STATEK_LOGS_PATH=./statek-logs
Use development logs to understand how jobs move. In production, route logs through your normal logging pipeline and be careful about raw user data, secrets, provider keys, and privileged object representations appearing in inspectable history.
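One common way to keep secrets out of a logging pipeline is a standard-library logging filter that rewrites records before they are emitted. This is a generic sketch, not a STATEK feature: RedactSecrets and SECRET_PATTERN are hypothetical, and the pattern covers only a few obvious key names.

```python
import logging
import re

# Matches fragments such as "api_key=abc", "token: abc", or "password=abc".
SECRET_PATTERN = re.compile(
    r"(api[_-]?key|token|password)\s*[=:]\s*\S+", re.IGNORECASE
)


class RedactSecrets(logging.Filter):
    """Replace secret-looking values in the formatted message with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # apply %-style arguments first
        record.msg = SECRET_PATTERN.sub(
            lambda m: f"{m.group(1)}=[REDACTED]", message
        )
        record.args = None  # the message is already fully formatted
        return True  # keep the record, only its text changes
```

Attaching the filter to a handler with handler.addFilter(RedactSecrets()) applies it to every record that handler emits. Pattern-based redaction is a safety net, not a substitute for keeping secrets out of logged objects in the first place.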
Process Boundaries
A single STATEK job loop is intended to coordinate jobs inside one process.
You can still design larger deployments around multiple processes, dbzero prefixes, queues, RPC, and application-specific routing. Treat that as deployment architecture, not automatic clustering. Be explicit about which prefixes a worker reads, which queues it processes, and which external systems each worker is allowed to touch.
Avoid accidental shared state. If two jobs can mutate the same durable object, design the object methods and business rules to handle concurrency, retries, stale assumptions, and duplicate requests.
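A version check on the shared object is one way to catch stale assumptions between jobs. The sketch below is illustrative only: Meeting, StaleWrite, and the expected_version parameter are hypothetical and are not part of dbzero or STATEK.

```python
class StaleWrite(Exception):
    """Raised when a writer acted on an outdated view of the object."""


class Meeting:
    def __init__(self, start_hour: int):
        self.start_hour = start_hour
        self.version = 0  # incremented on every successful change

    def move_to(self, new_hour: int, expected_version: int) -> int:
        """Apply the change only if the caller saw the current version."""
        if expected_version != self.version:
            raise StaleWrite(
                f"expected version {expected_version}, found {self.version}"
            )
        self.start_hour = new_hour
        self.version += 1
        return self.version
```

If two jobs read version 0 and both try to move the meeting, the first call succeeds and the second raises StaleWrite, which gives the second job a chance to re-read the object and decide again instead of silently overwriting the change.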
Operational Checklist
For a serious deployment, decide these things before giving agents broad Python access:
- what objects and tools each agent can see
- where provider keys and application secrets live
- which side effects require explicit permission checks
- which tools must be idempotent
- what max_concurrency each worker should use
- how many suspended jobs and queued events are acceptable
- how to monitor job counts, queue depth, exceptions, token usage, cost, and worker health; STATEK Metering explains the usage and cost fields
- how durable dbzero state is backed up and restored
- how long job history, console output, and logs are retained
- how code execution is sandboxed and resource-limited
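Several checklist items reduce to thresholds you can check on a schedule. The sketch below assumes you can already obtain a list of job status strings and a queue depth from your own application; health_warnings and its parameters are hypothetical.

```python
from collections import Counter


def health_warnings(job_statuses, queue_depth, max_suspended, max_queue_depth):
    """Count jobs per status and list any thresholds that are exceeded."""
    counts = Counter(job_statuses)
    warnings = []
    if counts["SUSPENDED"] > max_suspended:
        warnings.append(f"suspended jobs: {counts['SUSPENDED']} > {max_suspended}")
    if queue_depth > max_queue_depth:
        warnings.append(f"queue depth: {queue_depth} > {max_queue_depth}")
    return counts, warnings
```

The returned counts can feed a dashboard, while the warnings list is a natural input for alerting.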
The operational goal is not to hide the Python model. The goal is to make it controlled: agents execute ordinary Python against the objects you expose, STATEK persists the job and application state, and your deployment defines the safety and reliability boundaries around that power.
Where to go next
Read Configuration for environment variables and limits, STATEK Metering for usage and cost inspection, Security for sandboxing and credentials boundaries, Replay and Recovery for failure policy, and API Reference for runner helper APIs.