STATEK Jobs
A STATEK job is a durable Python workspace for one unit of agent work.
It is not just an LLM request. A job has Python variables, console output, chat history, tool results, status, usage, error state, and continuation fields. The agent writes and executes Python inside that workspace. STATEK persists the workspace so later steps can continue with the same objects and variables.
For example, a job can start with useful objects already in context:
user
message
calendar
today
timestamp
Those objects can be supplied directly by application code, prepared by warmup code, or assembled by a dispatcher agent that picked a task from a queue. Once the job starts, the working agent can use them as normal Python:
print(today)
events = calendar.events_for(today)
meeting = calendar.find_meeting("Weekly planning", day=today)
later_slot = calendar.find_empty_slot(after=meeting.ends_at)
meeting.move_to(later_slot)
The important part is persistence. If the agent created events, meeting, and later_slot in one step, the same job can reuse those variables in a later step. If calendar and meeting are dbzero-backed objects, updates to them are durable application state.
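The "persist, then continue" idea can be illustrated without STATEK. The sketch below is a plain-Python analogy under stated assumptions: pickle stands in for durable storage, and run_step, state, saved, and restored are hypothetical names. It is not how STATEK or dbzero store a workspace.

```python
# Stand-in technique: pickle only illustrates "persist, then continue".
# It is not how STATEK or dbzero store a workspace.
import datetime
import pickle


def run_step(code: str, variables: dict) -> dict:
    """Run one step against a copy of the variables and return the updated copy."""
    namespace = dict(variables)
    exec(code, namespace)
    namespace.pop("__builtins__", None)  # exec adds this; it is not workspace data
    return namespace


# Step 1: start with an object already in context and create a new variable.
state = run_step("deadline = today.replace(day=22)",
                 {"today": datetime.date(2024, 1, 15)})

# Persist the workspace, then load it again as a later step would.
saved = pickle.dumps(state)
restored = pickle.loads(saved)

# Step 2: reuse both variables without rebuilding them.
final = run_step("days_left = (deadline - today).days", restored)
print(final["days_left"])
```

The second step never recomputes deadline. It reads the value that the first step left behind, which is the behavior the paragraph above describes.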
JobDef and Job
STATEK separates the definition of work from the execution of work.
JobDef is the prepared job definition. It freezes the agent and the setup for the task:
- the Agent
- model and provider metadata
- prompt parameters
- optional warmup code
- chat style
- locale
Job is the durable execution unit. It stores what is happening now:
- current status
- PyEnv, the Python execution environment
- chat history and tool results
- usage and cost metadata where available
- error state
- parent job, if this job was delegated by another job
- continuation fields for suspended work
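One way to picture the split between the two lists is a frozen definition next to a mutable execution record. The sketch below is illustrative only: SketchJobDef and SketchJob are hypothetical classes, their field names are simplified guesses, and it assumes "freezes" means the definition does not change after it is prepared.

```python
# Illustrative only: SketchJobDef and SketchJob are hypothetical stand-ins,
# not STATEK classes. They show a frozen definition next to mutable state.
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SketchJobDef:
    agent: str
    warmup_code: Optional[str] = None
    locale: str = "en"


@dataclass
class SketchJob:
    job_def: SketchJobDef
    status: str = "READY"
    chat_log: list = field(default_factory=list)
    error: Optional[str] = None


job_def = SketchJobDef(agent="calendar-assistant")
job = SketchJob(job_def=job_def)

# Execution state changes as the job runs.
job.status = "STARTED"
job.chat_log.append("user: move my weekly planning meeting")

# The prepared definition rejects mutation.
try:
    job_def.locale = "de"
    definition_was_mutable = True
except FrozenInstanceError:
    definition_was_mutable = False
```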
Conceptually:
job.job_def # prepared task definition
job.status # READY, WARMING_UP, STARTED, SUSPENDED, or DONE
job.py_env # durable Python workspace
job.chat_log # chat, warmup, user, and subtask history
job.error # error state, when the job failed
The Python Workspace
The job's Python state lives in PyEnv.
local_state is the most important field. It holds the names the agent can use and reuse:
job.py_env.local_state["user"]
job.py_env.local_state["calendar"]
job.py_env.local_state["today"]
job.py_env.local_state["meeting"]
PyEnv also tracks:
- global_state: globals available to executed code
- console: captured Python output
- exceptions: execution errors associated with history positions
- push_log: messages pushed into an active job
- exit_status: the result set when code exits
- future and continuation tracking used when execution pauses
In plain terms: a job behaves like a persisted Python session. The agent does not need to rebuild the world from scratch on every model call.
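A rough sketch of such a session, assuming nothing about STATEK internals: SketchPyEnv is a hypothetical class that borrows only the names local_state, console, and exceptions from the text. For brevity it uses one dictionary as the execution namespace, whereas the real PyEnv tracks global_state separately.

```python
# Hypothetical sketch: SketchPyEnv is NOT STATEK's PyEnv. It only mirrors
# the field names local_state, console, and exceptions from the text.
import contextlib
import io
import traceback
from dataclasses import dataclass, field


@dataclass
class SketchPyEnv:
    local_state: dict = field(default_factory=dict)
    console: list = field(default_factory=list)
    exceptions: dict = field(default_factory=dict)  # history position -> traceback

    def run(self, position: int, code: str) -> None:
        """Run one code block, capturing printed output and any exception."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.local_state)
        except Exception:
            self.exceptions[position] = traceback.format_exc()
        finally:
            self.console.append(buffer.getvalue())


env = SketchPyEnv(local_state={"today": "2024-01-15"})
env.run(0, "print(today)")
env.run(1, "greeting = 'planning for ' + today")
env.run(2, "print(undefined_name)")  # fails, but the session survives
env.run(3, "print(greeting)")        # still sees the earlier variable
```

The failing block at position 2 is recorded rather than raised, and the block at position 3 can still read greeting from the earlier step.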
Lifecycle
A job moves through a small lifecycle:
- READY: the job exists and has not started executing
- WARMING_UP: STATEK is running startup code before the first normal LLM step
- STARTED: the job is actively progressing
- SUSPENDED: the job paused while waiting for an external condition or future
- DONE: the current unit of work completed; inspect the recorded outcome separately
DONE can represent a successful result, an explicit exit, harness or limit termination, or an error path depending on the fields recorded with the job. Read job.status for the current lifecycle status, query completed units with db0.find(Job, JobStatus.DONE), inspect job.py_env.exit_status for the completion or exit result, and inspect job.error, job-definition errors, console output, or history exceptions for failure details when applicable.
Technically, a job can continue indefinitely through repeated units of work. DONE is a stable resting state for the current turn or unit, not necessarily the final state of the durable job. A user follow-up with push_user_message(...), a subtask notification, a callback, or application code can move a DONE job back to STARTED.
Warmup is optional. If there is no startup code to execute, a job can move from READY to STARTED. When warmup is present, it prepares the same durable Python workspace the agent will use later; see Warmup Code for block syntax, tool-call warmup, and startup patterns.
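The lifecycle described in this section can be read as a small state machine. The sketch below is inferred from the prose only: SketchStatus, ALLOWED, and can_move are hypothetical names, and the real transition rules in STATEK may include cases this page does not spell out.

```python
# Hypothetical sketch: this enum and transition table are inferred from the
# prose in this section, not copied from STATEK. The real rules may differ.
from enum import Enum


class SketchStatus(Enum):
    READY = "READY"
    WARMING_UP = "WARMING_UP"
    STARTED = "STARTED"
    SUSPENDED = "SUSPENDED"
    DONE = "DONE"


ALLOWED = {
    SketchStatus.READY: {SketchStatus.WARMING_UP, SketchStatus.STARTED},
    SketchStatus.WARMING_UP: {SketchStatus.STARTED},
    SketchStatus.STARTED: {SketchStatus.SUSPENDED, SketchStatus.DONE},
    SketchStatus.SUSPENDED: {SketchStatus.STARTED},
    # DONE is a resting state: a follow-up message can restart the job.
    SketchStatus.DONE: {SketchStatus.STARTED},
}


def can_move(current: SketchStatus, target: SketchStatus) -> bool:
    """Return True if the sketch's table allows moving from current to target."""
    return target in ALLOWED[current]


print(can_move(SketchStatus.READY, SketchStatus.STARTED))  # no warmup code
print(can_move(SketchStatus.DONE, SketchStatus.STARTED))   # user follow-up
```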
What Happens in a Step
Each job step advances the durable workspace.
At a high level, STATEK:
- checks whether the current unit is already resting at DONE
- finds the next pending code block, warmup block, or LLM response to process
- executes Python and tool calls against the job's PyEnv
- stores console output, tool results, exceptions, and chat history
- updates continuation fields if the job must wait
- marks the current unit DONE when Python code sets an exit status, an agent turn completes, or an error path terminates that unit
The result is not a separate workflow language. The job is still executing Python, but STATEK records enough state to inspect it, resume it, and coordinate it with other jobs.
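A heavily simplified sketch of a step function, covering only the "check DONE, find the next block, execute, mark DONE" parts of the list above. StepJob and step are hypothetical names, and the convention that code finishes by assigning a variable called exit_status is an assumption made for this example, not STATEK's mechanism.

```python
# Hypothetical sketch of a step function; none of these names are STATEK APIs.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepJob:
    pending_blocks: list
    local_state: dict = field(default_factory=dict)
    exit_status: Any = None
    status: str = "STARTED"


def step(job: StepJob) -> bool:
    """Advance the job by one code block. Return False when nothing can run."""
    if job.status == "DONE" or not job.pending_blocks:
        return False
    code = job.pending_blocks.pop(0)
    exec(code, job.local_state)
    # Assumption: code signals completion by assigning a name called exit_status.
    if "exit_status" in job.local_state:
        job.exit_status = job.local_state["exit_status"]
        job.status = "DONE"
    return True


job = StepJob(pending_blocks=[
    "total = 2 + 3",
    "exit_status = total",
    "never_runs = True",  # left pending because the unit is already DONE
])
while step(job):
    pass
```

After the loop the job rests at DONE with an exit status of 5, and the third block is still pending.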
Many Jobs in One Process
STATEK can orchestrate many jobs in one process. Jobs that are ready, warming up, started, or suspended can all be present at the same time. The runner schedules runnable jobs with a configured concurrency limit and checks suspended jobs to see whether they can continue.
The jobs do not share Python locals by accident:
# Job A
job_a.py_env.local_state["user"]
job_a.py_env.local_state["calendar"]
# Job B
job_b.py_env.local_state["user"]
job_b.py_env.local_state["calendar"]
Both jobs can use variables named user and calendar, but those names live in different PyEnv.local_state dictionaries. They only affect the same application object if your application deliberately gives both jobs a reference to the same dbzero object or external resource.
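The distinction between separate names and a shared object can be shown with two plain dictionaries. This is a minimal sketch: job_a_locals, job_b_locals, and shared_calendar are hypothetical stand-ins for two local_state dictionaries and one shared application object.

```python
# Hypothetical stand-ins: two dicts play the role of two local_state
# dictionaries, and one dict plays the role of a shared application object.
shared_calendar = {"events": []}

job_a_locals = {"user": "alice", "calendar": shared_calendar}
job_b_locals = {"user": "bob", "calendar": shared_calendar}

# Rebinding a name in job A does not touch job B's name.
exec("user = user.upper()", job_a_locals)

# Mutating the shared object from job B is visible to job A,
# because both names point at the same object.
exec("calendar['events'].append('standup')", job_b_locals)

print(job_a_locals["user"], job_b_locals["user"])
print(job_a_locals["calendar"]["events"])
```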
This is what makes STATEK useful for fleets: one process can hold many persisted jobs, while only a configured number actively execute at once. Large applications can have thousands of jobs waiting, suspended, or ready, with actual throughput determined by memory, storage, model latency, tool latency, and the configured concurrency.
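The effect of a concurrency limit can be approximated with a semaphore. The sketch below is an analogy, not STATEK's runner: run_all and run_job are hypothetical names, and asyncio.sleep stands in for model or tool latency.

```python
# Hypothetical sketch: a concurrency cap with asyncio.Semaphore. STATEK's
# runner is not shown on this page, so this is only an analogy.
import asyncio


async def run_all(job_names: list, limit: int) -> dict:
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0
    finished = []

    async def run_job(name: str) -> None:
        nonlocal active, peak
        async with semaphore:  # at most `limit` jobs pass this point at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for model or tool latency
            active -= 1
            finished.append(name)

    await asyncio.gather(*(run_job(name) for name in job_names))
    return {"peak": peak, "finished": finished}


names = [f"job-{i}" for i in range(10)]
result = asyncio.run(run_all(names, limit=3))
print(result["peak"], len(result["finished"]))
```

All ten jobs exist for the whole run, but the number executing at any moment never exceeds the limit.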
Suspended Jobs
A job can pause instead of failing when it is waiting for something that is not ready yet.
When Python execution hits a future condition, STATEK records:
- awaited_result: what the job is waiting for
- next_instr_num: where execution should continue
- the current job status, usually SUSPENDED
Later, the job loop can check suspended jobs. If the awaited condition is ready, the job moves back to STARTED and continues from the stored point.
This is the lower-level mechanism behind durable waiting. Dedicated pages cover futures, callbacks, and interruptions in more detail.
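A toy version of this mechanism, under stated assumptions: the field names awaited_result and next_instr_num come from the text, while PendingResult, SuspendableJob, and run_until_blocked are invented for illustration and do not reflect STATEK's actual future handling.

```python
# Hypothetical sketch: awaited_result and next_instr_num are names from the
# text; PendingResult, SuspendableJob, and run_until_blocked are invented.
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PendingResult:
    ready: bool = False
    value: Any = None


@dataclass
class SuspendableJob:
    instructions: list  # each item: callable(job) -> Optional[PendingResult]
    status: str = "STARTED"
    next_instr_num: int = 0
    awaited_result: Optional[PendingResult] = None
    log: list = field(default_factory=list)


def run_until_blocked(job: SuspendableJob) -> None:
    """Run instructions in order, pausing at the first result that is not ready."""
    if job.awaited_result is not None:
        if not job.awaited_result.ready:
            return  # still waiting; stay SUSPENDED
        job.log.append(job.awaited_result.value)
        job.awaited_result = None
        job.status = "STARTED"
    while job.next_instr_num < len(job.instructions):
        pending = job.instructions[job.next_instr_num](job)
        job.next_instr_num += 1  # remember where to continue
        if pending is not None and not pending.ready:
            job.awaited_result = pending
            job.status = "SUSPENDED"
            return
    job.status = "DONE"


approval = PendingResult()
job = SuspendableJob(instructions=[
    lambda j: j.log.append("drafted email"),
    lambda j: approval,  # the job must wait for this
    lambda j: j.log.append("sent email"),
])

run_until_blocked(job)
status_while_waiting = job.status
run_until_blocked(job)  # condition not ready yet, so nothing happens
approval.ready, approval.value = True, "approved"
run_until_blocked(job)  # continues from the stored instruction number
```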
History and Logs
STATEK keeps several records because they answer different questions.
chat_log is the model-facing and job-history record. It can include LLM turns, warmup entries, user messages, and subtask notifications.
console is Python output. If the agent runs:
print(calendar.events_for(today))
the printed output is captured in the job's console.
Tool logs belong to the chat item that requested the tool call. This keeps tool results separate from ordinary Python console output.
Together, these records make the job inspectable: you can see what the agent was told, what code it ran, what Python printed, what tools returned, and where errors occurred.
Errors and Recovery Boundaries
Jobs store error state, and warmup failures can be recorded against the job definition. Non-warmup Python exceptions are captured into the job's execution history so the agent or an operator can inspect what happened.
Recovery behavior depends on the kind of failure and the application around STATEK:
- a Python exception can be visible in the job history
- a warmup error can terminate setup for that job definition
- a suspended job can continue when its awaited condition is ready
- parent jobs can receive child-job completion or error notifications
- application code can register error handlers for operational cleanup
For cleanup patterns around claimed external work, see Error Handling.
Do not treat this as automatic deterministic replay. If a job already sent an email, moved a meeting, or called an external API, STATEK does not magically undo or replay that side effect correctly. Your application should make external operations idempotent, permissioned, and inspectable.
STATEK persists job state and history, but a durable record neither replays work deterministically nor makes code execution safe. Production code execution still needs sandboxing, permissioning, secrets discipline, resource limits, and application-level controls for external side effects. See the Security page and the Replay and Recovery page for those boundaries.
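One common way to make an external operation idempotent is to guard it with a key that is recorded once the operation succeeds. The sketch below is a generic pattern, not a STATEK feature: send_email, send_once, and the in-memory set are hypothetical stand-ins, and a real application would keep the keys in durable storage.

```python
# Hypothetical sketch: an idempotency key guards an external side effect.
# send_email and the in-memory set stand in for a real provider and store.
sent_keys = set()
outbox = []


def send_email(to: str, body: str) -> None:
    outbox.append((to, body))  # stands in for the real external call


def send_once(key: str, to: str, body: str) -> bool:
    """Send only if this key is new. Return True when a send happened."""
    if key in sent_keys:
        return False
    send_email(to, body)
    sent_keys.add(key)
    return True


first = send_once("job-42:notify-user", "user@example.com", "Meeting moved")
retry = send_once("job-42:notify-user", "user@example.com", "Meeting moved")
print(first, retry, len(outbox))
```

If a resumed job reaches the same call again, the second attempt is skipped and the user receives one email.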
Source of Truth
There are two kinds of durable information in a STATEK application.
First, there is application state: your dbzero-backed Python objects. A User, Calendar, Meeting, or AnalysisResult can be an ordinary Python class persisted by dbzero. When an agent updates those objects through Python, that is application state.
Second, there is job history: chat log, console output, tool results, usage, errors, and continuation fields. That history explains how the job got where it is.
For most applications, the source of truth is the application object model. The job history is the durable execution record that makes agent work observable and resumable.
When to Create a New Job
Continue an existing job when the agent should remember and reuse the same Python variables for the same unit of work.
Create a new job when the work has a separate owner, lifecycle, permission boundary, or task result. Common examples:
- a dispatcher receives a new queued message and creates a job with user, message, and timestamp in context
- a parent agent delegates a focused subtask to a specialist agent
- a background worker starts an analysis job over a durable dataset
- a user-facing conversation creates a separate job for a long-running operation
The rule of thumb is simple: if the work should have its own durable Python workspace and inspectable history, make it a job.
Where to go next
Read Agents for role and prompt setup, Tools for controlled capabilities, Callbacks and Interruptions for external events, and Operations for worker-loop concerns.