STATEK Jobs

A STATEK job is a durable Python workspace for one unit of agent work.

It is not just an LLM request. A job has Python variables, console output, chat history, tool results, status, usage, error state, and continuation fields. The agent writes and executes Python inside that workspace. STATEK persists the workspace so later steps can continue with the same objects and variables.

For example, a job can start with useful objects already in context:

user
message
calendar
today
timestamp

Those objects can be supplied directly by application code, prepared by warmup code, or assembled by a dispatcher agent that picked a task from a queue. Once the job starts, the working agent can use them as normal Python:

Work With Job Context

print(today)
 
events = calendar.events_for(today)
meeting = calendar.find_meeting("Weekly planning", day=today)
later_slot = calendar.find_empty_slot(after=meeting.ends_at)
meeting.move_to(later_slot)

The important part is persistence. If the agent created events, meeting, and later_slot in one step, the same job can reuse those variables in a later step. If calendar and meeting are dbzero-backed objects, updates to them are durable application state.
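The reuse of variables across steps can be illustrated with plain Python, without STATEK itself. This is a minimal sketch, assuming a step is a string of code executed against a shared dictionary; `run_step` and the sample values are hypothetical and are not part of the STATEK API.

```python
# Hypothetical stand-in for a job's persisted locals; not the STATEK API.
local_state = {"today": "2024-05-06"}


def run_step(code: str, state: dict) -> None:
    """Execute one block of agent-written code against the shared namespace."""
    # Names assigned by the code are stored in `state`, so they outlive the call.
    exec(code, {}, state)


# Step 1 creates a variable.
run_step("events = ['Weekly planning', 'Design review']", local_state)

# Step 2, executed later, reuses it without rebuilding it.
run_step("first_event = events[0]", local_state)

print(sorted(local_state))
```

In this sketch the dictionary is only held in memory. The point of a durable workspace is that the equivalent state survives between steps and across process restarts.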

JobDef and Job

STATEK separates the definition of work from the execution of work.

JobDef is the prepared job definition. It freezes the agent and the setup for the task:

the Agent
model and provider metadata
prompt parameters
optional warmup code
chat style
locale

Job is the durable execution unit. It stores what is happening now:

current status
PyEnv, the Python execution environment
chat history and tool results
usage and cost metadata where available
error state
parent job, if this job was delegated by another job
continuation fields for suspended work

Conceptually:

Inspect Job State

job.job_def          # prepared task definition
job.status           # READY, WARMING_UP, STARTED, SUSPENDED, or DONE
job.py_env           # durable Python workspace
job.chat_log         # chat, warmup, user, and subtask history
job.error            # error state, when the job failed
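The split between a frozen definition and a mutable execution unit can be modeled with two dataclasses. This is a rough sketch under stated assumptions: `JobDefSketch`, `JobSketch`, and their field names and defaults are illustrative stand-ins, not the real STATEK classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class JobDefSketch:
    """Hypothetical prepared definition: fixed once created."""
    agent: str
    model: str
    warmup_code: Optional[str] = None
    locale: str = "en"


@dataclass
class JobSketch:
    """Hypothetical execution unit: changes as the work progresses."""
    job_def: JobDefSketch
    status: str = "READY"
    local_state: dict = field(default_factory=dict)
    chat_log: list = field(default_factory=list)
    error: Optional[str] = None
    parent: Optional["JobSketch"] = None


# One definition can back several independent executions.
job_def = JobDefSketch(agent="scheduler", model="example-model")
job_a = JobSketch(job_def)
job_b = JobSketch(job_def)

job_a.status = "STARTED"
job_a.local_state["today"] = "2024-05-06"
```

Here `job_b` is unaffected by the changes to `job_a`, while both still point at the same immutable definition.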

The Python Workspace

The job's Python state lives in PyEnv.

local_state is the most important field. It holds the names the agent can use and reuse:

Read Persisted Locals

job.py_env.local_state["user"]
job.py_env.local_state["calendar"]
job.py_env.local_state["today"]
job.py_env.local_state["meeting"]

PyEnv also tracks:

global_state: globals available to executed code
console: captured Python output
exceptions: execution errors associated with history positions
push_log: messages pushed into an active job
exit_status: the result set when code exits
future and continuation tracking used when execution pauses

In plain terms: a job behaves like a persisted Python session. The agent does not need to rebuild the world from scratch on every model call.
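A toy version of such a session shows how locals, console output, and exceptions can be tracked side by side. This is a minimal sketch, assuming one console entry per executed block; `PyEnvSketch` and its `run` method are hypothetical, and only the field names are borrowed from the list above.

```python
import contextlib
import io


class PyEnvSketch:
    """Hypothetical, in-memory imitation of a persisted Python session."""

    def __init__(self):
        self.local_state = {}   # names the executed code can reuse
        self.global_state = {}  # globals available to executed code
        self.console = []       # captured output, one entry per executed block
        self.exceptions = {}    # history position -> description of the error

    def run(self, code: str) -> None:
        position = len(self.console)
        buffer = io.StringIO()
        try:
            # Redirect print() output into the buffer while the code runs.
            with contextlib.redirect_stdout(buffer):
                exec(code, self.global_state, self.local_state)
        except Exception as exc:
            # Record the failure against its history position instead of raising.
            self.exceptions[position] = repr(exc)
        self.console.append(buffer.getvalue())


env = PyEnvSketch()
env.run("total = 2 + 3\nprint(total)")
env.run("print(total * 2)")      # reuses `total` from the previous block
env.run("print(missing_name)")   # fails; the error is recorded, not raised
```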

Lifecycle

A job moves through a small lifecycle:

READY: the job exists and has not started executing
WARMING_UP: STATEK is running startup code before the first normal LLM step
STARTED: the job is actively progressing
SUSPENDED: the job paused while waiting for an external condition or future
DONE: the current unit of work completed; inspect the recorded outcome separately

DONE can represent a successful result, an explicit exit, harness or limit termination, or an error path, depending on the fields recorded with the job. To tell these apart, read each field for what it records: job.status gives the current lifecycle status; db0.find(Job, JobStatus.DONE) queries completed units; job.py_env.exit_status holds the completion or exit result; and job.error, job-definition errors, console output, and history exceptions carry failure details when applicable.

Technically, a job can continue indefinitely through repeated units of work. DONE is a stable resting state for the current turn or unit, not necessarily the final state of the durable job. A user follow-up with push_user_message(...), a subtask notification, a callback, or application code can move a DONE job back to STARTED.

Warmup is optional. If there is no startup code to execute, a job can move from READY to STARTED. When warmup is present, it prepares the same durable Python workspace the agent will use later; see Warmup Code for block syntax, tool-call warmup, and startup patterns.
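The lifecycle described in this section can be written down as a small transition table. This is a sketch of one reading of the text, not STATEK's own enforcement logic: `JobStatusSketch`, `ALLOWED`, and `transition` are hypothetical names, and the WARMING_UP to DONE edge is an assumption about how a warmup failure ends a unit.

```python
from enum import Enum


class JobStatusSketch(Enum):
    READY = "READY"
    WARMING_UP = "WARMING_UP"
    STARTED = "STARTED"
    SUSPENDED = "SUSPENDED"
    DONE = "DONE"


S = JobStatusSketch

# Transitions inferred from the prose above; treat them as illustrative.
ALLOWED = {
    S.READY: {S.WARMING_UP, S.STARTED},  # warmup is optional
    S.WARMING_UP: {S.STARTED, S.DONE},   # DONE here is an assumed failure path
    S.STARTED: {S.SUSPENDED, S.DONE},
    S.SUSPENDED: {S.STARTED},            # awaited condition became ready
    S.DONE: {S.STARTED},                 # follow-up message, callback, etc.
}


def transition(current: JobStatusSketch, target: JobStatusSketch) -> JobStatusSketch:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


status = S.READY
for nxt in (S.STARTED, S.SUSPENDED, S.STARTED, S.DONE, S.STARTED):
    status = transition(status, nxt)
```

The final move from DONE back to STARTED reflects the point above that DONE is a resting state for the current unit rather than a terminal state for the job.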

What Happens in a Step

Each job step advances the durable workspace.

At a high level, STATEK:

checks whether the current unit is already resting at DONE
finds the next pending code block, warmup block, or LLM response to process
executes Python and tool calls against the job's PyEnv
stores console output, tool results, exceptions, and chat history
updates continuation fields if the job must wait
marks the current unit DONE when Python code sets an exit status, an agent turn completes, or an error path terminates that unit

The result is not a separate workflow language. The job is still executing Python, but STATEK records enough state to inspect it, resume it, and coordinate it with other jobs.
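The outline above can be approximated by a single function that advances a job by one pending code block. This is a simplified sketch, not STATEK's step implementation: the dictionary layout, the `pending_blocks` key, and the convention that code signals completion by assigning a local named `exit_status` are all assumptions made for illustration.

```python
def step(job: dict) -> bool:
    """Advance a hypothetical job by at most one code block.

    Returns True if a block was executed, False if there was nothing to do.
    """
    if job["status"] == "DONE":
        return False                        # already resting at DONE
    if not job["pending_blocks"]:
        job["status"] = "DONE"              # nothing left to run in this unit
        return False

    code = job["pending_blocks"].pop(0)
    job["status"] = "STARTED"
    try:
        exec(code, {}, job["local_state"])  # run against the job's own locals
    except Exception as exc:
        job["exceptions"].append(repr(exc))  # recorded in history, not raised
    job["history"].append(code)

    if "exit_status" in job["local_state"]:
        job["status"] = "DONE"              # the code set an exit status
    return True


job = {
    "status": "READY",
    "pending_blocks": ["count = 1", "count += 1", "exit_status = count"],
    "local_state": {},
    "exceptions": [],
    "history": [],
}

while step(job):
    pass
```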

Many Jobs in One Process

STATEK can orchestrate many jobs in one process. Jobs that are ready, warming up, started, or suspended can all be present at the same time. The runner schedules runnable jobs with a configured concurrency limit and checks suspended jobs to see whether they can continue.

The jobs do not share Python locals by accident:

# Job A
job_a.py_env.local_state["user"]
job_a.py_env.local_state["calendar"]
 
# Job B
job_b.py_env.local_state["user"]
job_b.py_env.local_state["calendar"]

Both jobs can use variables named user and calendar, but those names live in different PyEnv.local_state dictionaries. They only affect the same application object if your application deliberately gives both jobs a reference to the same dbzero object or external resource.
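The difference between sharing a name and sharing an object is ordinary Python behavior, shown here with plain dictionaries. In this sketch the `Calendar` class and the three dictionaries are hypothetical stand-ins for application objects and per-job local_state.

```python
class Calendar:
    """Hypothetical application object; stands in for a dbzero-backed class."""

    def __init__(self, owner: str):
        self.owner = owner
        self.events = []


shared_calendar = Calendar("team")

# Jobs A and B are deliberately given the same calendar object.
job_a_locals = {"user": "alice", "calendar": shared_calendar}
job_b_locals = {"user": "bob", "calendar": shared_calendar}

# Job C gets its own calendar.
job_c_locals = {"user": "carol", "calendar": Calendar("carol")}

# Rebinding a name only changes job A's dictionary.
job_a_locals["user"] = "alice-renamed"

# Mutating the shared object is visible through job B, but not job C.
job_a_locals["calendar"].events.append("Weekly planning")
```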

This is what makes STATEK useful for fleets: one process can hold many persisted jobs, while only a configured number actively execute at once. Large applications can have thousands of jobs waiting, suspended, or ready, with actual throughput determined by memory, storage, model latency, tool latency, and the configured concurrency.
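The idea of many persisted jobs with a bounded number executing at once can be sketched with a semaphore. This is not the STATEK runner: `run_job`, `run_fleet`, and `max_concurrency` are hypothetical names, and the short sleep stands in for model and tool latency.

```python
import asyncio


async def run_job(name: str, limit: asyncio.Semaphore, stats: dict) -> str:
    # Only `max_concurrency` jobs can be inside this block at the same time.
    async with limit:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for model and tool latency
        stats["active"] -= 1
    return name


async def run_fleet(job_names: list, max_concurrency: int):
    limit = asyncio.Semaphore(max_concurrency)
    stats = {"active": 0, "peak": 0}
    finished = await asyncio.gather(
        *(run_job(name, limit, stats) for name in job_names)
    )
    return finished, stats["peak"]


names = [f"job-{i}" for i in range(10)]
finished, peak = asyncio.run(run_fleet(names, max_concurrency=3))
```

All ten jobs exist for the whole run, but the recorded peak never exceeds the configured limit.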

Suspended Jobs

A job can pause instead of failing when it is waiting for something that is not ready yet.

When Python execution hits a future condition, STATEK records:

awaited_result: what the job is waiting for
next_instr_num: where execution should continue
the current job status, usually SUSPENDED

Later, the job loop can check suspended jobs. If the awaited condition is ready, the job moves back to STARTED and continues from the stored point.

This is the lower-level mechanism behind durable waiting. Dedicated pages cover futures, callbacks, and interruptions in more detail.
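The pause-and-resume behavior can be modeled with a list of instructions and a standard-library future. This is a sketch under stated assumptions: the dictionary layout and the `advance` function are hypothetical, and only the names awaited_result and next_instr_num are taken from the text above.

```python
from concurrent.futures import Future


def advance(job: dict) -> None:
    """Run instructions until the job finishes or must wait on a future."""
    job["status"] = "STARTED"
    while True:
        pending = job["awaited_result"]
        if pending is not None:
            if not pending.done():
                job["status"] = "SUSPENDED"  # pause; the continuation point is kept
                return
            job["results"].append(pending.result())
            job["awaited_result"] = None

        if job["next_instr_num"] >= len(job["instructions"]):
            job["status"] = "DONE"
            return

        instruction = job["instructions"][job["next_instr_num"]]
        job["next_instr_num"] += 1           # where execution continues next
        outcome = instruction()
        if isinstance(outcome, Future):
            job["awaited_result"] = outcome  # what the job is waiting for
        else:
            job["results"].append(outcome)


approval = Future()  # resolved later by something outside the job
job = {
    "status": "READY",
    "instructions": [lambda: "draft ready", lambda: approval, lambda: "email sent"],
    "next_instr_num": 0,
    "awaited_result": None,
    "results": [],
}

advance(job)
status_while_waiting = job["status"]
resume_point = job["next_instr_num"]

approval.set_result("approved")  # the awaited condition becomes ready
advance(job)
```

The second call to `advance` does not re-run the first two instructions. It picks up the resolved future and continues from the stored instruction number.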

History and Logs

STATEK keeps several records because they answer different questions.

chat_log is the model-facing and job-history record. It can include LLM turns, warmup entries, user messages, and subtask notifications.

console is Python output. If the agent runs:

Print Job Output

print(calendar.events_for(today))

the printed output is captured in the job's console.

Tool logs belong to the chat item that requested the tool call. This keeps tool results separate from ordinary Python console output.

Together, these records make the job inspectable: you can see what the agent was told, what code it ran, what Python printed, what tools returned, and where errors occurred.

Errors and Recovery Boundaries

Jobs store error state, and warmup failures can be recorded against the job definition. Non-warmup Python exceptions are captured into the job's execution history so the agent or an operator can inspect what happened.

Recovery behavior depends on the kind of failure and the application around STATEK:

a Python exception can be visible in the job history
a warmup error can terminate setup for that job definition
a suspended job can continue when its awaited condition is ready
parent jobs can receive child-job completion or error notifications
application code can register error handlers for operational cleanup

For cleanup patterns around claimed external work, see Error Handling.

Do not treat this as automatic deterministic replay. If a job has already sent an email, moved a meeting, or called an external API, STATEK does not undo that side effect or replay it safely. Your application should make external operations idempotent, permissioned, and inspectable.
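One common way to make an external operation idempotent is to key it on a stable identifier and record the outcome before any retry can happen. This is a generic sketch, not a STATEK feature: `send_once`, `sent_log`, `outbox`, and the key format are hypothetical, and a real application would keep the record in durable storage.

```python
sent_log = {}  # hypothetical durable record: idempotency key -> receipt
outbox = []    # stands in for the external email service


def send_once(idempotency_key: str, recipient: str, body: str) -> str:
    """Send a message at most once per key; repeat calls return the first receipt."""
    if idempotency_key in sent_log:
        return sent_log[idempotency_key]  # already done; do not send again
    outbox.append((recipient, body))      # the external side effect
    receipt = f"sent:{len(outbox)}"
    sent_log[idempotency_key] = receipt
    return receipt


first = send_once("job-42:notify-user", "user@example.com", "Meeting moved")
retry = send_once("job-42:notify-user", "user@example.com", "Meeting moved")
```

With this pattern, re-running a step after a crash or a resume calls the function again but does not send a second email.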

⚠️ STATEK persists job state and history, but durable history is not deterministic replay. Production code execution needs sandboxing, permissioning, secrets discipline, resource limits, and application-level controls for external side effects. See Security and Replay and Recovery for those boundaries.

Source of Truth

There are two kinds of durable information in a STATEK application.

First, there is application state: your dbzero-backed Python objects. A User, Calendar, Meeting, or AnalysisResult can be an ordinary Python class persisted by dbzero. When an agent updates those objects through Python, that is application state.

Second, there is job history: chat log, console output, tool results, usage, errors, and continuation fields. That history explains how the job got where it is.

For most applications, the source of truth is the application object model. The job history is the durable execution record that makes agent work observable and resumable.

When to Create a New Job

Continue an existing job when the agent should remember and reuse the same Python variables for the same unit of work.

Create a new job when the work has a separate owner, lifecycle, permission boundary, or task result. Common examples:

a dispatcher receives a new queued message and creates a job with user, message, and timestamp in context
a parent agent delegates a focused subtask to a specialist agent
a background worker starts an analysis job over a durable dataset
a user-facing conversation creates a separate job for a long-running operation

The rule of thumb is simple: if the work should have its own durable Python workspace and inspectable history, make it a job.

Where to go next

Read Agents for role and prompt setup, Tools for controlled capabilities, Callbacks and Interruptions for external events, and Operations for worker-loop concerns.
