AI AgentsMarch 12, 20263 min read

AI agents need state machines, not just better prompts

Why dependable agents require explicit state, bounded loops, typed tools, approvals, and recovery logic around the language model.

An agent is often drawn as a loop: think, use a tool, observe, repeat.

That picture captures the core idea, but it leaves out the engineering needed to stop the loop from drifting, repeating actions, spending without limit, or changing the wrong system.

A dependable agent is not merely a model with tools. It is a stateful application in which the model is allowed to make specific decisions.

Separate reasoning from control

The model can propose the next action. The application should control whether that action is valid.

For example, an agent working on code may move through explicit states:

received -> inspecting -> planning -> editing -> testing -> complete
                                  \-> needs_approval
                                  \-> blocked

Each state limits the available transitions and tools. A testing state may run tests but should not silently publish a release. A read-only inspection state should not mutate files.

This reduces the number of decisions the model must improvise.

State must live outside the transcript

If progress exists only as prose in a chat, recovery is difficult. The agent needs machine-readable state:

Objective and success conditions.
Current phase.
Completed and pending steps.
Artifacts created or changed.
Tool calls and their results.
Budget consumed.
Approvals granted.
Errors and retry counts.

This state allows an interrupted run to resume without asking the model to reconstruct reality from a long conversation.

Tools are capabilities, not menu items

A tool definition is a security boundary. It should expose the smallest useful capability with typed inputs and outputs.

update_customer_address(customer_id, address) is safer and easier to validate than run_sql(query). deploy_preview(commit) is easier to govern than unrestricted shell access.

Before execution, the application can check schema validity, permissions, environment, rate limits, and whether a human approval is required. After execution, it can record an immutable result.

Every loop needs bounds

Open-ended autonomy sounds powerful until an agent repeats the same failing action thirty times.

Useful limits include:

Limit	Purpose
Maximum steps	Prevent infinite loops
Time and token budget	Bound latency and cost
Per-tool retry count	Stop repeated failures
Duplicate-action detection	Catch stalled plans
Side-effect budget	Limit writes, sends, purchases, or deployments
Confidence or evidence threshold	Escalate uncertain decisions

Stopping is part of intelligence. A reliable agent knows when it lacks permission, evidence, or a valid transition.

Idempotency makes recovery possible

Networks fail after a request succeeds but before the response arrives. If the agent retries blindly, it may send the same email twice or create duplicate records.

Side-effecting tools should accept idempotency keys or expose a way to check prior execution. Workflows should record intent before action and outcome after action. Compensation paths should exist when an operation cannot simply be reversed.

These are ordinary distributed-systems concerns. Agents do not make them disappear.

Human approval should be precise

“Ask before doing anything important” creates constant interruption. “Let the agent handle it” creates uncontrolled risk.

Approval should attach to concrete transitions: sending an external message, spending above an amount, deleting data, deploying to production, or sharing information outside a trust boundary. The approval screen should show the intended action, parameters, evidence, and expected effect.

Our opinion

The next improvement in agents will not come only from models reasoning longer. It will come from runtimes that make state, permissions, transitions, and failures visible.

The model supplies flexible judgment. The surrounding state machine supplies discipline. Serious agents need both.