
Agent loop

The skeleton every agent shares — read state, ask the model, parse, act, repeat — and how the wiring choices shape every other system around it.

TL;DR

Every agent in the corpus runs the same five-step skeleton: gather context, call the model, parse the response, run any tools, fold the result back in. The interesting question isn’t what the loop does — it’s the container you put it in. Four containers are alive in the wild: a generator, a while, an event log, a graph. Each makes one thing easy and one thing hard.

  • Generator — best when a human watches.
  • While-loop — best when you want the simplest thing that works.
  • Event log — best when you need replay or audit.
  • Graph — best when steps run in parallel.

Agent loop

Picture an engineer with a debugger open and a long task list. They look at the screen, decide what to do next, do it, look again, repeat. An LLM agent is the same control flow with a model in the middle. That’s it. Everything else — streaming, multi-agent, memory compression, recovery — is built around this skeleton.

flowchart LR
S[Context / memory] -->|build prompt| L[Call the model]
L -->|stream| P[Parse intent]
P -->|tool call| T[Run tool]
P -->|final answer| O[Done]
T -->|observation| S
The skeleton every variant elaborates on.

The four containers

Generator

The agent function is an iterator. Each tick it yields an event — a thinking cue, a token, a tool call, a final answer. Whoever loops over the iterator decides what to render and when to stop.

async function* runAgent(state) {
  while (!state.done) {
    yield { type: 'thinking' };
    const stream = await llm.stream(state.messages);
    let calledTool = false;
    for await (const chunk of stream) {
      if (chunk.kind === 'tool_use') {
        const result = await dispatch(chunk);
        state.append(result);
        yield { type: 'tool_result', result };
        calledTool = true;
        break; // re-enter outer loop with new context
      }
      yield { type: 'token', text: chunk.text };
    }
    if (!calledTool) state.done = true; // no tool call means a final answer
  }
}

The shape pays for itself when a human is watching. The UI redraws on each yield with no extra code — it's just a for await loop. Interruption is "stop iterating." Backpressure is automatic: if the renderer falls behind, the model isn't asked for the next token.
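
Consuming it is a single loop. A minimal sketch, assuming a terminal renderer; renderToolResult and userWantsToStop are illustrative helpers, not part of any project's API:

for await (const event of runAgent(state)) {
  if (event.type === 'token') process.stdout.write(event.text);
  if (event.type === 'tool_result') renderToolResult(event.result); // illustrative
  if (userWantsToStop()) break; // interruption really is just "stop iterating"
}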

The cost is process boundaries. Generators don’t cross processes well, can’t be checkpointed mid-iteration, and don’t fan out to multiple consumers without re-design.

You’ll recognize this shape in: Claude Code, NanoClaw, Mistral Vibe, OpenClaw — all interactive CLIs.

While-loop

The simplest container that works. Read it top to bottom and it's exactly the diagram.

turn = 0
while turn < MAX_TURNS:
    turn += 1
    response = llm.chat(messages, tools=tools)
    if response.tool_calls:
        messages.append(response.message)  # keep the assistant turn in history
        for tc in response.tool_calls:
            messages.append({"role": "tool", "tool_call_id": tc.id,
                             "content": dispatch(tc)})
    else:
        return response.content
raise RuntimeError(f"budget exhausted after {MAX_TURNS} turns")

Easy to teach, easy to port across languages, easy to onboard a teammate to. The cost: streaming to the UI now needs a callback or a queue, and recovering from a crash mid-loop means explicitly checkpointing the message list.
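
A minimal sketch of the checkpoint half, assuming the message list is JSON-serializable; the file name and per-turn cadence are illustrative:

import { existsSync, readFileSync, writeFileSync } from 'node:fs';

const CHECKPOINT = 'session.json'; // illustrative path

// Call after every turn; a crash then loses at most the in-flight turn.
const save = (messages) => writeFileSync(CHECKPOINT, JSON.stringify(messages));

// Call at startup to resume a crashed session.
const load = () =>
  existsSync(CHECKPOINT) ? JSON.parse(readFileSync(CHECKPOINT, 'utf8')) : [];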

You’ll recognize this shape in: Strix, Hermes, Kimi Code, ML Intern.

Event log

There is no “loop” — only an append-only log of events. Every action emits an event. Every observation emits an event. A controller subscribes and decides what action to emit next based on the log so far. Replaying the log reconstitutes any state.

flowchart LR
C[Controller] -->|emit action| EL[(Event Log)]
EL -->|subscribe| C
EL -->|subscribe| UI
EL -->|subscribe| Recorder
C -->|run| Tools[Tool layer]
Tools -->|emit observation| EL
An event log replaces the loop: every action and observation is an immutable event.

The wins are substantial: time-travel debugging is free, you can replay a session deterministically, the audit trail is the source of truth, and other components (recorder, microagent triggers) simply become subscribers. The cost is the learning curve; schema evolution must also be planned, because old events live forever.
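
The core mechanic fits in a few lines. A sketch with an illustrative event shape; production systems add versioned schemas on top:

type Event = { kind: 'action' | 'observation'; payload: unknown };

const log: Event[] = [];
const subscribers: Array<(e: Event) => void> = [];

function emit(e: Event) {
  log.push(e); // append-only: events are never mutated or deleted
  for (const notify of subscribers) notify(e); // controller, UI, recorder
}

// Replay: fold the log into state. Same log in, same state out, every time.
const replay = <S>(fold: (s: S, e: Event) => S, init: S): S =>
  log.reduce(fold, init);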

You’ll recognize this shape in: OpenHands v0 and v1.

Graph

Stages are nodes, transitions are edges. The orchestrator steps the graph; agents run inside nodes. Useful when several independent steps run at once, or when specialists alternate (planner → executor → critic → planner).

const graph = new Graph()
  .node('plan', plan)
  .node('execute', execute)
  .node('critique', critique)
  .edge('plan', 'execute')
  .edge('execute', 'critique')
  .edge('critique', 'plan', when((r) => r.needsReplan));

You get parallelism for free, nodes are individually testable, and complex flows stay readable as the topology grows. The tax: it’s overkill for simple agents and you’ve now bought into a graph framework.

You’ll recognize this shape in: Multica, Open Design (loosely).

Anatomy of a single iteration

This sequence is the same regardless of container; the container just decides who calls whom. A compact sketch of one full iteration follows the list.

  1. Compose the request

    Pull current messages, the (mostly cached) system prompt, the tool schemas, and any per-turn additions — a memory recall, a scope reminder, a clock. Composition is cheap, but it determines the next ten thousand input tokens.

  2. Stream from the model

    Open a streaming connection. Streaming earns three things at once: token-level UI, the option to stop reading early, and the ability to dispatch tools while arguments are still arriving.

  3. Parse for intent

    The model returns text, a tool call, a thought block, or an answer. The parser is format-specific — Anthropic tool_use blocks, XML <function> tags, OpenAI tool_calls. See tool-calling-formats.

  4. (Optional) short-circuit

    A single tool call per turn is the common case. Strix forbids more than one and aborts the stream as soon as it sees the closing tag — anything after is hallucinated rambling that you’d otherwise pay for. See streaming-early-stop.

  5. Dispatch the tool

    Validate args (Zod / Pydantic / a JSON schema), execute it, capture the structured result. This is the natural place for guardrails: scope checks, allowlists, rate limits, sandbox boundaries.

  6. Fold the result back in

    Push the observation into the message list as a tool_result (or whatever your format uses). Go to step 1, with one more turn used.

  7. Exit conditions

    Three of them. Budget exhausted (turns / tokens / dollars), an explicit final_answer tool, or the model returns no tool calls and a final text block.
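
Tying the steps together, a compact sketch of one iteration. llm.chat, parseToolCall, validate, dispatch, and toolResult are illustrative names, not any particular project's API, and streaming is elided:

async function runTurn(state) {
  const request = {
    system: SYSTEM_PROMPT,     // 1. compose: mostly-cached system prompt,
    messages: state.messages,  //    current messages, tool schemas
    tools: toolSchemas,
  };
  const response = await llm.chat(request);  // 2. call (streaming elided)
  const call = parseToolCall(response);      // 3. parse for intent
  if (!call) {
    return { done: true, answer: response.text }; // 7. exit: no tool call
  }
  const args = validate(call);                         // 5. schema-check args
  const observation = await dispatch(call.name, args); // 5. execute
  state.messages.push(toolResult(call.id, observation)); // 6. fold back in
  return { done: false }; // back to step 1, one turn used
}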

Iteration budgets — and the cheap trick that improves output

Every loop in the corpus has a hard cap. The good ones also tell the agent it’s running out, so it can wrap up gracefully rather than be guillotined mid-thought.

Project     | Default budget     | Warning behavior
Claude Code | per-task max turns | implicit, via token tracking
OpenHands   | configurable       | budget escalation per model
Strix       | 300                | warnings at 85% and over the last 3 turns
Hermes      | 90                 | token-based budget
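
Warning the agent is one conditional in the compose step. A sketch using the 85% threshold from the table; the message wording is illustrative:

if (turn / MAX_TURNS >= 0.85) {
  state.messages.push({
    role: 'user',
    content: `Note: only ${MAX_TURNS - turn} turns remain. Start wrapping up.`,
  });
}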

Pick a container

What matters most for your agent?
  • A human watches it work → Generator
  • Need replay / audit trail → Event log
  • Steps must run in parallel → Graph
  • Smallest thing that works → While-loop

Recommended default: If you're not sure, start with a while-loop. Migrate to a generator the first time you build a UI; migrate to events the first time someone asks for an audit log.

Anti-patterns from the corpus

  • No iteration limit. Easy to write, easy to bankrupt yourself when a tool fails in a way that the model “fixes” by retrying forever.
  • No de-duplication of failed tool calls. The agent retries the same broken call. Surface a “you just tried this” hint in the next prompt and the model will pivot (a sketch follows this list).
  • Streaming + blocking tool dispatch in the same call site. UI freezes. Hand off the dispatch to a queue and continue the stream.
  • Multiple tool calls per turn with no policy. Either embrace parallel tool use end-to-end (with a side-effect classifier — pure tools parallelize, others serialize; sketch below) or forbid it entirely. The ambiguous middle ground is bug-prone.
  • Token-budget creep. “Just one more memory recall, scope reminder, time stamp.” These compound; an agent that started at 6K input/turn ends up at 30K six months in. Audit prompt size per turn, not per session.
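
For the de-duplication hint, a sketch keyed on tool name plus serialized arguments; ToolCall and preflight are illustrative names:

type ToolCall = { name: string; args: unknown };

const failed = new Set<string>();
const key = (c: ToolCall) => `${c.name}:${JSON.stringify(c.args)}`;

// Run before dispatch; on a dispatch error, add key(call) to `failed`.
function preflight(call: ToolCall): string | null {
  if (failed.has(key(call))) {
    return 'You already tried this exact call and it failed. Try a different approach.';
  }
  return null; // safe to dispatch
}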
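
And for the per-turn policy, a sketch of the side-effect classifier; the sideEffectFree registry flag and dispatch are assumptions, not an existing API:

type Registry = Record<string, { sideEffectFree: boolean }>;

async function dispatchAll(calls: ToolCall[], registry: Registry) {
  const pure = calls.filter((c) => registry[c.name].sideEffectFree);
  const impure = calls.filter((c) => !registry[c.name].sideEffectFree);
  // Pure tools fan out concurrently; mutating tools run strictly in order.
  const results = await Promise.all(pure.map((c) => dispatch(c)));
  for (const call of impure) results.push(await dispatch(call));
  return results;
}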

Projects that implement this

  • Claude Code — Anthropic's official agentic CLI. Streaming tool calls, prompt caching, thinking signatures, multi-agent subagents, slash commands.
  • OpenHands (v0) — All Hands AI's autonomous software-engineer agent. Event-sourced state, microagents, controller-level guardrails.
  • Strix — Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.
  • OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All Hands agent.
  • Mistral Vibe — Mistral-flavored coding agent reference. Middleware-based dispatch, minimal tool set, instructive for understanding agent loop fundamentals.
  • Hermes Agent — 40+ tool, multi-platform agent. Provider adapters per LLM, trajectory compression preserves first/last turns, side-channel auxiliary client.
  • NanoClaw — Tiny Claude-Code-shaped clone. Excellent for studying the irreducible structure of an agent loop without production overhead.
  • OpenClaw — Open-source Claude-Code-style agent reproduction. Bigger than NanoClaw, reveals which patterns scale and which stay minimal.
  • Kimi Code — Moonshot's Kimi-flavored coding agent. Compact reference for an agent loop with OpenAI-compatible tool calling.
  • ML Intern — ML-engineering-flavored agent. Tooling for data exploration, model training, and notebook-style work.
  • Open Design — Open-source design / UI-generation agent. LLM-driven design intent → code, with a design-system-aware tool surface.
  • Multica — Multi-cloud / multi-agent orchestration. Architecture patterns for spanning providers and clouds in one agent.