CodeDocs Vault

16. The Agentic Loop — Schematic Walkthrough

This is a detailed, step-by-step walkthrough of what actually happens inside queryLoop() (/workspaces/src/query.ts:219) during one iteration. Skim doc 4 for the 30,000-foot view; come here when you need to know which line does what.

16.1 The loop at a glance

State := { messages, toolUseContext, turnCount, compactTracking, recoveryFlags, transition, pendingSummary }

while (true) {
  ┌─ Phase 1: Setup        (lines 311–580)
  │    destructure state, pre-query compaction, model selection
  │
  ├─ Phase 2: Model stream (lines 653–863)
  │    start API stream, consume events, emit assistant messages,
  │    collect tool_use blocks, optionally start tools streaming
  │
  ├─ Phase 3: Errors       (lines 893–997)
  │    FallbackTriggeredError → model switch, retry. Other errors → return.
  │
  ├─ Phase 4: Post-sampling (lines 999–1060)
  │    fire post-sampling hooks, handle abort during streaming,
  │    yield pending haiku summary from prior turn.
  │
  ├─ Phase 5a: No tool use (lines 1062–1358)    ← terminal or recovery
  │    PTL recovery → collapse drain / reactive compact
  │    max-output recovery → escalate or multi-turn continue
  │    stop hooks, token budget continuation,
  │    else return `{ reason: 'completed' }`.
  │
  ├─ Phase 5b: Tool use    (lines 1359–1727)    ← normal continuation
  │    execute all tools (streaming or serial),
  │    generate next-turn haiku summary (fire-and-forget),
  │    check abort, collect attachments, drain queued commands,
  │    refresh tools, check maxTurns, reassign state.
  │
  └─ continue
}

16.2 State shape

// query.ts:204-217
type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined
}

Values reset on each continue; only toolUseContext can be mutated mid-iteration (to add query tracking at lines 360-363).

Initial state at line 268 sets turnCount: 1, transition: undefined. budgetTracker at line 280 (for TOKEN_BUDGET) and taskBudgetRemaining at line 291 are outside State — they persist across all iterations of one query() invocation, not just one turn.
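A minimal sketch of that scoping, with a hypothetical runQuery and a simplified tracker shape (only the names budgetTracker, turnCount, continuationCount, and lastDeltaTokens come from this doc):

```typescript
// Hypothetical sketch: the tracker is declared outside the loop, so rebuilding
// per-turn state on `continue` cannot reset it.
type BudgetTracker = { continuationCount: number; lastDeltaTokens: number }

function runQuery(turnOutputs: number[]): BudgetTracker {
  // Persists across all iterations of one query() invocation.
  const budgetTracker: BudgetTracker = { continuationCount: 0, lastDeltaTokens: 0 }

  // Per-turn state, rebuilt on every `continue`.
  let state = { turnCount: 1 }

  for (const deltaTokens of turnOutputs) {
    budgetTracker.continuationCount += 1
    budgetTracker.lastDeltaTokens = deltaTokens
    state = { ...state, turnCount: state.turnCount + 1 } // fresh per-turn values
  }
  return budgetTracker
}
```

The design choice this illustrates: anything that must survive a compact or a recovery retry belongs outside State, because State is wholesale reassigned at every continue site.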

16.3 Phase 1: setup (lines 311–580)

Inside each iteration:

  1. Destructure state (lines 311-321).
  2. Emit stream_request_start (line 337). UI and SDK interpret this as "a new request is about to hit the API."
  3. Profile checkpoint query_fn_entry (line 339).
  4. Query chain tracking (lines 347-363). Depth is incremented or a chainId is initialized; written into toolUseContext so downstream tools can include it in analytics. This is the only place toolUseContext is reassigned mid-iteration.
  5. Message preparation (line 365): getMessagesAfterLastCompactBoundary(messages) — the compaction boundary is preserved as a marker, and the model only sees messages after it.
  6. Content replacement (lines 376-394): per-message tool-result budgeting. Large tool results are swapped for disk-persisted previews via recordContentReplacement.
  7. Snip (lines 401-410) if HISTORY_SNIP feature on — optional targeted message removal.
  8. Microcompact (lines 413-426) if enabled — cache-aware, inline message-level compaction (not full summary).
  9. Context collapse (lines 440-447) if enabled — read-time projection of collapsed regions.
  10. Auto-compact check (lines 453-543): the big one. Calls shouldAutoCompact() (see doc 4.6). If needed, runs a full compaction, yields the boundary message, captures preCompactContext into taskBudgetRemaining.
  11. Streaming executor (lines 561-568): instantiate StreamingToolExecutor if gate is on.
  12. Model resolution, dump prompts, blocking-limit check (lines 568-648).

If a blocking context-window limit is hit and no recovery path is available, return early with { reason: 'blocking_limit' } (lines 628-648).
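Step 5's boundary slice can be sketched as follows; the message shape and the isCompactBoundary flag are assumptions, only the slicing behavior (boundary kept as a marker, everything before it hidden from the model) is from this doc:

```typescript
// Hedged sketch of getMessagesAfterLastCompactBoundary.
type Msg = { text: string; isCompactBoundary?: boolean }

function getMessagesAfterLastCompactBoundary(messages: Msg[]): Msg[] {
  // Find the most recent compaction boundary, if any.
  const last = messages.map(m => !!m.isCompactBoundary).lastIndexOf(true)
  // No boundary: the model sees the full history.
  if (last === -1) return messages
  // Keep the boundary marker itself; drop everything before it.
  return messages.slice(last)
}
```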

16.4 Phase 2: the model stream (lines 653–863)

The outer fallback loop

// query.ts:654
while (attemptWithFallback) { … }

Starts true (so the loop body runs at least once); cleared when a pass completes without a fallback; set back to true by FallbackTriggeredError handling (line 897). Allows the same turn's request to retry with a different model.
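A minimal sketch of that retry shape, under stated assumptions: the call function, error signaling, and model list are illustrative, and only the attemptWithFallback flag and the switch-model-then-retry behavior come from this doc:

```typescript
// Hedged sketch of the outer fallback loop: one pass per model attempt.
async function streamWithFallback(
  models: string[],
  call: (model: string) => Promise<string>,
): Promise<string> {
  let modelIndex = 0
  let attemptWithFallback = true
  while (attemptWithFallback) {
    attemptWithFallback = false // a clean pass exits the loop
    try {
      return await call(models[modelIndex])
    } catch (err) {
      if (err instanceof Error && err.message === 'fallback' && modelIndex + 1 < models.length) {
        modelIndex += 1            // switch to the fallback model
        attemptWithFallback = true // retry the same turn's request
      } else {
        throw err // other errors propagate to the Phase 3 handlers
      }
    }
  }
  throw new Error('unreachable')
}
```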

The inner for-await

// query.ts:659
for await (const message of deps.callModel(/* params */)) { … }

deps.callModel is queryModelWithStreaming by default (query/deps.ts:33-40), pluggable for tests. It is an async generator yielding one Message per content block as streaming progresses.

What each yielded message triggers

Cached microcompact boundary yield

Lines 870-892 yield the compact boundary message now that the actual cache_deleted_input_tokens delta is known (the accurate number is only available after the API response).

16.5 Phase 3: errors (lines 893–997)

FallbackTriggeredError (line 894)

  1. Tombstone orphaned assistant messages (lines 717-741 and surrounding).
  2. Generate synthetic tool_result blocks for orphan tool_uses: "Model fallback triggered" (lines 900-903). Without this, the next API call would 400 on tool_use/tool_result pairing.
  3. Clear the streaming executor (query.ts:734-739); stale results from the failed attempt must not leak into the retry.
  4. Switch current model.
  5. Strip thinking signatures for non-matching fallback models (line 928) — thinking blocks are model-bound.
  6. Yield a system "model switched" message (lines 945-948).
  7. attemptWithFallback = true + continue the outer while (line 950).

Other API errors (lines 955-997)

Yield any missing tool_results synthetically, surface the error as a user-visible message, return { reason: 'model_error' }.
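Both error paths rely on backfilling orphan tool_uses. A hedged sketch of that backfill (block shapes and the function name are assumptions; the pairing rule is the documented constraint):

```typescript
// Any tool_use without a matching tool_result would 400 the next API call,
// so append synthetic results for the orphans.
type Block =
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string; content: string }

function backfillOrphanToolResults(blocks: Block[], note: string): Block[] {
  // Collect the ids that already have results.
  const answered = new Set(
    blocks.flatMap(b => (b.type === 'tool_result' ? [b.tool_use_id] : [])),
  )
  // Synthesize a result for every unanswered tool_use.
  const synthetic = blocks.flatMap<Block>(b =>
    b.type === 'tool_use' && !answered.has(b.id)
      ? [{ type: 'tool_result', tool_use_id: b.id, content: note }]
      : [],
  )
  return [...blocks, ...synthetic]
}
```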

16.6 Phase 4: post-sampling (lines 999–1060)

  1. Post-sampling hooks (lines 1000-1008). Fire-and-forget; observation-only; no state change.
  2. Abort during streaming (lines 1015-1051). If the AbortController fired mid-stream:
    • Consume any remaining StreamingToolExecutor results — the executor generates synthetic tool_results for aborted tools.
    • Yield an "interrupted by user" message (skipped when signal.reason === 'interrupt' because the queued user message provides context).
    • Return { reason: 'aborted_streaming' }.
  3. Yield pending haiku summary (lines 1054-1060). state.pendingToolUseSummary is a promise created in the previous turn (~1s Haiku call that ran alongside the ~5-30s main model call). Awaiting it here gives the UI a compact summary of the last tool batch just in time.

16.7 Phase 5a: no tool use (lines 1062–1358)

Reached when !needsFollowUp — the assistant's final message had no tool_use. We are either completing the turn or triggering a recovery.

Prompt-Too-Long recovery (lines 1065-1183)

Detection: the last message is a withheld 413 error (isWithheld413).

Max-output-tokens recovery (lines 1185-1256)

Detection: isWithheldMaxOutputTokens.

API error (lines 1258-1264)

If the last message is another API error (rate limit, auth), skip stop hooks and return { reason: 'completed' }.

Stop hooks (lines 1267-1306)

const stopHookResult = yield* handleStopHooks(
  messagesForQuery, assistantMessages, systemPrompt, userContext, systemContext,
  toolUseContext, querySource, stopHookActive
)

handleStopHooks (see query/stopHooks.ts:82-295) does a lot:

  1. Capture full context for downstream hooks.
  2. Save cacheSafeParams snapshot (used by /btw later).
  3. Job classification if TEMPLATES feature on.
  4. Fire-and-forget: prompt suggestion, memory extraction (if main thread), auto-dream.
  5. Chicago MCP cleanup.
  6. Execute configured stop hooks — yields progress/attachment/error messages.
  7. Teammate hooks (TaskCompleted per task + TeammateIdle) if this is a teammate session.

Return value: { blockingErrors, preventContinuation }.

Token budget continuation (lines 1308-1355)

Only if TOKEN_BUDGET feature on. checkTokenBudget() inspects budgetTracker state against the current turn's output and decides whether the turn should auto-continue.

Normal completion (line 1357)

Return { reason: 'completed', turnCount }.

16.8 Phase 5b: tool use branch (lines 1359–1727)

Reached when needsFollowUp — the assistant emitted tool_use blocks.

Tool execution (lines 1363-1409)

// streaming mode
for await (const update of streamingToolExecutor.getRemainingResults()) { … }
// or serial
for await (const update of runTools(toolUseBlocks, canUseTool, toolUseContext)) { … }

Each update is either a message (yield + add to toolResults if user/attachment) or a context modifier (apply to updatedToolUseContext). Also tracks shouldPreventContinuation for hook_stopped_continuation attachments.
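The dispatch in that loop can be sketched as follows; the union shape and the Ctx type are assumptions, while toolResults, updatedToolUseContext, and shouldPreventContinuation are the names this doc uses:

```typescript
// Hedged sketch of the per-update dispatch during tool execution.
type Ctx = { queryDepth: number }
type ToolUpdate =
  | { kind: 'message'; role: 'user' | 'attachment'; text: string }
  | { kind: 'contextModifier'; apply: (ctx: Ctx) => Ctx }

function drainToolUpdates(updates: ToolUpdate[], ctx: Ctx) {
  const toolResults: string[] = []
  let shouldPreventContinuation = false
  let updatedToolUseContext = ctx
  for (const update of updates) {
    if (update.kind === 'message') {
      // The real loop yields the message; user/attachment messages are kept
      // so the next turn sees the tool results.
      toolResults.push(update.text)
      if (update.text.includes('hook_stopped_continuation')) {
        shouldPreventContinuation = true
      }
    } else {
      // Context modifiers do not mutate; they produce the next context.
      updatedToolUseContext = update.apply(updatedToolUseContext)
    }
  }
  return { toolResults, shouldPreventContinuation, updatedToolUseContext }
}
```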

Tool-use summary generation (lines 1411-1482)

If summaries are enabled and this isn't a subagent, kick off the next turn's haiku summary:

nextPendingToolUseSummary = generateToolUseSummary(toolBlocks, toolResults, lastAssistantText)
  .catch(() => null)

Fire-and-forget. The promise is picked up next turn at line 1054.

Abort during tool execution (lines 1485-1516)

Distinct from the streaming-abort path (see 16.18, item 4). An abort here runs the maxTurns check (line 1507) before returning.

Hook-stopped continuation (lines 1518-1521)

If a hook attachment signaled "stop continuation," return { reason: 'hook_stopped' }.

Attachment collection (lines 1538-1643)

Before the next turn, inject whatever per-turn reminders are needed:

  1. Queued commands drain (lines 1570-1578). Task notifications and queued user messages are snapshotted.
  2. Attachment messages (lines 1580-1590). Memory, skill-discovery, file-change attachments.
  3. Memory prefetch consumption (lines 1599-1614). If a background prefetch of relevant memories settled during the current turn, consume now.
  4. Skill discovery (lines 1620-1628). Prefetched skills injected as <system-reminder> attachments.
  5. Command dequeue (lines 1632-1643). Remove consumed commands from the queue.
  6. File-change logging (lines 1646-1657).

Tool refresh (lines 1660-1671)

Re-fetch the tool list between turns in case a new MCP server connected mid-turn.

Max turns check (lines 1705-1712)

if (nextTurnCount > maxTurns) {
  yield attach({ type: 'max_turns_reached', maxTurns, turnCount: nextTurnCount })
  return { reason: 'max_turns', turnCount: nextTurnCount }
}

State reassignment (lines 1715-1727)

state = {
  ...state,
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  turnCount: nextTurnCount,
  toolUseContext: updatedToolUseContext,
  pendingToolUseSummary: nextPendingToolUseSummary,
  maxOutputTokensRecoveryCount: 0,           // reset recovery counters
  hasAttemptedReactiveCompact: false,        // reset
  transition: { reason: 'next_turn' },
}
continue  // line 1727

16.9 StreamingToolExecutor — one-paragraph recap

See doc 13.2 for the detailed treatment. The executor maintains a queue of tools with statuses (queued | executing | completed | yielded), lets concurrency-safe tools run in parallel but serializes unsafe ones, isolates per-tool errors with per-tool AbortControllers (Bash errors are the exception — they cancel siblings), and yields results in submission order. The canExecuteTool(isConcurrencySafe) rule is the one-line summary: a tool starts immediately only if it and everything currently executing are concurrency-safe; otherwise it queues.
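A hedged sketch of that admission rule (the signature and RunningTool shape are assumptions; only the safe-in-parallel / unsafe-serialized behavior is from this doc):

```typescript
// A tool may start only if it is concurrency-safe AND everything currently
// executing is too; an unsafe tool can still run once it is alone.
type RunningTool = { name: string; isConcurrencySafe: boolean }

function canExecuteTool(candidate: RunningTool, executing: RunningTool[]): boolean {
  if (executing.length === 0) return true // nothing running: always start
  const allRunningSafe = executing.every(t => t.isConcurrencySafe)
  return candidate.isConcurrencySafe && allRunningSafe
}
```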

16.10 pendingToolUseSummary — the haiku overlap

generateToolUseSummary() runs on Haiku (~1s) during the next turn's Opus stream (~5-30s). The summary is stored in state.pendingToolUseSummary at the end of turn N and yielded at the start of turn N+1 (lines 1054-1060). Net latency impact is effectively zero: the Haiku call is fully overlapped by the longer main-model stream.
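A minimal sketch of the overlap, with toy timings (20 ms standing in for the Haiku call, 50 ms for the main-model stream; names other than pendingToolUseSummary are illustrative):

```typescript
// Kick off a fast summary at the end of turn N, await it at the start of
// turn N+1. By then it has settled, so the await adds no meaningful latency.
async function overlapDemo(): Promise<{ summary: string; elapsedMs: number }> {
  const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms))

  // End of turn N: fire-and-forget; errors degrade to null, never block.
  const pendingToolUseSummary: Promise<string | null> = (async () => {
    await sleep(20) // stands in for the ~1s Haiku call
    return 'Read 2 files, edited 1'
  })().catch(() => null)

  const start = Date.now()
  await sleep(50) // stands in for the ~5-30s main-model stream of turn N+1

  // Start of turn N+1: the summary promise settled during the stream.
  const summary = (await pendingToolUseSummary) ?? '(no summary)'
  return { summary, elapsedMs: Date.now() - start }
}
```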

16.11 budgetTracker and taskBudgetRemaining

budgetTracker (lines 280 + query/tokenBudget.ts):

{ continuationCount, lastDeltaTokens, lastGlobalTurnTokens, startedAt }

Used by checkTokenBudget() at lines 1308-1355 for the auto-continuation decision.

taskBudgetRemaining (line 291):

Carries the server-side task-budget counter across compaction boundaries; it is captured from preCompactContext during auto-compact (lines 282-291, 508-515, 1138-1145). Without this, the server would reset the counter at each compact and overspend.

16.12 The seven transition.reason values — triggers and effects

Each reason, with its continue-site line, trigger, and state change on continue:

  • collapse_drain_retry (line 1110). Trigger: PTL + contextCollapse available + commits > 0. On continue: replace messages with the drained set.
  • reactive_compact_retry (line 1162). Trigger: PTL or media-size + reactiveCompact succeeded. On continue: post-compact messages; carry taskBudget.
  • max_output_tokens_escalate (line 1217). Trigger: default 8k capped + no prior override. On continue: set maxOutputTokensOverride = ESCALATED_MAX_TOKENS.
  • max_output_tokens_recovery (line 1246). Trigger: recoveryCount < 3 + withheld max-output-tokens. On continue: inject "continue" prompt; increment counter.
  • stop_hook_blocking (line 1302). Trigger: stopHookResult.blockingErrors.length > 0. On continue: inject errors; set stopHookActive = true.
  • token_budget_continuation (line 1338). Trigger: TOKEN_BUDGET + checkTokenBudget === 'continue'. On continue: inject budget nudge.
  • next_turn (line 1725). Trigger: normal post-tool-execution. On continue: append assistant + toolResults; increment turnCount.

Each reason corresponds to exactly one continue site. Tests can assert on transition.reason without reading message content — this is unusually testable for a streaming loop.
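A sketch of what such a test can target; pickTransition is a stand-in decision function, not the real loop — the point is only that transition.reason is a stable assertion surface:

```typescript
// Hypothetical stand-in for the Phase 5a decision, reduced to two inputs.
type Transition = { reason: string }

function pickTransition(opts: {
  blockingErrors: string[]
  withheldMaxOutputTokens: boolean
}): Transition | undefined {
  // Recovery paths are considered before stop-hook outcomes in this sketch.
  if (opts.withheldMaxOutputTokens) return { reason: 'max_output_tokens_recovery' }
  if (opts.blockingErrors.length > 0) return { reason: 'stop_hook_blocking' }
  return undefined // terminal: the { reason: 'completed' } path
}
```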

16.13 Message-ordering invariants

  1. Thinking blocks (lines 151-163):
    • Only present if max_thinking_length > 0.
    • May not be the last block of a message.
    • Must be preserved across the full assistant trajectory (turn + tool_use → tool_result → next assistant).
  2. Tool_use / tool_result pairing:
    • Every tool_use must have a matching tool_result before the next API call.
    • Orphans are backfilled (lines 900-903, 1019) with synthetic results.
    • Missing results → API 400.
  3. Withheld vs. tombstoned:
    • Withheld (lines 799-825): message pushed to assistantMessages for recovery logic, not yielded to SDK/transcript.
    • Tombstoned (lines 716-741): message retained in array but marked; fallback pathway uses these.
  4. Final array order (line 1716): messagesForQuery + assistantMessages + toolResults. Next turn sees the full conversation with this ordering.

16.14 persistSession (note)

query.ts itself does not call recordTranscript. Persistence is a caller concern — QueryEngine.submitMessage (see QueryEngine.ts:727-732) writes the transcript as messages flow out of query(): fire-and-forget for assistant messages to avoid blocking the stream generator, awaited for user messages to guarantee resumability.

16.15 deps — the test seam

// query/deps.ts:21-31
type QueryDeps = {
  callModel: typeof queryModelWithStreaming
  microcompact: typeof microcompactMessages
  autocompact: typeof autoCompactIfNeeded
  uuid: () => string
}

Default: productionDeps() at line 33. Override via params.deps in tests. Narrow scope = no module spy boilerplate. Using typeof keeps types in sync automatically.
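A hedged usage sketch of that seam: replace callModel with a canned async generator. collectTurn and the Message shape here are stand-ins; only the deps injection idea and callModel's one-Message-per-block streaming contract are from this doc:

```typescript
// Minimal fake for the streaming model call, usable as deps.callModel in tests.
type Message = { role: 'assistant'; text: string }
type Deps = { callModel: () => AsyncGenerator<Message> }

async function* fakeCallModel(): AsyncGenerator<Message> {
  // One Message per content block, mirroring the real streaming generator.
  yield { role: 'assistant', text: 'hello' }
  yield { role: 'assistant', text: 'world' }
}

// Stand-in consumer for the loop's inner for-await.
async function collectTurn(deps: Deps): Promise<string[]> {
  const out: string[] = []
  for await (const message of deps.callModel()) out.push(message.text)
  return out
}
```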

16.16 maxTurns — the hard cap

Set by the caller (SDK, headless, REPL). Checked at two sites: abort-during-tools (line 1507) and pre-continue (line 1705). When exceeded, yield max_turns_reached attachment and return { reason: 'max_turns', turnCount }. Prevents runaway agentic loops.

16.17 Walking a happy-path iteration

Turn N — assistant reads a file, writes a file.

  1. Enter while loop; destructure state (line 311).
  2. Emit stream_request_start; checkpoint.
  3. Compaction check — under threshold, no-op.
  4. Streaming executor created (gate on). Model resolved.
  5. API stream begins (line 659). First event: text block. Yield.
  6. Second event: tool_use(FileRead). Push to toolUseBlocks; executor starts Read.
  7. Read completes while streaming continues (line 851 yields a tool_result).
  8. Third event: tool_use(FileEdit). Push; the executor queues it (FileEdit is not concurrency-safe and would wait on Read if Read were still running — but Read has finished).
  9. Fourth event: end of stream.
  10. Post-sampling hooks (line 1000). Abort? No.
  11. Yield pending haiku summary from turn N-1 (line 1054).
  12. needsFollowUp = true → enter Phase 5b.
  13. Execute remaining tools (line 1380): FileEdit runs alone, result yielded.
  14. Kick off next-turn haiku summary (line 1411). Fire-and-forget.
  15. Collect attachments (line 1580). No queue drain.
  16. Tool refresh (line 1660).
  17. nextTurnCount = 2 < maxTurns → build next state (line 1716), transition.reason = 'next_turn', continue.

Turn N+1 — assistant summarizes and stops.

  1. Enter, destructure, compact check, stream.
  2. Stream yields one text block; no tool_use.
  3. needsFollowUp = false → Phase 5a.
  4. No PTL, no max-output. Call stop hooks.
  5. Stop hooks fire-and-forget memory extraction.
  6. Stop hooks return { blockingErrors: [], preventContinuation: false }.
  7. Token budget check (if feature on) — no continuation.
  8. Return { reason: 'completed', turnCount: 2 }.

16.18 Complexity hotspots

  1. Withholding coordination (lines 799-825 ↔ 1070-1183). PTL / max-output errors are suppressed from SDK output but inspected for recovery. If withholding is missed, the user sees the error before recovery has a chance; if surfacing is missed, the user waits forever.
  2. Compaction boundary task-budget carryover (lines 282-291, 508-515, 1138-1145). Without careful bookkeeping, the server's task_budget counter under-counts post-compact spend and clients see unpredictable cutoffs.
  3. Streaming tool ordering (StreamingToolExecutor.ts:129-151). Concurrency rules for tools running in parallel; result order must still match submission order.
  4. Dual abort paths (lines 1015-1051 vs. 1485-1516). Streaming-aborts need synthetic tool_results; tool-execution-aborts need maxTurns check. Different paths, similar-looking code.
  5. Stop hook fire-and-forget (stopHooks.ts:136-157). Memory extraction is async after stop hooks return; shutdown must drain it (extractMemories.ts:611-615) or you lose the extraction to process exit.