16. The Agentic Loop — Schematic Walkthrough
This is a detailed, step-by-step walkthrough of what actually happens inside `queryLoop()` (/workspaces/src/query.ts:219) during one iteration. Skim doc 4 for the 30,000-foot view; come here when you need to know which line does what.
16.1 The loop at a glance
```
State := { messages, toolUseContext, turnCount, compactTracking, recoveryFlags, transition, pendingSummary }

while (true) {
  ┌─ Phase 1: Setup (lines 311–580)
  │    destructure state, pre-query compaction, model selection
  │
  ├─ Phase 2: Model stream (lines 653–863)
  │    start API stream, consume events, emit assistant messages,
  │    collect tool_use blocks, optionally start tools streaming
  │
  ├─ Phase 3: Errors (lines 893–997)
  │    FallbackTriggeredError → model switch, retry. Other errors → return.
  │
  ├─ Phase 4: Post-sampling (lines 999–1060)
  │    fire post-sampling hooks, handle abort during streaming,
  │    yield pending haiku summary from prior turn.
  │
  ├─ Phase 5a: No tool use (lines 1062–1358)   ← terminal or recovery
  │    PTL recovery → collapse drain / reactive compact
  │    max-output recovery → escalate or multi-turn continue
  │    stop hooks, token budget continuation,
  │    else return { reason: 'completed' }.
  │
  ├─ Phase 5b: Tool use (lines 1359–1727)      ← normal continuation
  │    execute all tools (streaming or serial),
  │    generate next-turn haiku summary (fire-and-forget),
  │    check abort, collect attachments, drain queued commands,
  │    refresh tools, check maxTurns, reassign state.
  │
  └─ continue
}
```
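The diagram's control shape can be made concrete with a heavily simplified sketch: an async generator that keeps cross-iteration state in one object, replaces that object wholesale at the `continue` site, and ends with a terminal `{ reason }` return value. All names here (`queryLoopSketch`, `FakeModel`, the trimmed `State`, `runTool`, `collect`) are hypothetical, and Phases 1, 3 and 4, streaming, and every recovery path are omitted.

```ts
// Hypothetical, trimmed-down shapes (not the real query.ts types).
type AssistantMsg = { role: 'assistant'; text: string; toolUseIds: string[] }
type Msg =
  | { role: 'user'; text: string }
  | AssistantMsg
  | { role: 'tool_result'; toolUseId: string; output: string }

type Terminal = { reason: 'completed' | 'max_turns'; turnCount: number }

type State = {
  messages: Msg[]
  turnCount: number
  transition: { reason: 'next_turn' } | undefined
}

type FakeModel = (messages: Msg[]) => Promise<AssistantMsg>

async function* queryLoopSketch(
  initial: Msg[],
  callModel: FakeModel,
  runTool: (toolUseId: string) => Promise<string>,
  maxTurns: number,
): AsyncGenerator<Msg, Terminal> {
  let state: State = { messages: initial, turnCount: 1, transition: undefined }

  while (true) {
    // Phase 1 (simplified): destructure state.
    const { messages, turnCount } = state

    // Phase 2 (simplified): one assistant message instead of a stream.
    const assistant = await callModel(messages)
    yield assistant
    const needsFollowUp = assistant.toolUseIds.length > 0

    // Phase 5a (simplified): no tool use means the query is done.
    if (!needsFollowUp) return { reason: 'completed', turnCount }

    // Phase 5b: run tools serially, yielding each result.
    const toolResults: Msg[] = []
    for (const id of assistant.toolUseIds) {
      const result: Msg = { role: 'tool_result', toolUseId: id, output: await runTool(id) }
      toolResults.push(result)
      yield result
    }

    const nextTurnCount = turnCount + 1
    if (nextTurnCount > maxTurns) return { reason: 'max_turns', turnCount: nextTurnCount }

    // Continue site: replace the whole state object.
    state = {
      messages: [...messages, assistant, ...toolResults],
      turnCount: nextTurnCount,
      transition: { reason: 'next_turn' },
    }
  }
}

// Drain a generator, keeping both the yielded values and the terminal return value.
async function collect<T, R>(gen: AsyncGenerator<T, R>): Promise<{ yielded: T[]; result: R }> {
  const yielded: T[] = []
  for (;;) {
    const r = await gen.next()
    if (r.done) return { yielded, result: r.value }
    yielded.push(r.value)
  }
}
```

A fake model that requests one tool and then answers in plain text reproduces the shape of the happy path in 16.17: three yielded messages, then `{ reason: 'completed', turnCount: 2 }`.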
16.2 State shape
```ts
// query.ts:204-217
type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined
}
```

The whole object is rebuilt at every `continue` site. Within an iteration, only `toolUseContext` is reassigned (to add query-chain tracking at lines 360-363).
The initial state (line 268) sets `turnCount: 1` and `transition: undefined`. `budgetTracker` (line 280, for `TOKEN_BUDGET`) and `taskBudgetRemaining` (line 291) live outside `State`: they persist across all iterations of one `query()` invocation, not just one turn.
16.3 Phase 1: setup (lines 311–580)
Inside each iteration:
- Destructure state (lines 311-321).
- Emit `stream_request_start` (line 337). UI and SDK interpret this as "a new request is about to hit the API."
- Profile checkpoint `query_fn_entry` (line 339).
- Query chain tracking (lines 347-363). Depth is incremented or a `chainId` is initialized, then written into `toolUseContext` so downstream tools can include it in analytics. This is the only place `toolUseContext` is reassigned mid-iteration.
- Message preparation (line 365): `getMessagesAfterLastCompactBoundary(messages)`. The compaction boundary is preserved as a marker, and the model only sees messages after it.
- Content replacement (lines 376-394): per-message tool-result budgeting. Large tool results are swapped for disk-persisted previews via `recordContentReplacement`.
- Snip (lines 401-410), if the `HISTORY_SNIP` feature is on: optional targeted message removal.
- Microcompact (lines 413-426), if enabled: cache-aware, inline message-level compaction (not a full summary).
- Context collapse (lines 440-447), if enabled: read-time projection of collapsed regions.
- Auto-compact check (lines 453-543): the big one. Calls `shouldAutoCompact()` (see doc 4.6). If needed, it runs a full compaction, yields the boundary message, and folds `preCompactContext` into `taskBudgetRemaining` (see 16.11).
- Streaming executor (lines 561-568): instantiate `StreamingToolExecutor` if its gate is on.
- Model resolution, prompt dumps, and the blocking-limit check (lines 568-648).

If a blocking context-window limit is hit and no recovery path is available, return early with `{ reason: 'blocking_limit' }` (lines 628-648).
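The message-preparation step amounts to slicing the history after the most recent boundary marker. This is a minimal sketch, not the real `getMessagesAfterLastCompactBoundary`: the `Message` shape and the `compact_boundary` subtype name are assumptions, and so is excluding the marker itself from the result.

```ts
// Hypothetical message shape; the real Message union is much richer.
type Message =
  | { type: 'system'; subtype: 'compact_boundary' | 'info' }
  | { type: 'user' | 'assistant'; content: string }

function messagesAfterLastCompactBoundary(messages: Message[]): Message[] {
  // Scan backwards for the most recent boundary marker.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i]
    if (m.type === 'system' && m.subtype === 'compact_boundary') {
      return messages.slice(i + 1) // assumption: the marker itself is excluded
    }
  }
  return messages // no compaction yet: the model sees the whole history
}
```

Scanning from the end keeps the common case cheap: the latest boundary is usually near the tail of a long history.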
16.4 Phase 2: the model stream (lines 653–863)
The outer fallback loop
```ts
// query.ts:654
while (attemptWithFallback) { … }
```

`attemptWithFallback` is set to false initially and set to true by the `FallbackTriggeredError` handler (line 897). This lets the same turn's request retry with a different model.
The inner for-await
```ts
// query.ts:659
for await (const message of deps.callModel(/* params */)) { … }
```

`deps.callModel` is `queryModelWithStreaming` by default (query/deps.ts:33-40) and is pluggable for tests. It is an async generator yielding one `Message` per content block as streaming progresses.
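The two nested loops can be sketched as follows, assuming a streaming function that throws a dedicated error when the caller should switch models. `FallbackTriggered`, `StreamFn`, and `streamWithFallback` are hypothetical. The sketch clears the flag at the top of each attempt so the body runs at least once, and it simply discards partial output from the failed attempt where the real loop tombstones it.

```ts
// Hypothetical error type standing in for FallbackTriggeredError.
class FallbackTriggered extends Error {
  fallbackModel: string
  constructor(fallbackModel: string) {
    super(`fallback to ${fallbackModel}`)
    this.fallbackModel = fallbackModel
  }
}

// One streaming attempt against one model (hypothetical signature).
type StreamFn = (model: string) => AsyncIterable<string>

async function streamWithFallback(
  callModel: StreamFn,
  initialModel: string,
): Promise<{ model: string; events: string[] }> {
  let model = initialModel
  let events: string[] = []
  let attemptWithFallback = true
  // Outer loop: re-run the same request after a model switch.
  while (attemptWithFallback) {
    attemptWithFallback = false
    events = [] // drop partial output from a failed attempt
    try {
      // Inner loop: consume one streaming attempt.
      for await (const event of callModel(model)) {
        events.push(event)
      }
    } catch (err) {
      if (!(err instanceof FallbackTriggered)) throw err // other errors end the query
      model = err.fallbackModel
      attemptWithFallback = true
    }
  }
  return { model, events }
}
```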
What each yielded message triggers
- `stream_event` → pass through to the caller via `yield`.
- `assistant` message:
  - Fallback tombstone check (lines 712-741): if a fallback is in progress, mark this message as tombstoned (kept in the array for recovery but not yielded).
  - Backfill observable input (lines 747-787): apply each tool's `backfillObservableInput` to a copy of the tool_use input so SDK observers and transcripts see legacy fields without mutating the API-bound shape. Comment: "The original API-bound input is never mutated (preserves prompt cache)."
  - Withholding (lines 799-825): if the message is a recoverable error (PTL, max-output-tokens, media-size), don't yield it yet; the recovery phase may replace it.
  - Tool-use collection (lines 826-844): filter content for `tool_use` blocks, push them to `toolUseBlocks`, and set `needsFollowUp = true`.
  - Streaming executor enqueue (line 842): if the executor is active, add each tool to it immediately.
- Completed streaming tool results (lines 847-862): yield `getCompletedResults()`, i.e. any tools that finished while the stream was ongoing.
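The assistant-message handling can be sketched as a function that records every message for recovery but returns only what should be surfaced. The shapes, the `apiError` field, and the set of recoverable kinds are hypothetical; tombstoning, input backfill, and executor enqueueing are omitted.

```ts
// Hypothetical shapes for illustration; the real Message types differ.
type ToolUseBlock = { type: 'tool_use'; id: string; name: string }
type Block = { type: 'text'; text: string } | ToolUseBlock
type AssistantMsg = {
  type: 'assistant'
  content: Block[]
  apiError?: 'prompt_too_long' | 'max_output_tokens' | 'media_size' | 'rate_limit'
}
type StreamEvent = { type: 'stream_event'; delta: string }

const RECOVERABLE = new Set<string>(['prompt_too_long', 'max_output_tokens', 'media_size'])

type TurnAccumulator = {
  assistantMessages: AssistantMsg[] // includes withheld messages, for recovery logic
  toolUseBlocks: ToolUseBlock[]
  needsFollowUp: boolean
}

// Returns what should be yielded to the caller, or undefined if the message is withheld.
function handleStreamMessage(
  msg: AssistantMsg | StreamEvent,
  acc: TurnAccumulator,
): AssistantMsg | StreamEvent | undefined {
  if (msg.type === 'stream_event') return msg // pass through untouched

  acc.assistantMessages.push(msg)
  for (const block of msg.content) {
    if (block.type === 'tool_use') {
      acc.toolUseBlocks.push(block)
      acc.needsFollowUp = true
    }
  }
  // Withhold recoverable errors: a later recovery step may replace them.
  const withheld = msg.apiError !== undefined && RECOVERABLE.has(msg.apiError)
  return withheld ? undefined : msg
}
```

Keeping withheld messages in `assistantMessages` while hiding them from the caller is what lets Phase 5a inspect an error the user has not seen yet.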
Cached microcompact boundary yield
Lines 870-892 yield the compact boundary message now that the actual cache_deleted_input_tokens delta is known (the accurate number is only available after the API response).
16.5 Phase 3: errors (lines 893–997)
FallbackTriggeredError (line 894)
- Tombstone orphaned assistant messages (lines 717-741 and surrounding).
- Generate synthetic `tool_result` blocks for orphan `tool_use`s: "Model fallback triggered" (lines 900-903). Without this, the next API call would fail with a 400 on tool_use/tool_result pairing.
- Clear the streaming executor (`query.ts:734-739`); orphaned results are dangerous.
- Switch the current model.
- Strip thinking signatures for non-matching fallback models (line 928); thinking blocks are model-bound.
- Yield a system "model switched" message (lines 945-948).
- Set `attemptWithFallback = true` and continue the outer while (line 950).
Other API errors (lines 955-997)
Yield any missing `tool_result`s synthetically, surface the error as a user-visible message, and return `{ reason: 'model_error' }`.
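Both error paths rely on the same repair: every `tool_use` without a result gets a synthetic error `tool_result`. A minimal sketch, assuming block shapes modeled on the Messages API (`tool_use_id`, `is_error`); the function name and parameters are illustrative.

```ts
// Minimal block shapes modeled on the Messages API tool_use / tool_result blocks.
type ToolUseBlock = { type: 'tool_use'; id: string; name: string }
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string; is_error?: boolean }

// Synthesize an error result for every tool_use that has no matching tool_result,
// so the next request satisfies tool_use/tool_result pairing.
function backfillOrphanResults(
  toolUses: ToolUseBlock[],
  existing: ToolResultBlock[],
  reason: string,
): ToolResultBlock[] {
  const answered = new Set(existing.map((r) => r.tool_use_id))
  return toolUses
    .filter((u) => !answered.has(u.id))
    .map((u): ToolResultBlock => ({ type: 'tool_result', tool_use_id: u.id, content: reason, is_error: true }))
}
```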
16.6 Phase 4: post-sampling (lines 999–1060)
- Post-sampling hooks (lines 1000-1008). Fire-and-forget; observation-only; no state change.
- Abort during streaming (lines 1015-1051). If the AbortController fired mid-stream:
  - Consume any remaining StreamingToolExecutor results; the executor generates synthetic tool_results for aborted tools.
  - Yield an "interrupted by user" message (skipped when `signal.reason === 'interrupt'`, because the queued user message provides context).
  - Return `{ reason: 'aborted_streaming' }`.
- Yield pending haiku summary (lines 1054-1060). `state.pendingToolUseSummary` is a promise created in the previous turn (a ~1s Haiku call that ran alongside the ~5-30s main model call). Awaiting it here gives the UI a compact summary of the last tool batch just in time.
16.7 Phase 5a: no tool use (lines 1062–1358)
Reached when !needsFollowUp — the assistant's final message had no tool_use. We are either completing the turn or triggering a recovery.
Prompt-Too-Long recovery (lines 1065-1183)
Detection: the last message is a withheld 413 error (`isWithheld413`).
- Collapse drain first (lines 1089-1117): if `contextCollapse` is enabled and no collapse drain has run yet in this chain, call `recoverFromOverflow()`. If it commits collapses, yield those messages and continue with `transition.reason = 'collapse_drain_retry'` (line 1110).
- Reactive compact second (lines 1119-1166): if `tryReactiveCompact()` succeeds, carry `taskBudgetRemaining` through the compact boundary, yield the boundary, and continue with `reason = 'reactive_compact_retry'` (line 1162).
- Failure path (lines 1173-1175): surface the withheld error and return `{ reason: 'image_error' | 'prompt_too_long' }`. Stop hooks are skipped here to prevent a death spiral where a hook retry hits the same PTL.
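The ordering above (cheapest recovery first, then a heavier one, then fail) can be sketched as a pure function. The input record, the outcome type, and the function name are hypothetical; `reactiveCompactTried` is assumed to play the role of `State.hasAttemptedReactiveCompact`, and the `image_error` branch is left out.

```ts
// Hypothetical inputs; the real code calls recoverFromOverflow() and tryReactiveCompact().
type PtlInputs = {
  contextCollapseEnabled: boolean
  collapseDrainTried: boolean // a collapse drain already ran in this chain
  reactiveCompactTried: boolean // assumed to mirror State.hasAttemptedReactiveCompact
  drainCollapses: () => number // returns how many collapses were committed
  tryReactiveCompact: () => boolean // true if compaction succeeded
}

type PtlOutcome =
  | { kind: 'continue'; reason: 'collapse_drain_retry' | 'reactive_compact_retry' }
  | { kind: 'return'; reason: 'prompt_too_long' }

function recoverFromPromptTooLong(i: PtlInputs): PtlOutcome {
  // 1. Cheapest first: drain pending collapses, at most once per chain.
  if (i.contextCollapseEnabled && !i.collapseDrainTried && i.drainCollapses() > 0) {
    return { kind: 'continue', reason: 'collapse_drain_retry' }
  }
  // 2. Then a reactive full compaction.
  if (!i.reactiveCompactTried && i.tryReactiveCompact()) {
    return { kind: 'continue', reason: 'reactive_compact_retry' }
  }
  // 3. Give up: surface the withheld error (stop hooks are skipped on this path).
  return { kind: 'return', reason: 'prompt_too_long' }
}
```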
Max-output-tokens recovery (lines 1185-1256)
Detection: `isWithheldMaxOutputTokens`.
- Escalate (lines 1195-1220): if there is no prior override and the response hit the default 8k cap, retry with `maxOutputTokensOverride = ESCALATED_MAX_TOKENS` (64k) and continue (line 1217) with `reason = 'max_output_tokens_escalate'`. No user-visible nudge.
- Multi-turn (lines 1223-1251): if `maxOutputTokensRecoveryCount < 3`, inject a small "please continue" prompt, increment the counter, and continue (line 1246) with `reason = 'max_output_tokens_recovery'`.
- Exhausted (line 1255): surface the withheld error.
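The escalate-then-recover ladder can be sketched as a pure decision function. `DEFAULT_MAX_OUTPUT_TOKENS`, `MAX_RECOVERY_ATTEMPTS`, the `cappedAt` parameter, and the exact values (8,000 and 64,000 for "8k" and "64k") are assumptions; only `ESCALATED_MAX_TOKENS`, `maxOutputTokensOverride`, and `maxOutputTokensRecoveryCount` come from the text.

```ts
// Caps from the text (8k default, 64k escalated); the exact numbers are assumptions.
const DEFAULT_MAX_OUTPUT_TOKENS = 8_000
const ESCALATED_MAX_TOKENS = 64_000
const MAX_RECOVERY_ATTEMPTS = 3

type MaxOutputState = {
  maxOutputTokensOverride: number | undefined
  maxOutputTokensRecoveryCount: number
}

type MaxOutputDecision =
  | { action: 'escalate' | 'recover'; next: MaxOutputState }
  | { action: 'exhausted' }

// cappedAt: the output cap the withheld response ran into (hypothetical parameter).
function decideMaxOutputRecovery(s: MaxOutputState, cappedAt: number): MaxOutputDecision {
  // 1. One silent escalation, only when the default cap was the limit.
  if (s.maxOutputTokensOverride === undefined && cappedAt === DEFAULT_MAX_OUTPUT_TOKENS) {
    return { action: 'escalate', next: { ...s, maxOutputTokensOverride: ESCALATED_MAX_TOKENS } }
  }
  // 2. Multi-turn recovery: inject a "please continue" prompt, at most 3 times.
  if (s.maxOutputTokensRecoveryCount < MAX_RECOVERY_ATTEMPTS) {
    return {
      action: 'recover',
      next: { ...s, maxOutputTokensRecoveryCount: s.maxOutputTokensRecoveryCount + 1 },
    }
  }
  // 3. Nothing left to try: surface the withheld error.
  return { action: 'exhausted' }
}
```

Starting from a fresh state, repeated truncations produce one escalation, three recoveries, and then exhaustion.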
API error (lines 1258-1264)
If the last message is another API error (rate limit, auth), skip stop hooks and return { reason: 'completed' }.
Stop hooks (lines 1267-1306)
```ts
const stopHookResult = yield* handleStopHooks(
  messagesForQuery, assistantMessages, systemPrompt, userContext, systemContext,
  toolUseContext, querySource, stopHookActive
)
```

`handleStopHooks` (see query/stopHooks.ts:82-295) does a lot:
- Capture full context for downstream hooks.
- Save a `cacheSafeParams` snapshot (used by `/btw` later).
- Job classification, if the `TEMPLATES` feature is on.
- Fire-and-forget: prompt suggestion, memory extraction (if main thread), auto-dream.
- Chicago MCP cleanup.
- Execute configured stop hooks, yielding progress/attachment/error messages.
- Teammate hooks (`TaskCompleted` per task, plus `TeammateIdle`) if this is a teammate session.

Return value: `{ blockingErrors, preventContinuation }`.
- If `preventContinuation === true` → return `{ reason: 'stop_hook_prevented' }`.
- If `blockingErrors.length > 0` → inject the errors as user messages, set `stopHookActive = true`, and continue (line 1302) with `reason = 'stop_hook_blocking'`.
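The return-value handling reads as a three-way decision. A sketch with hypothetical outcome types and message wording; it checks `preventContinuation` first, following the order of the list above.

```ts
// Result shape from the text; the outcome type and message wording are hypothetical.
type StopHookResult = { blockingErrors: string[]; preventContinuation: boolean }

type StopHookOutcome =
  | { kind: 'return'; reason: 'stop_hook_prevented' }
  | { kind: 'continue'; reason: 'stop_hook_blocking'; injectedMessages: string[]; stopHookActive: true }
  | { kind: 'proceed' } // fall through to the token-budget check

function applyStopHookResult(r: StopHookResult): StopHookOutcome {
  // In this sketch preventContinuation wins over blocking errors.
  if (r.preventContinuation) return { kind: 'return', reason: 'stop_hook_prevented' }
  if (r.blockingErrors.length > 0) {
    return {
      kind: 'continue',
      reason: 'stop_hook_blocking',
      // Each error becomes a user message the model sees on the next iteration.
      injectedMessages: r.blockingErrors.map((e) => `Stop hook feedback: ${e}`),
      stopHookActive: true,
    }
  }
  return { kind: 'proceed' }
}
```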
Token budget continuation (lines 1308-1355)
Only if the `TOKEN_BUDGET` feature is on. `checkTokenBudget()` inspects `budgetTracker` state vs. the current turn's output:
- Continue if `turnTokens < 90% of budget` AND (`deltaSinceLastCheck >= 500` OR first continuation). Inject a nudge and continue (line 1338) with `reason = 'token_budget_continuation'`.
- Stop if ≥90% is used, or on diminishing returns (3+ continuations with the last two deltas < 500 tokens).
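A minimal sketch of that decision, assuming `turnTokens` is the cumulative output of the current turn and that each delta is measured against the previous check. The tracker fields loosely follow the `budgetTracker` shape in 16.11; the function and constant names are hypothetical.

```ts
// Hypothetical tracker; startedAt from 16.11 is omitted because this rule doesn't use it.
type BudgetTracker = {
  continuationCount: number
  lastDeltaTokens: number
  lastGlobalTurnTokens: number
}

const COMPLETION_THRESHOLD = 0.9 // stop once 90% of the budget is used
const MIN_DELTA_TOKENS = 500 // smaller deltas count as "not much progress"

function checkTokenBudgetSketch(t: BudgetTracker, turnTokens: number, budget: number): 'continue' | 'stop' {
  const delta = turnTokens - t.lastGlobalTurnTokens
  // Diminishing returns: 3+ continuations and the last two deltas both small.
  const diminishing =
    t.continuationCount >= 3 && delta < MIN_DELTA_TOKENS && t.lastDeltaTokens < MIN_DELTA_TOKENS
  if (diminishing || turnTokens >= COMPLETION_THRESHOLD * budget) return 'stop'

  if (t.continuationCount === 0 || delta >= MIN_DELTA_TOKENS) {
    // Record this check so the next call can measure its own delta.
    t.continuationCount += 1
    t.lastDeltaTokens = delta
    t.lastGlobalTurnTokens = turnTokens
    return 'continue'
  }
  return 'stop'
}
```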
Normal completion (line 1357)
Return { reason: 'completed', turnCount }.
16.8 Phase 5b: tool use branch (lines 1359–1727)
Reached when needsFollowUp — the assistant emitted tool_use blocks.
Tool execution (lines 1363-1409)
```ts
// streaming mode
for await (const update of streamingToolExecutor.getRemainingResults()) { … }
// or serial
for await (const update of runTools(toolUseBlocks, canUseTool, toolUseContext)) { … }
```

Each `update` is either a message (yielded, and added to `toolResults` if it is a user or attachment message) or a context modifier (applied to `updatedToolUseContext`). The loop also tracks `shouldPreventContinuation` for `hook_stopped_continuation` attachments.
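The update-handling rule can be sketched as a generator that yields messages and folds context modifiers into the context it returns. The `ToolUpdate` union, the `kind` tags, and the `ToolUseContext` field are hypothetical; a sync generator is used for brevity where the real loop uses `for await`.

```ts
// Hypothetical shapes; the real update and context types are richer.
type ToolUseContext = { readFileCount: number }
type UpdateMessage = { type: 'user' | 'attachment' | 'progress'; text: string }
type ToolUpdate =
  | { kind: 'message'; message: UpdateMessage }
  | { kind: 'contextModifier'; modify: (ctx: ToolUseContext) => ToolUseContext }

// Yields every message, collects user/attachment messages as tool results,
// and returns the context with all modifiers applied in order.
function* consumeToolUpdates(
  updates: Iterable<ToolUpdate>,
  ctx: ToolUseContext,
  toolResults: UpdateMessage[],
): Generator<UpdateMessage, ToolUseContext> {
  let updated = ctx
  for (const u of updates) {
    if (u.kind === 'message') {
      yield u.message
      if (u.message.type === 'user' || u.message.type === 'attachment') toolResults.push(u.message)
    } else {
      updated = u.modify(updated)
    }
  }
  return updated
}
```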
Tool-use summary generation (lines 1411-1482)
If summaries are enabled and this isn't a subagent, kick off the next turn's haiku summary:

```ts
nextPendingToolUseSummary = generateToolUseSummary(toolBlocks, toolResults, lastAssistantText)
  .catch(() => null)
```

Fire-and-forget. The promise is picked up next turn at line 1054.
Abort during tool execution (lines 1485-1516)
Distinct from the streaming-abort path. If aborted here:
- Trigger Chicago MCP cleanup on the main thread.
- Yield an interruption message (unless the abort reason is `'interrupt'`).
- Check maxTurns before returning.
- Return `{ reason: 'aborted_tools' }`.
Hook-stopped continuation (lines 1518-1521)
If a hook attachment signaled "stop continuation," return { reason: 'hook_stopped' }.
Attachment collection (lines 1538-1643)
Before the next turn, inject whatever per-turn reminders are needed:
- Queued commands drain (lines 1570-1578). Task notifications and queued user messages are snapshotted.
- Attachment messages (lines 1580-1590). Memory, skill-discovery, and file-change attachments.
- Memory prefetch consumption (lines 1599-1614). If a background prefetch of relevant memories settled during the current turn, consume it now.
- Skill discovery (lines 1620-1628). Prefetched skills are injected as `<system-reminder>` attachments.
- Command dequeue (lines 1632-1643). Remove consumed commands from the queue.
- File-change logging (lines 1646-1657).
Tool refresh (lines 1660-1671)
Re-fetch the tool list between turns in case a new MCP server connected mid-turn.
Max turns check (lines 1705-1712)
```ts
if (nextTurnCount > maxTurns) {
  yield attach({ type: 'max_turns_reached', maxTurns, turnCount: nextTurnCount })
  return { reason: 'max_turns', turnCount: nextTurnCount }
}
```

State reassignment (lines 1715-1727)

```ts
state = {
  ...state,
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  turnCount: nextTurnCount,
  toolUseContext: updatedToolUseContext,
  pendingToolUseSummary: nextPendingToolUseSummary,
  maxOutputTokensRecoveryCount: 0, // reset recovery counters
  hasAttemptedReactiveCompact: false, // reset
  transition: { reason: 'next_turn' },
}
continue // line 1727
```

16.9 StreamingToolExecutor — one-paragraph recap
See doc 13.2 for the detailed treatment. The executor maintains a queue of tools with statuses (`queued` | `executing` | `completed` | `yielded`), lets concurrency-safe tools run in parallel but serializes unsafe ones, isolates per-tool errors with per-tool AbortControllers (Bash errors are the exception: they cancel siblings), and yields results in submission order. The `canExecuteTool(isConcurrencySafe)` rule is the one-line summary: a tool may start if nothing is executing, or if it and every executing tool are concurrency-safe; otherwise it stays queued.
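Both rules from the recap fit in a few lines. The `TrackedTool` shape and both function bodies are hypothetical simplifications; the real executor also handles aborts and per-tool errors, which are omitted here.

```ts
// Hypothetical, simplified bookkeeping for a streaming tool executor.
type Status = 'queued' | 'executing' | 'completed' | 'yielded'
type TrackedTool = { id: string; isConcurrencySafe: boolean; status: Status; result?: string }

// A tool may start when nothing is executing, or when it and every executing tool are concurrency-safe.
function canExecuteTool(isConcurrencySafe: boolean, tools: TrackedTool[]): boolean {
  const executing = tools.filter((t) => t.status === 'executing')
  return executing.length === 0 || (isConcurrencySafe && executing.every((t) => t.isConcurrencySafe))
}

// Yield finished results in submission order: stop at the first tool that is still
// queued or executing, even if later tools have already completed.
function* getCompletedResults(tools: TrackedTool[]): Generator<string> {
  for (const t of tools) {
    if (t.status === 'yielded') continue
    if (t.status !== 'completed') return
    t.status = 'yielded'
    yield t.result ?? ''
  }
}
```

Stopping at the first unfinished tool is what preserves submission order: a fast Grep submitted after a slow Read is held back until Read's result has been yielded.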
16.10 pendingToolUseSummary — the haiku overlap
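`generateToolUseSummary()` runs on Haiku (~1s) while the next turn's Opus stream (~5-30s) is in flight. The summary promise is stored in `state.pendingToolUseSummary` at the end of turn N and awaited and yielded in turn N+1's post-sampling phase, after that turn's model stream (lines 1054-1060). Net latency impact is effectively zero: the Haiku call overlaps the main call, so by the time it is awaited it has normally already settled.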
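The overlap can be demonstrated with a small self-contained sketch: the summary promise is started (with `.catch(() => null)`, as in 16.8) at the end of one turn and only awaited after the next turn's main call. `summarizeBatch`, `mainModelCall`, `startSummary`, `nextTurn`, and the millisecond delays are hypothetical stand-ins for the Haiku and Opus calls.

```ts
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Stand-in for the short Haiku call; fails on an empty batch to exercise the catch.
async function summarizeBatch(tools: string[]): Promise<string> {
  await sleep(10)
  if (tools.length === 0) throw new Error('nothing to summarize')
  return `ran ${tools.join(', ')}`
}

// Stand-in for the much longer main model call.
async function mainModelCall(turn: number): Promise<string> {
  await sleep(30)
  return `response ${turn}`
}

// End of turn N: start the summary without awaiting it. The catch turns failure
// into null, so the unawaited promise can never become an unhandled rejection.
function startSummary(tools: string[]): Promise<string | null> {
  return summarizeBatch(tools).catch(() => null)
}

// Turn N+1: the main call runs first; the summary has been running concurrently
// the whole time, so awaiting it afterwards costs (almost) nothing.
async function nextTurn(turn: number, pending: Promise<string | null> | undefined) {
  const response = await mainModelCall(turn)
  const summary = pending ? await pending : null
  return { response, summary }
}
```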
16.11 budgetTracker and taskBudgetRemaining
`budgetTracker` (line 280 + query/tokenBudget.ts):

```
{ continuationCount, lastDeltaTokens, lastGlobalTurnTokens, startedAt }
```

Used by `checkTokenBudget()` at lines 1308-1355 for the auto-continuation decision.

`taskBudgetRemaining` (line 291):
- Undefined until the first compaction.
- When a compact fires: `preCompactContext = finalContextTokensFromLastResponse(messagesForQuery)`, then `taskBudgetRemaining = Math.max(0, (taskBudgetRemaining ?? params.taskBudget.total) - preCompactContext)`.
- Passed to the API as `remaining:` so the server's task_budget counter keeps decrementing correctly across compaction boundaries (line 702).

Without this, the server would reset the counter at each compact and overspend.
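The carryover rule is plain arithmetic, so a worked example helps. The function name and parameters below are hypothetical; the body mirrors the `Math.max(0, (taskBudgetRemaining ?? params.taskBudget.total) - preCompactContext)` expression from the `taskBudgetRemaining` notes.

```ts
// Mirrors the carryover rule; names are illustrative.
function carryTaskBudget(
  taskBudgetRemaining: number | undefined,
  taskBudgetTotal: number,
  preCompactContext: number,
): number {
  // Before the first compaction there is no remaining value yet, so start from the total.
  return Math.max(0, (taskBudgetRemaining ?? taskBudgetTotal) - preCompactContext)
}

// Worked example: a 100k-token task budget across three compactions.
let remaining: number | undefined = undefined
remaining = carryTaskBudget(remaining, 100_000, 30_000) // 70_000
remaining = carryTaskBudget(remaining, 100_000, 50_000) // 20_000
remaining = carryTaskBudget(remaining, 100_000, 30_000) // 0, clamped instead of -10_000
```

The clamp matters on the last step: without it, a large pre-compact context would push the remaining budget negative.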
16.12 The seven transition.reason values — triggers and effects
| Reason | Line | Trigger | State change on continue |
|---|---|---|---|
| `collapse_drain_retry` | 1110 | PTL + contextCollapse available + commits > 0 | Replace messages with drained set |
| `reactive_compact_retry` | 1162 | PTL or media-size + reactiveCompact succeeded | Post-compact messages; carry taskBudget |
| `max_output_tokens_escalate` | 1217 | Default 8k capped + no prior override | Set `maxOutputTokensOverride = ESCALATED_MAX_TOKENS` |
| `max_output_tokens_recovery` | 1246 | recoveryCount < 3 + withheld max-output-tokens | Inject "continue" prompt; increment counter |
| `stop_hook_blocking` | 1302 | `stopHookResult.blockingErrors.length > 0` | Inject errors; set `stopHookActive = true` |
| `token_budget_continuation` | 1338 | `TOKEN_BUDGET` + `checkTokenBudget === 'continue'` | Inject budget nudge |
| `next_turn` | 1725 | Normal post-tool-execution | Append assistant + toolResults; increment turnCount |

Each reason corresponds to exactly one `continue` site, so tests can assert on `transition.reason` without reading message content. That is unusually testable for a streaming loop.
16.13 Message-ordering invariants
- Thinking blocks (lines 151-163):
  - Only present if `max_thinking_length > 0`.
  - May not be the last block of a message.
  - Must be preserved across the full assistant trajectory (turn + tool_use → tool_result → next assistant).
- tool_use / tool_result pairing:
  - Every `tool_use` must have a matching `tool_result` before the next API call.
  - Orphans are backfilled (lines 900-903, 1019) with synthetic results.
  - Missing results → API 400.
- Withheld vs. tombstoned:
  - Withheld (lines 799-825): the message is pushed to `assistantMessages` for recovery logic but not yielded to SDK/transcript.
  - Tombstoned (lines 716-741): the message is retained in the array but marked; the fallback pathway uses these.
- Final array order (line 1716): `messagesForQuery + assistantMessages + toolResults`. The next turn sees the full conversation in this order.
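The pairing and thinking-position rules can be checked mechanically, which is handy in tests. A sketch over hypothetical block shapes modeled loosely on the Messages API; it assumes a result counts as matching if it appears in the user messages between a `tool_use` and the next assistant message, and it does not check the trajectory-preservation rule for thinking blocks.

```ts
// Hypothetical block/message shapes modeled loosely on the Messages API.
type Block =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string }
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string }
type Message = { role: 'user' | 'assistant'; content: Block[] }

// Returns human-readable violations; an empty array means the history passes.
function checkInvariants(messages: Message[]): string[] {
  const problems: string[] = []
  messages.forEach((m, i) => {
    if (m.role !== 'assistant') return
    const last = m.content[m.content.length - 1]
    if (last !== undefined && last.type === 'thinking') {
      problems.push(`message ${i}: thinking block is last`)
    }
    // tool_result ids from the user messages before the next assistant message.
    const answered = new Set<string>()
    for (let j = i + 1; j < messages.length && messages[j].role === 'user'; j++) {
      for (const b of messages[j].content) {
        if (b.type === 'tool_result') answered.add(b.tool_use_id)
      }
    }
    for (const b of m.content) {
      if (b.type === 'tool_use' && !answered.has(b.id)) {
        problems.push(`message ${i}: tool_use ${b.id} has no tool_result`)
      }
    }
  })
  return problems
}
```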
16.14 persistSession (note)
`query.ts` itself does not call `recordTranscript`. Persistence is a caller concern: `QueryEngine.submitMessage` (see QueryEngine.ts:727-732) writes the transcript as messages flow out of `query()`. Writes for assistant messages are fire-and-forget so they don't block the stream generator; writes for user messages are awaited to guarantee resumability.
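The split between awaited and fire-and-forget writes can be sketched as a consumer loop. `consumeAndPersist` and the `write` callback are hypothetical; this is not the `QueryEngine.submitMessage` code, only the pattern it is described as following.

```ts
type Msg = { type: 'user' | 'assistant'; text: string }

// `write` is a hypothetical stand-in for the transcript writer.
async function consumeAndPersist(
  stream: AsyncIterable<Msg>,
  write: (m: Msg) => Promise<void>,
): Promise<Msg[]> {
  const out: Msg[] = []
  for await (const m of stream) {
    if (m.type === 'user') {
      // Awaited: the user turn must be durable before moving on, so a resume can find it.
      await write(m)
    } else {
      // Fire-and-forget: never stall the model stream on an assistant write.
      void write(m).catch(() => undefined)
    }
    out.push(m)
  }
  return out
}
```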
16.15 deps — the test seam
```ts
// query/deps.ts:21-31
type QueryDeps = {
  callModel: typeof queryModelWithStreaming
  microcompact: typeof microcompactMessages
  autocompact: typeof autoCompactIfNeeded
  uuid: () => string
}
```

Default: `productionDeps()` at line 33. Override via `params.deps` in tests. The narrow scope means no module-spy boilerplate, and using `typeof` keeps the types in sync automatically.
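The value of this seam is easiest to see in miniature. In this sketch, `callRealModel`, `realUuid`, `Deps`, `productionDepsSketch`, and `runQuery` are hypothetical stand-ins for `queryModelWithStreaming`, `QueryDeps`, `productionDeps()`, and the query loop; only the pattern (a small deps object typed with `typeof`, defaulted in production, overridden in tests) is taken from the text.

```ts
// Hypothetical "production" implementations; the model call is deliberately unusable here.
async function callRealModel(prompt: string): Promise<string> {
  throw new Error(`network call not available in this sketch: ${prompt}`)
}
let counter = 0
function realUuid(): string {
  counter += 1
  return `id-${counter}`
}

// typeof ties each dep to its real implementation, so a signature change there
// becomes a type error in every test override.
type Deps = {
  callModel: typeof callRealModel
  uuid: typeof realUuid
}

function productionDepsSketch(): Deps {
  return { callModel: callRealModel, uuid: realUuid }
}

async function runQuery(prompt: string, deps: Deps = productionDepsSketch()): Promise<string> {
  const id = deps.uuid()
  const answer = await deps.callModel(prompt)
  return `${id}: ${answer}`
}

// In a test: replace only the seam, with no module mocking.
const testDeps: Deps = {
  callModel: async (p) => `echo ${p}`,
  uuid: () => 'fixed-id',
}
```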
16.16 maxTurns — the hard cap
Set by the caller (SDK, headless, REPL). Checked at two sites: abort-during-tools (line 1507) and pre-continue (line 1705). When exceeded, yield max_turns_reached attachment and return { reason: 'max_turns', turnCount }. Prevents runaway agentic loops.
16.17 Walking a happy-path iteration
Turn N — assistant reads a file, writes a file.
- Enter the while loop; destructure state (line 311).
- Emit `stream_request_start`; profile checkpoint.
- Compaction check: under threshold, no-op.
- Streaming executor created (gate on). Model resolved.
- API stream begins (line 659). First event: a text block. Yield it.
- Second event: `tool_use` (FileRead). Push to `toolUseBlocks`; the executor starts Read.
- Read completes while streaming continues (line 851 yields its `tool_result`).
- Third event: `tool_use` (FileEdit). Push; the executor queues it. FileEdit is not concurrency-safe, so it would wait for Read if Read were still running, but Read has already finished.
- Fourth event: end of stream.
- Post-sampling hooks (line 1000). Abort? No.
- Yield the pending haiku summary from turn N-1, if any (line 1054).
- `needsFollowUp = true` → enter Phase 5b.
- Execute remaining tools (line 1380): FileEdit runs alone and its result is yielded.
- Kick off the next-turn haiku summary (line 1411). Fire-and-forget.
- Collect attachments (line 1580). No queue drain.
- Tool refresh (line 1660).
- `nextTurnCount = 2` does not exceed `maxTurns` → build the next state (line 1716) with `transition.reason = 'next_turn'` and continue.
Turn N+1 — assistant summarizes and stops.
- Enter, destructure, compaction check, stream.
- The stream yields one text block; no `tool_use`.
- `needsFollowUp = false` → Phase 5a.
- No PTL, no max-output. Call stop hooks.
- Stop hooks fire-and-forget memory extraction.
- Stop hooks return `{ blockingErrors: [], preventContinuation: false }`.
- Token budget check (if the feature is on): no continuation.
- Return `{ reason: 'completed', turnCount: 2 }`.
16.18 Complexity hotspots
- Withholding coordination (lines 799-825 ↔ 1070-1183). PTL / max-output errors are suppressed from SDK output but inspected for recovery. If withholding is missed, the user sees the error before recovery has a chance; if surfacing is missed, the user waits forever.
- Compaction boundary task-budget carryover (lines 282-291, 508-515, 1138-1145). Without careful bookkeeping, the server's task_budget counter under-counts post-compact spend and clients see unpredictable cutoffs.
- Streaming tool ordering (`StreamingToolExecutor.ts:129-151`). Concurrency rules let tools run in parallel, yet result order must still match submission order.
- Dual abort paths (lines 1015-1051 vs. 1485-1516). Streaming aborts need synthetic tool_results; tool-execution aborts need a maxTurns check. Different paths, similar-looking code.
- Stop hook fire-and-forget (`stopHooks.ts:136-157`). Memory extraction keeps running asynchronously after stop hooks return; shutdown must drain it (`extractMemories.ts:611-615`) or the extraction is lost to process exit.
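The shutdown-drain requirement in the last bullet is a general pattern: track every fire-and-forget promise and await them all before exit. A minimal sketch with hypothetical names (`inFlight`, `fireAndForget`, `drainInFlight`); it does not describe how `extractMemories.ts` is actually structured.

```ts
// Registry of in-flight fire-and-forget work (e.g. a memory extraction).
const inFlight = new Set<Promise<unknown>>()

function fireAndForget(task: () => Promise<unknown>): void {
  const p: Promise<unknown> = task()
    .catch(() => undefined) // a failed background task must not crash the process
    .finally(() => {
      inFlight.delete(p) // remove itself once settled
    })
  inFlight.add(p)
}

// Call during shutdown. Loops because a draining task may schedule more work.
async function drainInFlight(): Promise<void> {
  while (inFlight.size > 0) {
    await Promise.allSettled(Array.from(inFlight))
  }
}
```

Because each registered promise removes itself in `finally`, the registry is empty once `drainInFlight()` resolves, and exiting at that point loses nothing.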