CodeDocs Vault

16. The Agentic Loop — Schematic Walkthrough

This is a detailed, step-by-step walkthrough of what actually happens inside queryLoop() (/workspaces/src/query.ts:219) during one iteration. Skim doc 4 for the 30,000-foot view; come here when you need to know which line does what.

16.1 The loop at a glance

State := { messages, toolUseContext, turnCount, compactTracking, recoveryFlags, transition, pendingSummary }

while (true) {
  ┌─ Phase 1: Setup        (lines 311–580)
  │    destructure state, pre-query compaction, model selection
  │
  ├─ Phase 2: Model stream (lines 653–863)
  │    start API stream, consume events, emit assistant messages,
  │    collect tool_use blocks, optionally start tools streaming
  │
  ├─ Phase 3: Errors       (lines 893–997)
  │    FallbackTriggeredError → model switch, retry. Other errors → return.
  │
  ├─ Phase 4: Post-sampling (lines 999–1060)
  │    fire post-sampling hooks, handle abort during streaming,
  │    yield pending haiku summary from prior turn.
  │
  ├─ Phase 5a: No tool use (lines 1062–1358)    ← terminal or recovery
  │    PTL recovery → collapse drain / reactive compact
  │    max-output recovery → escalate or multi-turn continue
  │    stop hooks, token budget continuation,
  │    else return `{ reason: 'completed' }`.
  │
  ├─ Phase 5b: Tool use    (lines 1359–1727)    ← normal continuation
  │    execute all tools (streaming or serial),
  │    generate next-turn haiku summary (fire-and-forget),
  │    check abort, collect attachments, drain queued commands,
  │    refresh tools, check maxTurns, reassign state.
  │
  └─ continue
}

16.2 State shape

// query.ts:204-217
type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined
}

Values reset on each continue; only toolUseContext can be mutated mid-iteration (to add query tracking at lines 360-363).

Initial state at line 268 sets turnCount: 1, transition: undefined. budgetTracker at line 280 (for TOKEN_BUDGET) and taskBudgetRemaining at line 291 are outside State — they persist across all iterations of one query() invocation, not just one turn.
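A minimal sketch of that scoping, with a hypothetical runQuery and a simplified tracker shape (only the names budgetTracker, turnCount, continuationCount, and lastDeltaTokens come from this doc):

```typescript
// Hypothetical sketch: the tracker is declared outside the loop, so rebuilding
// per-turn state on `continue` cannot reset it.
type BudgetTracker = { continuationCount: number; lastDeltaTokens: number }

function runQuery(turnOutputs: number[]): BudgetTracker {
  // Persists across all iterations of one query() invocation.
  const budgetTracker: BudgetTracker = { continuationCount: 0, lastDeltaTokens: 0 }

  // Per-turn state, rebuilt on every `continue`.
  let state = { turnCount: 1 }

  for (const deltaTokens of turnOutputs) {
    budgetTracker.continuationCount += 1
    budgetTracker.lastDeltaTokens = deltaTokens
    state = { ...state, turnCount: state.turnCount + 1 } // fresh per-turn values
  }
  return budgetTracker
}
```

The design choice this illustrates: anything that must survive a compact or a recovery retry belongs outside State, because State is wholesale reassigned at every continue site.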

16.3 Phase 1: setup (lines 311–580)

Inside each iteration:

  1. Destructure state (lines 311-321).
  2. Emit stream_request_start (line 337). UI and SDK interpret this as "a new request is about to hit the API."
  3. Profile checkpoint query_fn_entry (line 339).
  4. Query chain tracking (lines 347-363). Depth is incremented or a chainId is initialized; written into toolUseContext so downstream tools can include it in analytics. This is the only place toolUseContext is reassigned mid-iteration.
  5. Message preparation (line 365): getMessagesAfterLastCompactBoundary(messages) — the compaction boundary is preserved as a marker, and the model only sees messages after it.
  6. Content replacement (lines 376-394): per-message tool-result budgeting. Large tool results are swapped for disk-persisted previews via recordContentReplacement.
  7. Snip (lines 401-410) if HISTORY_SNIP feature on — optional targeted message removal.
  8. Microcompact (lines 413-426) if enabled — cache-aware, inline message-level compaction (not full summary).
  9. Context collapse (lines 440-447) if enabled — read-time projection of collapsed regions.
  10. Auto-compact check (lines 453-543): the big one. Calls shouldAutoCompact() (see doc 4.6). If needed, runs a full compaction, yields the boundary message, captures preCompactContext into taskBudgetRemaining.
  11. Streaming executor (lines 561-568): instantiate StreamingToolExecutor if gate is on.
  12. Model resolution, dump prompts, blocking-limit check (lines 568-648).

If a blocking context-window limit is hit and no recovery path is available, return early with { reason: 'blocking_limit' } (lines 628-648).
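Step 5's boundary slice can be sketched as follows; the message shape and the isCompactBoundary flag are assumptions, only the slicing behavior (boundary kept as a marker, everything before it hidden from the model) is from this doc:

```typescript
// Hedged sketch of getMessagesAfterLastCompactBoundary.
type Msg = { text: string; isCompactBoundary?: boolean }

function getMessagesAfterLastCompactBoundary(messages: Msg[]): Msg[] {
  // Find the most recent compaction boundary, if any.
  const last = messages.map(m => !!m.isCompactBoundary).lastIndexOf(true)
  // No boundary: the model sees the full history.
  if (last === -1) return messages
  // Keep the boundary marker itself; drop everything before it.
  return messages.slice(last)
}
```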

16.4 Phase 2: the model stream (lines 653–863)

The outer fallback loop

// query.ts:654
while (attemptWithFallback) { … }

Starts true (so the loop body runs at least once); cleared when a pass completes without a fallback; set back to true by FallbackTriggeredError handling (line 897). Allows the same turn's request to retry with a different model.
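A minimal sketch of that retry shape, under stated assumptions: the call function, error signaling, and model list are illustrative, and only the attemptWithFallback flag and the switch-model-then-retry behavior come from this doc:

```typescript
// Hedged sketch of the outer fallback loop: one pass per model attempt.
async function streamWithFallback(
  models: string[],
  call: (model: string) => Promise<string>,
): Promise<string> {
  let modelIndex = 0
  let attemptWithFallback = true
  while (attemptWithFallback) {
    attemptWithFallback = false // a clean pass exits the loop
    try {
      return await call(models[modelIndex])
    } catch (err) {
      if (err instanceof Error && err.message === 'fallback' && modelIndex + 1 < models.length) {
        modelIndex += 1            // switch to the fallback model
        attemptWithFallback = true // retry the same turn's request
      } else {
        throw err // other errors propagate to the Phase 3 handlers
      }
    }
  }
  throw new Error('unreachable')
}
```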

The inner for-await

// query.ts:659
for await (const message of deps.callModel(/* params */)) { … }

deps.callModel is queryModelWithStreaming by default (query/deps.ts:33-40), pluggable for tests. It is an async generator yielding one Message per content block as streaming progresses.

What each yielded message triggers

Cached microcompact boundary yield

Lines 870-892 yield the compact boundary message now that the actual cache_deleted_input_tokens delta is known (the accurate number is only available after the API response).

16.5 Phase 3: errors (lines 893–997)

FallbackTriggeredError (line 894)

  1. Tombstone orphaned assistant messages (lines 717-741 and surrounding).
  2. Generate synthetic tool_result blocks for orphan tool_uses: "Model fallback triggered" (lines 900-903). Without this, the next API call would 400 on tool_use/tool_result pairing.
  3. Clear the streaming executor (query.ts:734-739); stale results from the failed attempt must not leak into the retry.
  4. Switch current model.
  5. Strip thinking signatures for non-matching fallback models (line 928) — thinking blocks are model-bound.
  6. Yield a system "model switched" message (lines 945-948).
  7. attemptWithFallback = true + continue the outer while (line 950).

Other API errors (lines 955-997)

Yield any missing tool_results synthetically, surface the error as a user-visible message, return { reason: 'model_error' }.
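Both error paths rely on backfilling orphan tool_uses. A hedged sketch of that backfill (block shapes and the function name are assumptions; the pairing rule is the documented constraint):

```typescript
// Any tool_use without a matching tool_result would 400 the next API call,
// so append synthetic results for the orphans.
type Block =
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string; content: string }

function backfillOrphanToolResults(blocks: Block[], note: string): Block[] {
  // Collect the ids that already have results.
  const answered = new Set(
    blocks.flatMap(b => (b.type === 'tool_result' ? [b.tool_use_id] : [])),
  )
  // Synthesize a result for every unanswered tool_use.
  const synthetic = blocks.flatMap<Block>(b =>
    b.type === 'tool_use' && !answered.has(b.id)
      ? [{ type: 'tool_result', tool_use_id: b.id, content: note }]
      : [],
  )
  return [...blocks, ...synthetic]
}
```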

16.6 Phase 4: post-sampling (lines 999–1060)

  1. Post-sampling hooks (lines 1000-1008). Fire-and-forget; observation-only; no state change.
  2. Abort during streaming (lines 1015-1051). If the AbortController fired mid-stream:
    • Consume any remaining StreamingToolExecutor results — the executor generates synthetic tool_results for aborted tools.
    • Yield an "interrupted by user" message (skipped when signal.reason === 'interrupt' because the queued user message provides context).
    • Return { reason: 'aborted_streaming' }.
  3. Yield pending haiku summary (lines 1054-1060). state.pendingToolUseSummary is a promise created in the previous turn (~1s Haiku call that ran alongside the ~5-30s main model call). Awaiting it here gives the UI a compact summary of the last tool batch just in time.

16.7 Phase 5a: no tool use (lines 1062–1358)

Reached when !needsFollowUp — the assistant's final message had no tool_use. We are either completing the turn or triggering a recovery.

Prompt-Too-Long recovery (lines 1065-1183)

Detection: the last message is a withheld 413 error (isWithheld413).

Max-output-tokens recovery (lines 1185-1256)

Detection: isWithheldMaxOutputTokens.

API error (lines 1258-1264)

If the last message is another API error (rate limit, auth), skip stop hooks and return { reason: 'completed' }.

Stop hooks (lines 1267-1306)

const stopHookResult = yield* handleStopHooks(
  messagesForQuery, assistantMessages, systemPrompt, userContext, systemContext,
  toolUseContext, querySource, stopHookActive
)

handleStopHooks (see query/stopHooks.ts:82-295) does a lot:

  1. Capture full context for downstream hooks.
  2. Save cacheSafeParams snapshot (used by /btw later).
  3. Job classification if TEMPLATES feature on.
  4. Fire-and-forget: prompt suggestion, memory extraction (if main thread), auto-dream.
  5. Chicago MCP cleanup.
  6. Execute configured stop hooks — yields progress/attachment/error messages.
  7. Teammate hooks (TaskCompleted per task + TeammateIdle) if this is a teammate session.

Return value: { blockingErrors, preventContinuation }.

Token budget continuation (lines 1308-1355)

Only if TOKEN_BUDGET feature on. checkTokenBudget() inspects budgetTracker state against the current turn's output and decides whether the turn should auto-continue.

Normal completion (line 1357)

Return { reason: 'completed', turnCount }.

16.8 Phase 5b: tool use branch (lines 1359–1727)

Reached when needsFollowUp — the assistant emitted tool_use blocks.

Tool execution (lines 1363-1409)

// streaming mode
for await (const update of streamingToolExecutor.getRemainingResults()) { … }
// or serial
for await (const update of runTools(toolUseBlocks, canUseTool, toolUseContext)) { … }

Each update is either a message (yield + add to toolResults if user/attachment) or a context modifier (apply to updatedToolUseContext). Also tracks shouldPreventContinuation for hook_stopped_continuation attachments.
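The dispatch in that loop can be sketched as follows; the union shape and the Ctx type are assumptions, while toolResults, updatedToolUseContext, and shouldPreventContinuation are the names this doc uses:

```typescript
// Hedged sketch of the per-update dispatch during tool execution.
type Ctx = { queryDepth: number }
type ToolUpdate =
  | { kind: 'message'; role: 'user' | 'attachment'; text: string }
  | { kind: 'contextModifier'; apply: (ctx: Ctx) => Ctx }

function drainToolUpdates(updates: ToolUpdate[], ctx: Ctx) {
  const toolResults: string[] = []
  let shouldPreventContinuation = false
  let updatedToolUseContext = ctx
  for (const update of updates) {
    if (update.kind === 'message') {
      // The real loop yields the message; user/attachment messages are kept
      // so the next turn sees the tool results.
      toolResults.push(update.text)
      if (update.text.includes('hook_stopped_continuation')) {
        shouldPreventContinuation = true
      }
    } else {
      // Context modifiers do not mutate; they produce the next context.
      updatedToolUseContext = update.apply(updatedToolUseContext)
    }
  }
  return { toolResults, shouldPreventContinuation, updatedToolUseContext }
}
```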

Tool-use summary generation (lines 1411-1482)

If summaries are enabled and this isn't a subagent, kick off the next turn's haiku summary:

nextPendingToolUseSummary = generateToolUseSummary(toolBlocks, toolResults, lastAssistantText)
  .catch(() => null)

Fire-and-forget. The promise is picked up next turn at line 1054.

Abort during tool execution (lines 1485-1516)

Distinct from the streaming-abort path (see 16.18, item 4). An abort here runs the maxTurns check (line 1507) before returning.

Hook-stopped continuation (lines 1518-1521)

If a hook attachment signaled "stop continuation," return { reason: 'hook_stopped' }.

Attachment collection (lines 1538-1643)

Before the next turn, inject whatever per-turn reminders are needed:

  1. Queued commands drain (lines 1570-1578). Task notifications and queued user messages are snapshotted.
  2. Attachment messages (lines 1580-1590). Memory, skill-discovery, file-change attachments.
  3. Memory prefetch consumption (lines 1599-1614). If a background prefetch of relevant memories settled during the current turn, consume now.
  4. Skill discovery (lines 1620-1628). Prefetched skills injected as <system-reminder> attachments.
  5. Command dequeue (lines 1632-1643). Remove consumed commands from the queue.
  6. File-change logging (lines 1646-1657).

Tool refresh (lines 1660-1671)

Re-fetch the tool list between turns in case a new MCP server connected mid-turn.

Max turns check (lines 1705-1712)

if (nextTurnCount > maxTurns) {
  yield attach({ type: 'max_turns_reached', maxTurns, turnCount: nextTurnCount })
  return { reason: 'max_turns', turnCount: nextTurnCount }
}

State reassignment (lines 1715-1727)

state = {
  ...state,
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  turnCount: nextTurnCount,
  toolUseContext: updatedToolUseContext,
  pendingToolUseSummary: nextPendingToolUseSummary,
  maxOutputTokensRecoveryCount: 0,           // reset recovery counters
  hasAttemptedReactiveCompact: false,        // reset
  transition: { reason: 'next_turn' },
}
continue  // line 1727

16.9 StreamingToolExecutor — one-paragraph recap

See doc 13.2 for the detailed treatment. The executor maintains a queue of tools with statuses (queued | executing | completed | yielded), lets concurrency-safe tools run in parallel but serializes unsafe ones, isolates per-tool errors with per-tool AbortControllers (Bash errors are the exception — they cancel siblings), and yields results in submission order. The canExecuteTool(isConcurrencySafe) rule is the one-line summary: a tool starts immediately only if it and everything currently executing are concurrency-safe; otherwise it queues.
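A hedged sketch of that admission rule (the signature and RunningTool shape are assumptions; only the safe-in-parallel / unsafe-serialized behavior is from this doc):

```typescript
// A tool may start only if it is concurrency-safe AND everything currently
// executing is too; an unsafe tool can still run once it is alone.
type RunningTool = { name: string; isConcurrencySafe: boolean }

function canExecuteTool(candidate: RunningTool, executing: RunningTool[]): boolean {
  if (executing.length === 0) return true // nothing running: always start
  const allRunningSafe = executing.every(t => t.isConcurrencySafe)
  return candidate.isConcurrencySafe && allRunningSafe
}
```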

16.10 pendingToolUseSummary — the haiku overlap

generateToolUseSummary() runs on Haiku (~1s) during the next turn's Opus stream (~5-30s). The summary is stored in state.pendingToolUseSummary at the end of turn N and yielded at the start of turn N+1 (lines 1054-1060). Net latency impact is effectively zero: the Haiku call is fully overlapped by the longer main-model stream.
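A minimal sketch of the overlap, with toy timings (20 ms standing in for the Haiku call, 50 ms for the main-model stream; names other than pendingToolUseSummary are illustrative):

```typescript
// Kick off a fast summary at the end of turn N, await it at the start of
// turn N+1. By then it has settled, so the await adds no meaningful latency.
async function overlapDemo(): Promise<{ summary: string; elapsedMs: number }> {
  const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms))

  // End of turn N: fire-and-forget; errors degrade to null, never block.
  const pendingToolUseSummary: Promise<string | null> = (async () => {
    await sleep(20) // stands in for the ~1s Haiku call
    return 'Read 2 files, edited 1'
  })().catch(() => null)

  const start = Date.now()
  await sleep(50) // stands in for the ~5-30s main-model stream of turn N+1

  // Start of turn N+1: the summary promise settled during the stream.
  const summary = (await pendingToolUseSummary) ?? '(no summary)'
  return { summary, elapsedMs: Date.now() - start }
}
```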

16.11 budgetTracker and taskBudgetRemaining

budgetTracker (lines 280 + query/tokenBudget.ts):

{ continuationCount, lastDeltaTokens, lastGlobalTurnTokens, startedAt }

Used by checkTokenBudget() at lines 1308-1355 for the auto-continuation decision.

taskBudgetRemaining (line 291):

Carries the server-side task-budget counter across compaction boundaries; it is captured from preCompactContext during auto-compact (lines 282-291, 508-515, 1138-1145). Without this, the server would reset the counter at each compact and overspend.

16.12 The seven transition.reason values — triggers and effects

Each reason, with its continue-site line, trigger, and state change on continue:

  • collapse_drain_retry (line 1110). Trigger: PTL + contextCollapse available + commits > 0. On continue: replace messages with the drained set.
  • reactive_compact_retry (line 1162). Trigger: PTL or media-size + reactiveCompact succeeded. On continue: post-compact messages; carry taskBudget.
  • max_output_tokens_escalate (line 1217). Trigger: default 8k capped + no prior override. On continue: set maxOutputTokensOverride = ESCALATED_MAX_TOKENS.
  • max_output_tokens_recovery (line 1246). Trigger: recoveryCount < 3 + withheld max-output-tokens. On continue: inject "continue" prompt; increment counter.
  • stop_hook_blocking (line 1302). Trigger: stopHookResult.blockingErrors.length > 0. On continue: inject errors; set stopHookActive = true.
  • token_budget_continuation (line 1338). Trigger: TOKEN_BUDGET + checkTokenBudget === 'continue'. On continue: inject budget nudge.
  • next_turn (line 1725). Trigger: normal post-tool-execution. On continue: append assistant + toolResults; increment turnCount.

Each reason corresponds to exactly one continue site. Tests can assert on transition.reason without reading message content — this is unusually testable for a streaming loop.
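A sketch of what such a test can target; pickTransition is a stand-in decision function, not the real loop — the point is only that transition.reason is a stable assertion surface:

```typescript
// Hypothetical stand-in for the Phase 5a decision, reduced to two inputs.
type Transition = { reason: string }

function pickTransition(opts: {
  blockingErrors: string[]
  withheldMaxOutputTokens: boolean
}): Transition | undefined {
  // Recovery paths are considered before stop-hook outcomes in this sketch.
  if (opts.withheldMaxOutputTokens) return { reason: 'max_output_tokens_recovery' }
  if (opts.blockingErrors.length > 0) return { reason: 'stop_hook_blocking' }
  return undefined // terminal: the { reason: 'completed' } path
}
```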

16.13 Message-ordering invariants

  1. Thinking blocks (lines 151-163):
    • Only present if max_thinking_length > 0.
    • May not be the last block of a message.
    • Must be preserved across the full assistant trajectory (turn + tool_use → tool_result → next assistant).
  2. Tool_use / tool_result pairing:
    • Every tool_use must have a matching tool_result before the next API call.
    • Orphans are backfilled (lines 900-903, 1019) with synthetic results.
    • Missing results → API 400.
  3. Withheld vs. tombstoned:
    • Withheld (lines 799-825): message pushed to assistantMessages for recovery logic, not yielded to SDK/transcript.
    • Tombstoned (lines 716-741): message retained in array but marked; fallback pathway uses these.
  4. Final array order (line 1716): messagesForQuery + assistantMessages + toolResults. Next turn sees the full conversation with this ordering.

16.14 persistSession (note)

query.ts itself does not call recordTranscript. Persistence is a caller concern — QueryEngine.submitMessage (see QueryEngine.ts:727-732) writes the transcript as messages flow out of query(): fire-and-forget for assistant messages to avoid blocking the stream generator, awaited for user messages to guarantee resumability.

16.15 deps — the test seam

// query/deps.ts:21-31
type QueryDeps = {
  callModel: typeof queryModelWithStreaming
  microcompact: typeof microcompactMessages
  autocompact: typeof autoCompactIfNeeded
  uuid: () => string
}

Default: productionDeps() at line 33. Override via params.deps in tests. Narrow scope = no module spy boilerplate. Using typeof keeps types in sync automatically.
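A hedged usage sketch of that seam: replace callModel with a canned async generator. collectTurn and the Message shape here are stand-ins; only the deps injection idea and callModel's one-Message-per-block streaming contract are from this doc:

```typescript
// Minimal fake for the streaming model call, usable as deps.callModel in tests.
type Message = { role: 'assistant'; text: string }
type Deps = { callModel: () => AsyncGenerator<Message> }

async function* fakeCallModel(): AsyncGenerator<Message> {
  // One Message per content block, mirroring the real streaming generator.
  yield { role: 'assistant', text: 'hello' }
  yield { role: 'assistant', text: 'world' }
}

// Stand-in consumer for the loop's inner for-await.
async function collectTurn(deps: Deps): Promise<string[]> {
  const out: string[] = []
  for await (const message of deps.callModel()) out.push(message.text)
  return out
}
```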

16.16 maxTurns — the hard cap

Set by the caller (SDK, headless, REPL). Checked at two sites: abort-during-tools (line 1507) and pre-continue (line 1705). When exceeded, yield max_turns_reached attachment and return { reason: 'max_turns', turnCount }. Prevents runaway agentic loops.

16.17 Walking a happy-path iteration

Turn N — assistant reads a file, writes a file.

  1. Enter while loop; destructure state (line 311).
  2. Emit stream_request_start; checkpoint.
  3. Compaction check — under threshold, no-op.
  4. Streaming executor created (gate on). Model resolved.
  5. API stream begins (line 659). First event: text block. Yield.
  6. Second event: tool_use(FileRead). Push to toolUseBlocks; executor starts Read.
  7. Read completes while streaming continues (line 851 yields a tool_result).
  8. Third event: tool_use(FileEdit). Push; the executor queues it (FileEdit is not concurrency-safe and would wait on Read if Read were still running — but Read has finished).
  9. Fourth event: end of stream.
  10. Post-sampling hooks (line 1000). Abort? No.
  11. Yield pending haiku summary from turn N-1 (line 1054).
  12. needsFollowUp = true → enter Phase 5b.
  13. Execute remaining tools (line 1380): FileEdit runs alone, result yielded.
  14. Kick off next-turn haiku summary (line 1411). Fire-and-forget.
  15. Collect attachments (line 1580). No queue drain.
  16. Tool refresh (line 1660).
  17. nextTurnCount = 2 < maxTurns → build next state (line 1716), transition.reason = 'next_turn', continue.

Turn N+1 — assistant summarizes and stops.

  1. Enter, destructure, compact check, stream.
  2. Stream yields one text block; no tool_use.
  3. needsFollowUp = false → Phase 5a.
  4. No PTL, no max-output. Call stop hooks.
  5. Stop hooks fire-and-forget memory extraction.
  6. Stop hooks return { blockingErrors: [], preventContinuation: false }.
  7. Token budget check (if feature on) — no continuation.
  8. Return { reason: 'completed', turnCount: 2 }.

16.18 Complexity hotspots

  1. Withholding coordination (lines 799-825 ↔ 1070-1183). PTL / max-output errors are suppressed from SDK output but inspected for recovery. If withholding is missed, the user sees the error before recovery has a chance; if surfacing is missed, the user waits forever.
  2. Compaction boundary task-budget carryover (lines 282-291, 508-515, 1138-1145). Without careful bookkeeping, the server's task_budget counter under-counts post-compact spend and clients see unpredictable cutoffs.
  3. Streaming tool ordering (StreamingToolExecutor.ts:129-151). Concurrency rules for tools running in parallel; result order must still match submission order.
  4. Dual abort paths (lines 1015-1051 vs. 1485-1516). Streaming-aborts need synthetic tool_results; tool-execution-aborts need maxTurns check. Different paths, similar-looking code.
  5. Stop hook fire-and-forget (stopHooks.ts:136-157). Memory extraction is async after stop hooks return; shutdown must drain it (extractMemories.ts:611-615) or you lose the extraction to process exit.