CodeDocs Vault

03 - Agent Core Deep Dive

The Agent Loop (agent/core/agent_loop.py)

This is the heart of the system -- a 1,198-line file implementing the think-act cycle.

Entry Point: submission_loop() (line 1128)

The outermost function. Creates a Session, opens the ToolRouter (which initializes MCP connections), emits a ready event, then loops forever reading from the submission queue and dispatching to handlers.

# Simplified flow (agent_loop.py:1128-1198)
async def submission_loop(submission_queue, event_queue, config, ...):
    session = Session(config, event_queue, ...)
    async with session.tool_router:
        session.send_event("ready", {"tools": tool_count})
        retry_failed_uploads()  # Resume any failed session uploads
 
        while True:
            submission = await submission_queue.get()
            await process_submission(session, submission)

On unclean exit, an emergency save fires (line 1188-1198) to persist the session trajectory.

Core Loop: Handlers.run_agent() (line 436)

This is where the magic happens. The method implements a bounded while-loop (max 300 iterations by default) that:

  1. Checks cancellation (line 469)
  2. Compacts context if needed (line 472) -- triggers LLM-based summarization when near the limit
  3. Checks for doom loops (line 476) -- injects corrective prompts if repetitive patterns detected
  4. Calls the LLM (line 489) -- streaming or non-streaming via litellm
  5. Handles truncation (line 512-551) -- if finish_reason == "length", drops tool calls and injects a hint to use smaller arguments
  6. Parses tool calls (line 598-609) -- splits into valid (parseable JSON args) vs malformed
  7. Checks approvals (line 648-654) -- separates tools needing user consent
  8. Executes tools in parallel (line 683-748) -- asyncio.gather with per-task cancellation
  9. Handles approval flow (line 751-782) -- emits event and pauses if any tools need approval
  10. Loops or completes (line 818) -- if tools were called, iterate; if text-only response, done
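The ten steps above can be condensed into a runnable skeleton. This is a hypothetical sketch, not the real implementation: `session` is a plain dict, the helper names (`call_llm`, `execute_tools`) are illustrative, and the compaction, doom-loop, and approval steps are elided for brevity.

```python
import asyncio

MAX_ITERATIONS = 300  # matches the default bound described above

async def run_agent(session, call_llm, execute_tools):
    for _ in range(MAX_ITERATIONS):
        if session.get("cancelled"):               # step 1: cancellation check
            return "cancelled"
        # steps 2-3 (compaction, doom-loop check) would run here
        response = await call_llm(session)         # steps 4-5: LLM call
        tool_calls = response.get("tool_calls", [])
        if not tool_calls:                         # step 10: text-only => done
            return response["content"]
        results = await execute_tools(tool_calls)  # steps 6-9: execute tools
        session["messages"].extend(results)        # feed results back, iterate
    return "max iterations reached"

# Toy drive: an "LLM" that calls one tool, then answers.
async def fake_llm(session):
    if session["messages"]:
        return {"content": "done", "tool_calls": []}
    return {"content": "", "tool_calls": [{"name": "search"}]}

async def fake_exec(calls):
    return [{"role": "tool", "content": "ok"} for _ in calls]

result = asyncio.run(run_agent({"messages": [], "cancelled": False},
                               fake_llm, fake_exec))
```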

Streaming LLM Calls: _call_llm_streaming() (line 245)

Accumulates content and tool call deltas from SSE chunks. Tool calls arrive as incremental fragments:

# Simplified (agent_loop.py:303-317)
async for chunk in response:
    delta = chunk.choices[0].delta
    for delta_tc in delta.tool_calls or []:
        idx = delta_tc.index
        if idx not in tool_calls:
            tool_calls[idx] = {"id": "", "name": "", "arguments": ""}
        if delta_tc.id:
            tool_calls[idx]["id"] += delta_tc.id
        if delta_tc.function.name:
            tool_calls[idx]["name"] += delta_tc.function.name
        if delta_tc.function.arguments:
            tool_calls[idx]["arguments"] += delta_tc.function.arguments

Each content chunk emits an assistant_chunk event for real-time display.

Parallel Tool Execution (line 683-748)

Tools execute concurrently via asyncio.gather. Each tool call is wrapped in a cancellation-aware wrapper:

# Simplified (agent_loop.py:683-748)
async def _exec_one(tc):
    if session.is_cancelled:
        return "Cancelled"
    name, args = tc["name"], tc["arguments"]
    session.send_event("tool_call", {"name": name, "args_preview": args})
    output, success = await session.tool_router.call_tool(name, args)
    session.send_event("tool_output", {"output": output, "success": success})
    return output
 
results = await asyncio.gather(*[_exec_one(tc) for tc in tools])

Approval System: _needs_approval() (line 48)

A policy function that determines which tool calls require explicit user consent:

| Tool | Condition | Rationale |
| --- | --- | --- |
| sandbox_create | Always | Creates a billable HF Space |
| hf_jobs with run/scheduled run | GPU: always; CPU: configurable | GPU jobs cost money |
| hf_repo_files with upload/delete | Always | Modifies shared repos |
| hf_repo_git with destructive ops | Always | delete_branch, merge_pr, create_repo, etc. |

YOLO mode (line 53) bypasses all approvals. Enabled via /yolo command or --yolo CLI flag.
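The policy table above can be sketched as a single predicate. This is a hypothetical rendering: the argument keys (`action`, `hardware`) and the exact action strings are illustrative, not the real `_needs_approval()` signature.

```python
# Illustrative sketch of the approval policy; key names are assumptions.
ALWAYS_APPROVE = {"sandbox_create"}

def needs_approval(tool, args, yolo=False, confirm_cpu_jobs=True):
    if yolo:                                   # YOLO mode bypasses everything
        return False
    if tool in ALWAYS_APPROVE:
        return True
    if tool == "hf_jobs" and args.get("action") in ("run", "scheduled run"):
        # GPU jobs always need consent; CPU jobs only when configured
        return args.get("hardware") == "gpu" or confirm_cpu_jobs
    if tool == "hf_repo_files" and args.get("action") in ("upload", "delete"):
        return True
    if tool == "hf_repo_git" and args.get("action") in (
            "delete_branch", "merge_pr", "create_repo"):
        return True
    return False
```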

Error Recovery

Transient error retries (line 118-136): 3 attempts with delays [5s, 15s, 30s] for timeouts, rate limits, 5xx errors, connection errors. Pattern-matched via string matching on error messages in _is_transient_error() (line 123).
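The retry policy above (three attempts, 5s/15s/30s backoff, transience decided by substring matching) can be sketched like this. The helper names and pattern list are assumptions for illustration.

```python
import time

# Illustrative transience check via substring matching on the error message.
TRANSIENT_PATTERNS = ("timeout", "rate limit", "connection", "500", "502", "503")

def is_transient_error(err: Exception) -> bool:
    msg = str(err).lower()
    return any(p in msg for p in TRANSIENT_PATTERNS)

def with_retries(fn, delays=(5, 15, 30), sleep=time.sleep):
    """Run fn up to len(delays) times, sleeping between transient failures."""
    for attempt, delay in enumerate(delays):
        try:
            return fn()
        except Exception as err:
            if not is_transient_error(err) or attempt == len(delays) - 1:
                raise                      # non-transient or out of attempts
            sleep(delay)

# Demo: fails twice with a transient error, then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("connection reset by peer")
    return "ok"

result = with_retries(flaky, sleep=lambda s: None)
```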

Context window exceeded (line 786-800): Catches ContextWindowExceededError, forces context_length above the limit to trigger compaction, then retries the same turn.

Malformed JSON tool args (line 598-641): When the LLM produces invalid JSON in tool arguments, the error is returned as a tool result so the LLM can self-correct.
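A minimal sketch of that self-correction path: instead of raising, the JSON parse error is packaged as a result the LLM can read on the next turn. The function name is illustrative.

```python
import json

def parse_tool_args(raw: str):
    """Return (args, None) on success, or (None, error_message) for the LLM."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as err:
        return None, f"Invalid JSON in tool arguments: {err}. Please retry."

bad_args, error = parse_tool_args('{"path": "a.txt"')   # missing closing brace
ok_args, ok_err = parse_tool_args('{"path": "a.txt"}')
```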

Friendly error messages (line 139): Maps known error patterns to actionable user guidance.

Cancellation Flow: _cleanup_on_cancel() (line 210)

On Ctrl+C:

  1. Kills all sandbox processes (if sandbox exists)
  2. Cancels any running HF jobs (tracked via session._running_job_ids)

Double Ctrl+C within 1 second quits the application entirely (implemented in main.py:1005).


Context Manager (agent/context_manager/manager.py)

Purpose

Manages the conversation's message history, tracks token usage, handles compaction (summarization to fit within the context window), and repairs protocol violations.

System Prompt Loading (line 96)

Loads system_prompt_v3.yaml, renders with Jinja2 (passing tools and num_tools), then appends runtime context:

# Simplified (manager.py:127-147)
if local_mode:
    system_content += "\n\nCLI MODE: No sandbox available..."
    system_content += f"\nWorking directory: {os.getcwd()}"
 
system_content += f"\nToday: {date}, Time: {time} ({tz})"
system_content += f"\nUser: {hf_username}"
system_content += f"\nTools available: {num_tools}"

Dangling Tool Call Repair: _patch_dangling_tool_calls() (line 185)

LLM APIs require every tool call to have a matching tool result message. When a turn is interrupted mid-execution, some tool calls may lack results. This method scans backwards from the end of the conversation, finds unmatched tool call IDs, and injects stub results:

"Tool was not executed (interrupted or error)."

This prevents API validation errors on the next LLM call.
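A sketch of the repair, under assumed OpenAI-style message dicts. For brevity this version scans forward (the real method scans backwards from the end); the stub text matches the one quoted above.

```python
STUB = "Tool was not executed (interrupted or error)."

def patch_dangling_tool_calls(messages):
    """Inject stub tool results for tool calls that never got an answer."""
    answered = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}
    patched = []
    for msg in messages:
        patched.append(msg)
        tcs = msg.get("tool_calls", []) if msg.get("role") == "assistant" else []
        for tc in tcs:
            if tc["id"] not in answered:
                patched.append({"role": "tool", "tool_call_id": tc["id"],
                                "content": STUB})
    return patched

# call_2 was interrupted mid-execution and has no result message.
history = [
    {"role": "assistant", "tool_calls": [{"id": "call_1"}, {"id": "call_2"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "ok"},
]
repaired = patch_dangling_tool_calls(history)
```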

Compaction: compact() (line 265)

Triggered when context_length > max_context. Strategy:

  1. Never touch: System message (always first)
  2. Never touch: First user message (the original task -- critical for continuity)
  3. Summarize: Everything between the first user message and the last 5 messages
  4. Never touch: Last 5 messages (recent context, walked back to a user message boundary)

The summarization itself is an LLM call with a specific prompt (line 302-306):

"Summarize this conversation concisely, preserving: key decisions, the 'why' behind decisions, problems solved, and important context needed for developing further."

The summary uses reasoning_effort="high" for quality, and the budget is 10% of the max context window.
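The slicing strategy can be sketched as below. This is a simplified illustration: it assumes the first two messages are the system message and the original task, and it omits the walk-back to a user-message boundary that the real implementation performs on the tail.

```python
def compact(messages, summarize, keep_tail=5):
    """Keep head and tail untouched; replace the middle with an LLM summary."""
    head = messages[:2]                      # system message + first user message
    tail = messages[-keep_tail:]             # last 5 messages (recent context)
    middle = messages[2:-keep_tail]
    if not middle:
        return messages                      # nothing to compact
    summary = summarize(middle)              # an LLM call in the real system
    return head + [{"role": "user", "content": f"[Summary] {summary}"}] + tail

msgs = ([{"role": "system", "content": "sys"},
         {"role": "user", "content": "task"}]
        + [{"role": "assistant", "content": f"step {i}"} for i in range(20)])
compacted = compact(msgs, summarize=lambda m: f"{len(m)} messages elided")
```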

Undo: undo_last_turn() (line 228)

Pops messages from the end until the last user message is removed (including all subsequent assistant/tool messages from that turn).
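As a sketch, assuming a list of role-tagged message dicts:

```python
def undo_last_turn(messages):
    """Pop from the end until the last user message (inclusive) is removed."""
    while messages:
        msg = messages.pop()
        if msg["role"] == "user":
            break
    return messages

history = [{"role": "system"}, {"role": "user"}, {"role": "assistant"},
           {"role": "user"}, {"role": "assistant"}, {"role": "tool"}]
remaining = undo_last_turn(list(history))
```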


Session (agent/core/session.py)

Key State

# session.py:83-127
class Session:
    session_id: str          # UUID
    context_manager          # ContextManager
    tool_router              # ToolRouter
    config                   # Config
    hf_token: str
    stream: bool
    _cancelled: asyncio.Event
    pending_approval         # Tool calls awaiting user decision
    sandbox                  # Sandbox instance (created on demand)
    _running_job_ids: set    # For cleanup on cancel
    logged_events: list      # Full event trajectory
    turn_count: int

Event Dual-Logging (line 128)

Every event is both:

  1. Put on the event queue (for the UI)
  2. Appended to logged_events (for trajectory persistence)

Model Switching (line 153)

update_model() changes the model and recalculates the context window limit. The new limit comes from a multi-strategy lookup:

  1. Hardcoded map for Anthropic models (agent/core/session.py:20-28)
  2. HF Router Catalog for HF models
  3. litellm.get_max_tokens() fallback
  4. Default: 200,000 tokens
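The four-step lookup above can be sketched with the catalog and litellm calls stubbed out. The hardcoded map entry and its limit are illustrative assumptions, not the real table.

```python
# Illustrative hardcoded map (step 1); the real entries live at session.py:20-28.
ANTHROPIC_LIMITS = {"anthropic/claude-opus-4-6": 200_000}
DEFAULT_LIMIT = 200_000  # step 4 fallback

def resolve_context_limit(model, catalog_lookup, litellm_lookup):
    if model in ANTHROPIC_LIMITS:                    # 1. hardcoded map
        return ANTHROPIC_LIMITS[model]
    for lookup in (catalog_lookup, litellm_lookup):  # 2-3. catalog, then litellm
        try:
            limit = lookup(model)
            if limit:
                return limit
        except Exception:
            pass                                     # fall through to next source
    return DEFAULT_LIMIT

limit = resolve_context_limit("unknown/model",
                              catalog_lookup=lambda m: None,
                              litellm_lookup=lambda m: None)
```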

Auto-Save and Upload (line 162, 248)

Every N turns (default 3), the session trajectory is saved to a local JSON file, then a detached subprocess is spawned to upload it to an HF dataset repo. The subprocess (session_uploader.py) uses start_new_session=True so it survives even if the agent process exits.
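The detached spawn can be sketched as follows; the script path and arguments are illustrative. `start_new_session=True` runs the child in its own session (via `setsid`), so it is not killed when the agent's process group exits.

```python
import subprocess
import sys

def spawn_uploader(trajectory_path: str) -> subprocess.Popen:
    """Launch the uploader detached so it survives agent shutdown."""
    return subprocess.Popen(
        [sys.executable, "session_uploader.py", trajectory_path],
        start_new_session=True,            # detach from the agent's session
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

proc = spawn_uploader("dummy.json")
```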


Doom Loop Detection (agent/core/doom_loop.py)

Problem

LLMs sometimes get stuck calling the same tool repeatedly with the same arguments, or cycling through a pattern like [search, read, search, read] without making progress.

Detection Algorithm

Called before each LLM call. Extracts tool call signatures (tool name + MD5 hash of arguments) from the last 30 messages.

Pattern 1: Identical consecutive calls (line 55)

[A, A, A] -> detected at threshold 3

Pattern 2: Repeating sequences (line 74)

[A, B, A, B]       -> sequence length 2, 2 repetitions
[A, B, C, A, B, C] -> sequence length 3, 2 repetitions

Checks sequences of length 2-5 with 2+ repetitions.
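Both patterns can be sketched compactly. This is an illustrative implementation of the description above (name plus MD5 of the arguments as the signature; thresholds as stated), not the real module.

```python
import hashlib

def signature(name: str, args: str) -> str:
    """Tool name + MD5 of its serialized arguments."""
    return f"{name}:{hashlib.md5(args.encode()).hexdigest()}"

def detect_doom_loop(sigs, identical_threshold=3, max_seq=5, min_reps=2):
    # Pattern 1: identical consecutive calls, e.g. [A, A, A]
    tail = sigs[-identical_threshold:]
    if len(sigs) >= identical_threshold and len(set(tail)) == 1:
        return True
    # Pattern 2: repeating sequences of length 2..max_seq, e.g. [A, B, A, B]
    for n in range(2, max_seq + 1):
        window = n * min_reps
        if len(sigs) >= window and sigs[-n:] * min_reps == sigs[-window:]:
            return True
    return False

a = signature("search", '{"q": "foo"}')
b = signature("read", '{"path": "x"}')
```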

Corrective Action

When detected, a corrective message is injected as a user message rather than a system message (user messages carry higher priority for the LLM):

"STOP repeating this approach - it is not working. Step back and try a fundamentally different strategy. Consider: using a different tool, changing your arguments significantly, or explaining to the user what you're stuck on."


LLM Parameter Resolution (agent/core/llm_params.py)

Two Routing Paths

# Simplified (llm_params.py:18-76)
def _resolve_llm_params(session):
    model = session.config.model_name
 
    if model.startswith("anthropic/") or model.startswith("openai/"):
        # Direct API: pass model name straight through
        return {"model": model, "reasoning_effort": effort}
 
    else:
        # HF Router: route through https://router.huggingface.co/v1
        return {
            "model": f"openai/{model}",  # LiteLLM OpenAI adapter
            "api_base": "https://router.huggingface.co/v1",
            "api_key": token,
            "reasoning_effort": None,  # passed in extra_body instead
            "extra_body": {"reasoning_effort": effort},
        }

HF Router billing: When INFERENCE_TOKEN is set (hosted Space deployment), adds X-HF-Bill-To: huggingface header for organizational billing.

Routing suffixes: Model IDs can include provider hints like MiniMaxAI/MiniMax-M2.7:cheapest or moonshotai/Kimi-K2.6:novita, which the HF Router uses for provider selection.


HF Router Catalog (agent/core/hf_router_catalog.py)

Fetches and caches the model catalog from https://router.huggingface.co/v1/models with a 5-minute TTL. Used for:

  1. Model validation: Check if a model ID exists before switching
  2. Context window discovery: Find the max context length across live providers
  3. Tool support checking: Verify the model supports function calling
  4. Fuzzy suggestions: When a user enters an unknown model, suggest close matches (difflib.get_close_matches with 0.4 cutoff)
  5. Pricing info: Show per-provider input/output token prices during model preflight

Pre-warmed on startup as a background task (main.py:944) so the first model switch is instant.
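The fuzzy-suggestion step uses `difflib.get_close_matches` with a 0.4 cutoff, which can be shown in isolation. The wrapper name and toy catalog are illustrative.

```python
import difflib

def suggest_models(query: str, catalog: list, n: int = 3):
    """Suggest catalog entries close to an unknown model ID."""
    return difflib.get_close_matches(query, catalog, n=n, cutoff=0.4)

catalog = ["moonshotai/Kimi-K2.6", "MiniMaxAI/MiniMax-M2.7",
           "meta-llama/Llama-3.3-70B-Instruct"]
suggestions = suggest_models("Kimi-K2", catalog)
```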


Configuration (agent/config.py)

Pydantic BaseModel with env var substitution supporting ${VAR} and ${VAR:-default} syntax:

// configs/main_agent_config.json
{
  "model_name": "anthropic/claude-opus-4-6",
  "save_sessions": true,
  "session_dataset_repo": "akseljoonas/hf-agent-sessions",
  "yolo_mode": false,
  "confirm_cpu_jobs": true,
  "auto_file_upload": true,
  "mcpServers": {
    "huggingface": {
      "type": "sse",
      "url": "https://huggingface.co/mcp?login",
      "headers": { "Authorization": "Bearer ${HF_TOKEN}" }
    }
  }
}
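The `${VAR}` / `${VAR:-default}` substitution can be sketched with a single regex. This is an illustrative helper, not the actual config loader.

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}; group 2 is the default, if given.
_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute_env(text: str) -> str:
    def repl(match):
        var, default = match.group(1), match.group(2)
        return os.environ.get(var, default if default is not None else "")
    return _PATTERN.sub(repl, text)

os.environ["HF_TOKEN"] = "hf_example"   # illustrative value
os.environ.pop("REPO", None)            # ensure the default path is exercised
rendered = substitute_env("Bearer ${HF_TOKEN}, repo=${REPO:-my/repo}")
```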

Key defaults: