CodeDocs Vault

03 - Agent Core Deep Dive

The Agent Loop (agent/core/agent_loop.py)

This is the heart of the system -- a 1,198-line file implementing the think-act cycle.

Entry Point: submission_loop() (line 1128)

The outermost function. Creates a Session, opens the ToolRouter (which initializes MCP connections), emits a ready event, then loops forever reading from the submission queue and dispatching to handlers.

# Simplified flow (agent_loop.py:1128-1198)
async def submission_loop(submission_queue, event_queue, config, ...):
    session = Session(config, event_queue, ...)
    async with session.tool_router:
        session.send_event("ready", {"tools": tool_count})
        retry_failed_uploads()  # Resume any failed session uploads
 
        while True:
            submission = await submission_queue.get()
            await process_submission(session, submission)

On unclean exit, an emergency save fires (line 1188-1198) to persist the session trajectory.

Core Loop: Handlers.run_agent() (line 436)

This is where the magic happens. The method implements a bounded while-loop (max 300 iterations by default) that:

  1. Checks cancellation (line 469)
  2. Compacts context if needed (line 472) -- triggers LLM-based summarization when near the limit
  3. Checks for doom loops (line 476) -- injects corrective prompts if repetitive patterns detected
  4. Calls the LLM (line 489) -- streaming or non-streaming via litellm
  5. Handles truncation (line 512-551) -- if finish_reason == "length", drops tool calls and injects a hint to use smaller arguments
  6. Parses tool calls (line 598-609) -- splits into valid (parseable JSON args) vs malformed
  7. Checks approvals (line 648-654) -- separates tools needing user consent
  8. Executes tools in parallel (line 683-748) -- asyncio.gather with per-task cancellation
  9. Handles approval flow (line 751-782) -- emits event and pauses if any tools need approval
  10. Loops or completes (line 818) -- if tools were called, iterate; if text-only response, done
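The ten steps above can be condensed into a runnable skeleton. This is a hypothetical sketch, not the real implementation: `session` is a plain dict, the helper names (`call_llm`, `execute_tools`) are illustrative, and the compaction, doom-loop, and approval steps are elided for brevity.

```python
import asyncio

MAX_ITERATIONS = 300  # matches the default bound described above

async def run_agent(session, call_llm, execute_tools):
    for _ in range(MAX_ITERATIONS):
        if session.get("cancelled"):               # step 1: cancellation check
            return "cancelled"
        # steps 2-3 (compaction, doom-loop check) would run here
        response = await call_llm(session)         # steps 4-5: LLM call
        tool_calls = response.get("tool_calls", [])
        if not tool_calls:                         # step 10: text-only => done
            return response["content"]
        results = await execute_tools(tool_calls)  # steps 6-9: execute tools
        session["messages"].extend(results)        # feed results back, iterate
    return "max iterations reached"

# Toy drive: an "LLM" that calls one tool, then answers.
async def fake_llm(session):
    if session["messages"]:
        return {"content": "done", "tool_calls": []}
    return {"content": "", "tool_calls": [{"name": "search"}]}

async def fake_exec(calls):
    return [{"role": "tool", "content": "ok"} for _ in calls]

result = asyncio.run(run_agent({"messages": [], "cancelled": False},
                               fake_llm, fake_exec))
```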

Streaming LLM Calls: _call_llm_streaming() (line 245)

Accumulates content and tool call deltas from SSE chunks. Tool calls arrive as incremental fragments:

# Simplified (agent_loop.py:303-317)
async for chunk in response:
    delta = chunk.choices[0].delta
    for delta_tc in delta.tool_calls or []:
        idx = delta_tc.index
        if idx not in tool_calls:
            tool_calls[idx] = {"id": "", "name": "", "arguments": ""}
        if delta_tc.id:
            tool_calls[idx]["id"] += delta_tc.id
        if delta_tc.function.name:
            tool_calls[idx]["name"] += delta_tc.function.name
        if delta_tc.function.arguments:
            tool_calls[idx]["arguments"] += delta_tc.function.arguments

Each content chunk emits an assistant_chunk event for real-time display.

Parallel Tool Execution (line 683-748)

Tools execute concurrently via asyncio.gather. Each tool call is wrapped in a cancellation-aware wrapper:

# Simplified (agent_loop.py:683-748)
async def _exec_one(tc):
    if session.is_cancelled:
        return "Cancelled"
    name, args = tc["name"], tc["arguments"]
    session.send_event("tool_call", {"name": name, "args_preview": args})
    output, success = await session.tool_router.call_tool(name, args)
    session.send_event("tool_output", {"output": output, "success": success})
    return output
 
results = await asyncio.gather(*[_exec_one(tc) for tc in tools])

Approval System: _needs_approval() (line 48)

A policy function that determines which tool calls require explicit user consent:

| Tool | Condition | Rationale |
| --- | --- | --- |
| sandbox_create | Always | Creates a billable HF Space |
| hf_jobs with run/scheduled run | GPU: always; CPU: configurable | GPU jobs cost money |
| hf_repo_files with upload/delete | Always | Modifies shared repos |
| hf_repo_git with destructive ops | Always | delete_branch, merge_pr, create_repo, etc. |

YOLO mode (line 53) bypasses all approvals. Enabled via /yolo command or --yolo CLI flag.
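The policy table above can be sketched as a single predicate. This is a hypothetical rendering: the argument keys (`action`, `hardware`) and the exact action strings are illustrative, not the real `_needs_approval()` signature.

```python
# Illustrative sketch of the approval policy; key names are assumptions.
ALWAYS_APPROVE = {"sandbox_create"}

def needs_approval(tool, args, yolo=False, confirm_cpu_jobs=True):
    if yolo:                                   # YOLO mode bypasses everything
        return False
    if tool in ALWAYS_APPROVE:
        return True
    if tool == "hf_jobs" and args.get("action") in ("run", "scheduled run"):
        # GPU jobs always need consent; CPU jobs only when configured
        return args.get("hardware") == "gpu" or confirm_cpu_jobs
    if tool == "hf_repo_files" and args.get("action") in ("upload", "delete"):
        return True
    if tool == "hf_repo_git" and args.get("action") in (
            "delete_branch", "merge_pr", "create_repo"):
        return True
    return False
```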

Error Recovery

Transient error retries (line 118-136): 3 attempts with delays [5s, 15s, 30s] for timeouts, rate limits, 5xx errors, connection errors. Pattern-matched via string matching on error messages in _is_transient_error() (line 123).
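The retry policy above (three attempts, 5s/15s/30s backoff, transience decided by substring matching) can be sketched like this. The helper names and pattern list are assumptions for illustration.

```python
import time

# Illustrative transience check via substring matching on the error message.
TRANSIENT_PATTERNS = ("timeout", "rate limit", "connection", "500", "502", "503")

def is_transient_error(err: Exception) -> bool:
    msg = str(err).lower()
    return any(p in msg for p in TRANSIENT_PATTERNS)

def with_retries(fn, delays=(5, 15, 30), sleep=time.sleep):
    """Run fn up to len(delays) times, sleeping between transient failures."""
    for attempt, delay in enumerate(delays):
        try:
            return fn()
        except Exception as err:
            if not is_transient_error(err) or attempt == len(delays) - 1:
                raise                      # non-transient or out of attempts
            sleep(delay)

# Demo: fails twice with a transient error, then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("connection reset by peer")
    return "ok"

result = with_retries(flaky, sleep=lambda s: None)
```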

Context window exceeded (line 786-800): Catches ContextWindowExceededError, forces context_length above the limit to trigger compaction, then retries the same turn.

Malformed JSON tool args (line 598-641): When the LLM produces invalid JSON in tool arguments, the error is returned as a tool result so the LLM can self-correct.
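A minimal sketch of that self-correction path: instead of raising, the JSON parse error is packaged as a result the LLM can read on the next turn. The function name is illustrative.

```python
import json

def parse_tool_args(raw: str):
    """Return (args, None) on success, or (None, error_message) for the LLM."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as err:
        return None, f"Invalid JSON in tool arguments: {err}. Please retry."

bad_args, error = parse_tool_args('{"path": "a.txt"')   # missing closing brace
ok_args, ok_err = parse_tool_args('{"path": "a.txt"}')
```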

Friendly error messages (line 139): Maps known error patterns to actionable user guidance.

Cancellation Flow: _cleanup_on_cancel() (line 210)

On Ctrl+C:

  1. Kills all sandbox processes (if sandbox exists)
  2. Cancels any running HF jobs (tracked via session._running_job_ids)

Double Ctrl+C within 1 second quits the application entirely (implemented in main.py:1005).


Context Manager (agent/context_manager/manager.py)

Purpose

Manages the conversation's message history, tracks token usage, handles compaction (summarization to fit within the context window), and repairs protocol violations.

System Prompt Loading (line 96)

Loads system_prompt_v3.yaml, renders with Jinja2 (passing tools and num_tools), then appends runtime context:

# Simplified (manager.py:127-147)
if local_mode:
    system_content += "\n\nCLI MODE: No sandbox available..."
    system_content += f"\nWorking directory: {os.getcwd()}"
 
system_content += f"\nToday: {date}, Time: {time} ({tz})"
system_content += f"\nUser: {hf_username}"
system_content += f"\nTools available: {num_tools}"

Dangling Tool Call Repair: _patch_dangling_tool_calls() (line 185)

LLM APIs require every tool call to have a matching tool result message. When a turn is interrupted mid-execution, some tool calls may lack results. This method scans backwards from the end of the conversation, finds unmatched tool call IDs, and injects stub results:

"Tool was not executed (interrupted or error)."

This prevents API validation errors on the next LLM call.
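A sketch of the repair, under assumed OpenAI-style message dicts. For brevity this version scans forward (the real method scans backwards from the end); the stub text matches the one quoted above.

```python
STUB = "Tool was not executed (interrupted or error)."

def patch_dangling_tool_calls(messages):
    """Inject stub tool results for tool calls that never got an answer."""
    answered = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}
    patched = []
    for msg in messages:
        patched.append(msg)
        tcs = msg.get("tool_calls", []) if msg.get("role") == "assistant" else []
        for tc in tcs:
            if tc["id"] not in answered:
                patched.append({"role": "tool", "tool_call_id": tc["id"],
                                "content": STUB})
    return patched

# call_2 was interrupted mid-execution and has no result message.
history = [
    {"role": "assistant", "tool_calls": [{"id": "call_1"}, {"id": "call_2"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "ok"},
]
repaired = patch_dangling_tool_calls(history)
```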

Compaction: compact() (line 265)

Triggered when context_length > max_context. Strategy:

  1. Never touch: System message (always first)
  2. Never touch: First user message (the original task -- critical for continuity)
  3. Summarize: Everything between the first user message and the last 5 messages
  4. Never touch: Last 5 messages (recent context, walked back to a user message boundary)

The summarization itself is an LLM call with a specific prompt (line 302-306):

"Summarize this conversation concisely, preserving: key decisions, the 'why' behind decisions, problems solved, and important context needed for developing further."

The summary uses reasoning_effort="high" for quality, and the budget is 10% of the max context window.
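The slicing strategy can be sketched as below. This is a simplified illustration: it assumes the first two messages are the system message and the original task, and it omits the walk-back to a user-message boundary that the real implementation performs on the tail.

```python
def compact(messages, summarize, keep_tail=5):
    """Keep head and tail untouched; replace the middle with an LLM summary."""
    head = messages[:2]                      # system message + first user message
    tail = messages[-keep_tail:]             # last 5 messages (recent context)
    middle = messages[2:-keep_tail]
    if not middle:
        return messages                      # nothing to compact
    summary = summarize(middle)              # an LLM call in the real system
    return head + [{"role": "user", "content": f"[Summary] {summary}"}] + tail

msgs = ([{"role": "system", "content": "sys"},
         {"role": "user", "content": "task"}]
        + [{"role": "assistant", "content": f"step {i}"} for i in range(20)])
compacted = compact(msgs, summarize=lambda m: f"{len(m)} messages elided")
```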

Undo: undo_last_turn() (line 228)

Pops messages from the end until the last user message is removed (including all subsequent assistant/tool messages from that turn).
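As a sketch, assuming a list of role-tagged message dicts:

```python
def undo_last_turn(messages):
    """Pop from the end until the last user message (inclusive) is removed."""
    while messages:
        msg = messages.pop()
        if msg["role"] == "user":
            break
    return messages

history = [{"role": "system"}, {"role": "user"}, {"role": "assistant"},
           {"role": "user"}, {"role": "assistant"}, {"role": "tool"}]
remaining = undo_last_turn(list(history))
```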


Session (agent/core/session.py)

Key State

# session.py:83-127
class Session:
    session_id: str          # UUID
    context_manager          # ContextManager
    tool_router              # ToolRouter
    config                   # Config
    hf_token: str
    stream: bool
    _cancelled: asyncio.Event
    pending_approval         # Tool calls awaiting user decision
    sandbox                  # Sandbox instance (created on demand)
    _running_job_ids: set    # For cleanup on cancel
    logged_events: list      # Full event trajectory
    turn_count: int

Event Dual-Logging (line 128)

Every event is both:

  1. Put on the event queue (for the UI)
  2. Appended to logged_events (for trajectory persistence)

Model Switching (line 153)

update_model() changes the model and recalculates the context window limit. The new limit comes from a multi-strategy lookup:

  1. Hardcoded map for Anthropic models (agent/core/session.py:20-28)
  2. HF Router Catalog for HF models
  3. litellm.get_max_tokens() fallback
  4. Default: 200,000 tokens
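The four-step lookup above can be sketched with the catalog and litellm calls stubbed out. The hardcoded map entry and its limit are illustrative assumptions, not the real table.

```python
# Illustrative hardcoded map (step 1); the real entries live at session.py:20-28.
ANTHROPIC_LIMITS = {"anthropic/claude-opus-4-6": 200_000}
DEFAULT_LIMIT = 200_000  # step 4 fallback

def resolve_context_limit(model, catalog_lookup, litellm_lookup):
    if model in ANTHROPIC_LIMITS:                    # 1. hardcoded map
        return ANTHROPIC_LIMITS[model]
    for lookup in (catalog_lookup, litellm_lookup):  # 2-3. catalog, then litellm
        try:
            limit = lookup(model)
            if limit:
                return limit
        except Exception:
            pass                                     # fall through to next source
    return DEFAULT_LIMIT

limit = resolve_context_limit("unknown/model",
                              catalog_lookup=lambda m: None,
                              litellm_lookup=lambda m: None)
```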

Auto-Save and Upload (line 162, 248)

Every N turns (default 3), the session trajectory is saved to a local JSON file, then a detached subprocess is spawned to upload it to an HF dataset repo. The subprocess (session_uploader.py) uses start_new_session=True so it survives even if the agent process exits.
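The detached spawn can be sketched as follows; the script path and arguments are illustrative. `start_new_session=True` runs the child in its own session (via `setsid`), so it is not killed when the agent's process group exits.

```python
import subprocess
import sys

def spawn_uploader(trajectory_path: str) -> subprocess.Popen:
    """Launch the uploader detached so it survives agent shutdown."""
    return subprocess.Popen(
        [sys.executable, "session_uploader.py", trajectory_path],
        start_new_session=True,            # detach from the agent's session
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

proc = spawn_uploader("dummy.json")
```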


Doom Loop Detection (agent/core/doom_loop.py)

Problem

LLMs sometimes get stuck calling the same tool repeatedly with the same arguments, or cycling through a pattern like [search, read, search, read] without making progress.

Detection Algorithm

Called before each LLM call. Extracts tool call signatures (tool name + MD5 hash of arguments) from the last 30 messages.

Pattern 1: Identical consecutive calls (line 55)

[A, A, A] -> detected at threshold 3

Pattern 2: Repeating sequences (line 74)

[A, B, A, B]       -> sequence length 2, 2 repetitions
[A, B, C, A, B, C] -> sequence length 3, 2 repetitions

Checks sequences of length 2-5 with 2+ repetitions.
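Both patterns can be sketched compactly. This is an illustrative implementation of the description above (name plus MD5 of the arguments as the signature; thresholds as stated), not the real module.

```python
import hashlib

def signature(name: str, args: str) -> str:
    """Tool name + MD5 of its serialized arguments."""
    return f"{name}:{hashlib.md5(args.encode()).hexdigest()}"

def detect_doom_loop(sigs, identical_threshold=3, max_seq=5, min_reps=2):
    # Pattern 1: identical consecutive calls, e.g. [A, A, A]
    tail = sigs[-identical_threshold:]
    if len(sigs) >= identical_threshold and len(set(tail)) == 1:
        return True
    # Pattern 2: repeating sequences of length 2..max_seq, e.g. [A, B, A, B]
    for n in range(2, max_seq + 1):
        window = n * min_reps
        if len(sigs) >= window and sigs[-n:] * min_reps == sigs[-window:]:
            return True
    return False

a = signature("search", '{"q": "foo"}')
b = signature("read", '{"path": "x"}')
```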

Corrective Action

When detected, a corrective message is injected as a user message rather than a system message (user messages carry higher priority for the LLM):

"STOP repeating this approach - it is not working. Step back and try a fundamentally different strategy. Consider: using a different tool, changing your arguments significantly, or explaining to the user what you're stuck on."


LLM Parameter Resolution (agent/core/llm_params.py)

Two Routing Paths

# Simplified (llm_params.py:18-76)
def _resolve_llm_params(session):
    model = session.config.model_name
 
    if model.startswith("anthropic/") or model.startswith("openai/"):
        # Direct API: pass model name straight through
        return {"model": model, "reasoning_effort": effort}
 
    else:
        # HF Router: route through https://router.huggingface.co/v1
        return {
            "model": f"openai/{model}",  # LiteLLM OpenAI adapter
            "api_base": "https://router.huggingface.co/v1",
            "api_key": token,
            "reasoning_effort": None,  # passed in extra_body instead
            "extra_body": {"reasoning_effort": effort},
        }

HF Router billing: When INFERENCE_TOKEN is set (hosted Space deployment), adds X-HF-Bill-To: huggingface header for organizational billing.

Routing suffixes: Model IDs can include provider hints like MiniMaxAI/MiniMax-M2.7:cheapest or moonshotai/Kimi-K2.6:novita, which the HF Router uses for provider selection.


HF Router Catalog (agent/core/hf_router_catalog.py)

Fetches and caches the model catalog from https://router.huggingface.co/v1/models with a 5-minute TTL. Used for:

  1. Model validation: Check if a model ID exists before switching
  2. Context window discovery: Find the max context length across live providers
  3. Tool support checking: Verify the model supports function calling
  4. Fuzzy suggestions: When a user enters an unknown model, suggest close matches (difflib.get_close_matches with 0.4 cutoff)
  5. Pricing info: Show per-provider input/output token prices during model preflight

Pre-warmed on startup as a background task (main.py:944) so the first model switch is instant.
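The fuzzy-suggestion step uses `difflib.get_close_matches` with a 0.4 cutoff, which can be shown in isolation. The wrapper name and toy catalog are illustrative.

```python
import difflib

def suggest_models(query: str, catalog: list, n: int = 3):
    """Suggest catalog entries close to an unknown model ID."""
    return difflib.get_close_matches(query, catalog, n=n, cutoff=0.4)

catalog = ["moonshotai/Kimi-K2.6", "MiniMaxAI/MiniMax-M2.7",
           "meta-llama/Llama-3.3-70B-Instruct"]
suggestions = suggest_models("Kimi-K2", catalog)
```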


Configuration (agent/config.py)

Pydantic BaseModel with env var substitution supporting ${VAR} and ${VAR:-default} syntax:

// configs/main_agent_config.json
{
  "model_name": "anthropic/claude-opus-4-6",
  "save_sessions": true,
  "session_dataset_repo": "akseljoonas/hf-agent-sessions",
  "yolo_mode": false,
  "confirm_cpu_jobs": true,
  "auto_file_upload": true,
  "mcpServers": {
    "huggingface": {
      "type": "sse",
      "url": "https://huggingface.co/mcp?login",
      "headers": { "Authorization": "Bearer ${HF_TOKEN}" }
    }
  }
}
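The `${VAR}` / `${VAR:-default}` substitution can be sketched with a single regex. This is an illustrative helper, not the actual config loader.

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}; group 2 is the default, if given.
_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute_env(text: str) -> str:
    def repl(match):
        var, default = match.group(1), match.group(2)
        return os.environ.get(var, default if default is not None else "")
    return _PATTERN.sub(repl, text)

os.environ["HF_TOKEN"] = "hf_example"   # illustrative value
os.environ.pop("REPO", None)            # ensure the default path is exercised
rendered = substitute_env("Bearer ${HF_TOKEN}, repo=${REPO:-my/repo}")
```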

Key defaults: