03 - Agent Core Deep Dive
The Agent Loop (agent/core/agent_loop.py)
This is the heart of the system -- a 1,198-line file implementing the think-act cycle.
Entry Point: submission_loop() (line 1128)
The outermost function. Creates a Session, opens the ToolRouter (which initializes MCP connections), emits a ready event, then loops forever reading from the submission queue and dispatching to handlers.
```python
# Simplified flow (agent_loop.py:1128-1198)
async def submission_loop(submission_queue, event_queue, config, ...):
    session = Session(config, event_queue, ...)
    async with session.tool_router:
        session.send_event("ready", {"tools": tool_count})
        retry_failed_uploads()  # Resume any failed session uploads
        while True:
            submission = await submission_queue.get()
            await process_submission(session, submission)
```
On unclean exit, an emergency save fires (lines 1188-1198) to persist the session trajectory.
Core Loop: Handlers.run_agent() (line 436)
This is where the magic happens. The method implements a bounded while-loop (max 300 iterations by default) that:
- Checks cancellation (line 469)
- Compacts context if needed (line 472) -- triggers LLM-based summarization when near the limit
- Checks for doom loops (line 476) -- injects corrective prompts if repetitive patterns are detected
- Calls the LLM (line 489) -- streaming or non-streaming via litellm
- Handles truncation (lines 512-551) -- if finish_reason == "length", drops tool calls and injects a hint to use smaller arguments
- Parses tool calls (lines 598-609) -- splits into valid (parseable JSON args) vs malformed
- Checks approvals (lines 648-654) -- separates tools needing user consent
- Executes tools in parallel (lines 683-748) -- asyncio.gather with per-task cancellation
- Handles approval flow (lines 751-782) -- emits an event and pauses if any tools need approval
- Loops or completes (line 818) -- if tools were called, iterate; if the response is text-only, the turn is done
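The bullets above reduce to a bounded think-act loop. Here is a minimal sketch of that control flow; `call_llm` and `execute_tools` are hypothetical stand-ins, and the real run_agent interleaves streaming, compaction, doom-loop checks, and approvals between these steps:

```python
# Minimal sketch of the bounded think-act loop (hypothetical helper names).
def run_agent(call_llm, execute_tools, max_iterations=300):
    for _ in range(max_iterations):
        text, tool_calls = call_llm()
        if not tool_calls:           # text-only response: the turn is done
            return text
        execute_tools(tool_calls)    # tool results feed the next iteration
    return "max iterations reached"
```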
Streaming LLM Calls: _call_llm_streaming() (line 245)
Accumulates content and tool call deltas from SSE chunks. Tool calls arrive as incremental fragments:
```python
# Simplified (agent_loop.py:303-317)
for chunk in response:
    for delta_tc in chunk.choices[0].delta.tool_calls:
        idx = delta_tc.index
        if idx not in tool_calls:
            tool_calls[idx] = {"id": "", "name": "", "arguments": ""}
        if delta_tc.id:
            tool_calls[idx]["id"] += delta_tc.id
        if delta_tc.function.name:
            tool_calls[idx]["name"] += delta_tc.function.name
        if delta_tc.function.arguments:
            tool_calls[idx]["arguments"] += delta_tc.function.arguments
```
Each content chunk emits an assistant_chunk event for real-time display.
Parallel Tool Execution (line 683-748)
Tools execute concurrently via asyncio.gather. Each tool call is wrapped in a cancellation-aware wrapper:
```python
# Simplified (agent_loop.py:683-748)
async def _exec_one(tc):
    if session.is_cancelled:
        return "Cancelled"
    session.send_event("tool_call", {"name": name, "args": args_preview})
    output, success = await session.tool_router.call_tool(name, args)
    session.send_event("tool_output", {"output": output, "success": success})
    return output

results = await asyncio.gather(*[_exec_one(tc) for tc in tools])
```
Approval System: _needs_approval() (line 48)
A policy function that determines which tool calls require explicit user consent:
| Tool | Condition | Rationale |
|---|---|---|
| sandbox_create | Always | Creates a billable HF Space |
| hf_jobs with run/scheduled run | GPU: always; CPU: configurable | GPU jobs cost money |
| hf_repo_files with upload/delete | Always | Modifies shared repos |
| hf_repo_git with destructive ops | Always | delete_branch, merge_pr, create_repo, etc. |
YOLO mode (line 53) bypasses all approvals. Enabled via /yolo command or --yolo CLI flag.
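The policy table can be sketched as a plain function. This is a hedged reconstruction: the signature, argument shapes (an `action` key, a `hardware` string), and config keys other than yolo_mode and confirm_cpu_jobs are assumptions, not the real _needs_approval API:

```python
# Sketch of the approval policy; argument/config shapes are illustrative.
def needs_approval(tool_name, args, config):
    if config.get("yolo_mode"):                       # YOLO bypasses all approvals
        return False
    if tool_name == "sandbox_create":                 # billable HF Space
        return True
    if tool_name == "hf_jobs" and args.get("action") in ("run", "scheduled run"):
        if args.get("hardware", "").startswith("gpu"):  # GPU: always
            return True
        return config.get("confirm_cpu_jobs", True)     # CPU: configurable
    if tool_name == "hf_repo_files" and args.get("action") in ("upload", "delete"):
        return True                                   # modifies shared repos
    if tool_name == "hf_repo_git" and args.get("action") in (
        "delete_branch", "merge_pr", "create_repo",
    ):
        return True                                   # destructive git ops
    return False
```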
Error Recovery
Transient error retries (lines 118-136): 3 retries with delays [5s, 15s, 30s] for timeouts, rate limits, 5xx errors, and connection errors. Detected via string matching on error messages in _is_transient_error() (line 123).
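A minimal sketch of this retry scheme, with an injectable sleep for testability; the pattern strings are illustrative, not the exact ones _is_transient_error matches:

```python
import time

# Illustrative substrings; the real matcher checks error messages similarly.
TRANSIENT_PATTERNS = ("timeout", "rate limit", "500", "502", "503", "connection")

def is_transient_error(exc):
    msg = str(exc).lower()
    return any(p in msg for p in TRANSIENT_PATTERNS)

def call_with_retries(fn, delays=(5, 15, 30), sleep=time.sleep):
    for delay in delays:
        try:
            return fn()
        except Exception as exc:
            if not is_transient_error(exc):
                raise                    # non-transient errors propagate
            sleep(delay)                 # back off, then retry
    return fn()                          # final attempt, errors propagate
```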
Context window exceeded (line 786-800): Catches ContextWindowExceededError, forces context_length above the limit to trigger compaction, then retries the same turn.
Malformed JSON tool args (line 598-641): When the LLM produces invalid JSON in tool arguments, the error is returned as a tool result so the LLM can self-correct.
Friendly error messages (line 139): Maps known error patterns to actionable user guidance:
- Auth failure -> "Check your API key"
- Insufficient credits -> "Check your billing"
- Model not found -> Suggests alternatives from the HF catalog
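The mapping above can be sketched as an ordered pattern table; both the patterns and the guidance strings here are illustrative, not the real messages:

```python
# Sketch: known error pattern -> actionable guidance (all strings illustrative).
FRIENDLY_ERRORS = [
    ("401", "Auth failure: check your API key."),
    ("insufficient credits", "Check your billing."),
    ("model not found", "Model not found: see the HF catalog for alternatives."),
]

def friendly_message(error_text):
    lowered = error_text.lower()
    for pattern, guidance in FRIENDLY_ERRORS:
        if pattern in lowered:
            return guidance
    return error_text  # unknown errors pass through unchanged
```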
Cancellation Flow: _cleanup_on_cancel() (line 210)
On Ctrl+C:
- Kills all sandbox processes (if a sandbox exists)
- Cancels any running HF jobs (tracked via session._running_job_ids)
Double Ctrl+C within 1 second quits the application entirely (implemented in main.py:1005).
Context Manager (agent/context_manager/manager.py)
Purpose
Manages the conversation's message history, tracks token usage, handles compaction (summarization to fit within the context window), and repairs protocol violations.
System Prompt Loading (line 96)
Loads system_prompt_v3.yaml, renders with Jinja2 (passing tools and num_tools), then appends runtime context:
```python
# Simplified (manager.py:127-147)
if local_mode:
    system_content += "\n\nCLI MODE: No sandbox available..."
system_content += f"\nWorking directory: {os.getcwd()}"
system_content += f"\nToday: {date}, Time: {time} ({tz})"
system_content += f"\nUser: {hf_username}"
system_content += f"\nTools available: {num_tools}"
```
Dangling Tool Call Repair: _patch_dangling_tool_calls() (line 185)
LLM APIs require every tool call to have a matching tool result message. When a turn is interrupted mid-execution, some tool calls may lack results. This method scans backwards from the end of the conversation, finds unmatched tool call IDs, and injects stub results:
"Tool was not executed (interrupted or error)."
This prevents API validation errors on the next LLM call.
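A minimal sketch of the repair, assuming OpenAI-style message dicts; the real method scans backwards, but a forward pass that collects answered IDs first is equivalent for illustration:

```python
# Sketch: inject stub results for tool calls that never got a matching
# tool message (assumes OpenAI-style message dicts).
def patch_dangling_tool_calls(messages):
    answered = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}
    patched = []
    for m in messages:
        patched.append(m)
        if m.get("role") == "assistant":
            for tc in m.get("tool_calls", []):
                if tc["id"] not in answered:
                    patched.append({
                        "role": "tool",
                        "tool_call_id": tc["id"],
                        "content": "Tool was not executed (interrupted or error).",
                    })
    return patched
```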
Compaction: compact() (line 265)
Triggered when context_length > max_context. Strategy:
- Never touch: System message (always first)
- Never touch: First user message (the original task -- critical for continuity)
- Summarize: Everything between the first user message and the last 5 messages
- Never touch: Last 5 messages (recent context, walked back to a user message boundary)
The summarization itself is an LLM call with a specific prompt (line 302-306):
"Summarize this conversation concisely, preserving: key decisions, the 'why' behind decisions, problems solved, and important context needed for developing further."
The summary uses reasoning_effort="high" for quality, and the budget is 10% of the max context window.
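The slicing strategy can be sketched as follows. This is a rough reconstruction: `summarize_fn` stands in for the LLM summarization call, and injecting the summary as a user message is an assumption, not the confirmed behavior:

```python
# Sketch of compaction slicing: keep system + first user message and the
# last few messages (walked back to a user boundary); summarize the middle.
def compact(messages, summarize_fn, keep_last=5):
    head = messages[:2]                        # system + original task
    cut = len(messages) - keep_last
    while cut > 2 and messages[cut]["role"] != "user":
        cut -= 1                               # walk back to a user boundary
    middle, tail = messages[2:cut], messages[cut:]
    summary = {"role": "user", "content": "[Summary] " + summarize_fn(middle)}
    return head + [summary] + tail
```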
Undo: undo_last_turn() (line 228)
Pops messages from the end until the last user message is removed (including all subsequent assistant/tool messages from that turn).
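A minimal sketch of the pop-until-user behavior, assuming the same message-dict shape:

```python
# Sketch: drop the last user message and everything after it.
def undo_last_turn(messages):
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            return messages[:i]
    return messages  # no user message found: nothing to undo
```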
Session (agent/core/session.py)
Key State
```python
# session.py:83-127
class Session:
    session_id: str            # UUID
    context_manager            # ContextManager
    tool_router                # ToolRouter
    config                     # Config
    hf_token: str
    stream: bool
    _cancelled: asyncio.Event
    pending_approval           # Tool calls awaiting user decision
    sandbox                    # Sandbox instance (created on demand)
    _running_job_ids: set      # For cleanup on cancel
    logged_events: list        # Full event trajectory
    turn_count: int
```
Event Dual-Logging (line 128)
Every event is both:
- Put on the event queue (for the UI)
- Appended to logged_events (for trajectory persistence)
Model Switching (line 153)
update_model() changes the model and recalculates the context window limit. The new limit comes from a multi-strategy lookup:
- Hardcoded map for Anthropic models (agent/core/session.py:20-28)
- HF Router Catalog for HF models
- litellm.get_max_tokens() fallback
- Default: 200,000 tokens
Auto-Save and Upload (line 162, 248)
Every N turns (default 3), the session trajectory is saved to a local JSON file, then a detached subprocess is spawned to upload it to an HF dataset repo. The subprocess (session_uploader.py) uses start_new_session=True so it survives even if the agent process exits.
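The detached-subprocess trick can be sketched with a generic helper; `spawn_detached` is an illustrative name, and in the real code the equivalent call would launch session_uploader.py with the trajectory path:

```python
import subprocess
import sys

# Sketch: spawn a process in its own session so it survives parent exit.
def spawn_detached(argv):
    return subprocess.Popen(
        argv,
        start_new_session=True,        # new session/process group: not killed
        stdout=subprocess.DEVNULL,     # with the parent, no inherited pipes
        stderr=subprocess.DEVNULL,
    )
```

Usage in the auto-save path would look roughly like `spawn_detached([sys.executable, "session_uploader.py", trajectory_path])`.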
Doom Loop Detection (agent/core/doom_loop.py)
Problem
LLMs sometimes get stuck calling the same tool repeatedly with the same arguments, or cycling through a pattern like [search, read, search, read] without making progress.
Detection Algorithm
Called before each LLM call. Extracts tool call signatures (tool name + MD5 hash of arguments) from the last 30 messages.
Pattern 1: Identical consecutive calls (line 55)
[A, A, A] -> detected at threshold 3
Pattern 2: Repeating sequences (line 74)
[A, B, A, B] -> sequence length 2, 2 repetitions
[A, B, C, A, B, C] -> sequence length 3, 2 repetitions
Checks sequences of length 2-5 with 2+ repetitions.
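Both patterns can be sketched over a list of (tool name, argument hash) signatures. The function shapes and thresholds here mirror the description above but are a reconstruction, not the real API:

```python
import hashlib
import json

# Signature: tool name + MD5 of canonicalized arguments.
def signature(tool_call):
    digest = hashlib.md5(
        json.dumps(tool_call["args"], sort_keys=True).encode()
    ).hexdigest()
    return (tool_call["name"], digest)

def detect_doom_loop(signatures, max_seq_len=5, min_repeats=2, identical_threshold=3):
    # Pattern 1: identical consecutive calls, e.g. [A, A, A]
    if (len(signatures) >= identical_threshold
            and len(set(signatures[-identical_threshold:])) == 1):
        return True
    # Pattern 2: repeating sequences of length 2-5, e.g. [A, B, A, B]
    for n in range(2, max_seq_len + 1):
        if len(signatures) >= n * min_repeats:
            if signatures[-2 * n:-n] == signatures[-n:]:
                return True
    return False
```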
Corrective Action
When detected, a corrective message is injected as a user message (not system -- higher priority for the LLM):
"STOP repeating this approach - it is not working. Step back and try a fundamentally different strategy. Consider: using a different tool, changing your arguments significantly, or explaining to the user what you're stuck on."
LLM Parameter Resolution (agent/core/llm_params.py)
Two Routing Paths
```python
# Simplified (llm_params.py:18-76)
def _resolve_llm_params(session):
    model = session.config.model_name
    if model.startswith("anthropic/") or model.startswith("openai/"):
        # Direct API: pass model name straight through
        return {"model": model, "reasoning_effort": effort}
    else:
        # HF Router: route through https://router.huggingface.co/v1
        return {
            "model": f"openai/{model}",  # LiteLLM OpenAI adapter
            "api_base": "https://router.huggingface.co/v1",
            "api_key": token,
            "reasoning_effort": None,  # passed in extra_body instead
            "extra_body": {"reasoning_effort": effort},
        }
```
HF Router billing: When INFERENCE_TOKEN is set (hosted Space deployment), adds an X-HF-Bill-To: huggingface header for organizational billing.
Routing suffixes: Model IDs can include provider hints like MiniMaxAI/MiniMax-M2.7:cheapest or moonshotai/Kimi-K2.6:novita, which the HF Router uses for provider selection.
HF Router Catalog (agent/core/hf_router_catalog.py)
Fetches and caches the model catalog from https://router.huggingface.co/v1/models with a 5-minute TTL. Used for:
- Model validation: check if a model ID exists before switching
- Context window discovery: find the max context length across live providers
- Tool support checking: verify the model supports function calling
- Fuzzy suggestions: when a user enters an unknown model, suggest close matches (difflib.get_close_matches with a 0.4 cutoff)
- Pricing info: show per-provider input/output token prices during model preflight
Pre-warmed on startup as a background task (main.py:944) so the first model switch is instant.
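The 5-minute TTL cache can be sketched generically; `fetch_fn` stands in for the HTTP call to https://router.huggingface.co/v1/models, and the class shape is illustrative:

```python
import time

# Sketch of a TTL cache around a fetch function (clock injectable for tests).
class TTLCache:
    def __init__(self, fetch_fn, ttl=300, clock=time.monotonic):
        self._fetch, self._ttl, self._clock = fetch_fn, ttl, clock
        self._value, self._fetched_at = None, None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at > self._ttl:
            self._value, self._fetched_at = self._fetch(), now  # refresh
        return self._value
```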
Configuration (agent/config.py)
Pydantic BaseModel with env var substitution supporting ${VAR} and ${VAR:-default} syntax:
```json
// configs/main_agent_config.json
{
  "model_name": "anthropic/claude-opus-4-6",
  "save_sessions": true,
  "session_dataset_repo": "akseljoonas/hf-agent-sessions",
  "yolo_mode": false,
  "confirm_cpu_jobs": true,
  "auto_file_upload": true,
  "mcpServers": {
    "huggingface": {
      "type": "sse",
      "url": "https://huggingface.co/mcp?login",
      "headers": { "Authorization": "Bearer ${HF_TOKEN}" }
    }
  }
}
```
Key defaults:
- max_iterations: 300 (hard cap on agent loop iterations per turn)
- reasoning_effort: "high" (valuing correctness over speed)
- auto_save_interval: 3 (save the trajectory every 3 turns)
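The ${VAR} / ${VAR:-default} substitution can be sketched with a single regex pass over the raw config text before validation; the function name and the choice to leave unresolved ${VAR} untouched are assumptions:

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}.
_ENV_RE = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute_env(text, env=os.environ):
    def repl(m):
        name, default = m.group(1), m.group(2)
        if name in env:
            return env[name]          # ${VAR}: value from the environment
        if default is not None:
            return default            # ${VAR:-default}: fallback value
        return m.group(0)             # unresolved ${VAR}: left as-is
    return _ENV_RE.sub(repl, text)
```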