04 - Agent Tools

Tool System Architecture

Registration and Dispatch (`agent/core/tools.py`)

Tools are registered via two mechanisms:

Built-in tools: Python async functions wrapped in ToolSpec dataclasses (line 116). Each has a name, description, parameters (JSON Schema), and handler callable.
MCP tools: Discovered dynamically from the HF MCP server via fastmcp.Client. Registered with handler=None and dispatched through the MCP client.

The ToolRouter (line 126) is the central dispatch hub:

# Simplified (tools.py:234-270)
async def call_tool(self, name, args, session=None, tool_call_id=None):
    spec = self._tool_map[name]
    if spec.handler:
        # Built-in: inspect signature to pass optional session/tool_call_id
        sig = inspect.signature(spec.handler)
        kwargs = {"args": args}
        if "session" in sig.parameters:
            kwargs["session"] = session
        if "tool_call_id" in sig.parameters:
            kwargs["tool_call_id"] = tool_call_id
        return await spec.handler(**kwargs)
    else:
        # MCP: delegate to MCP client
        result = await self.mcp_client.call_tool(name, args)
        return convert_mcp_content_to_string(result), True
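
The signature-inspection step can be tried in isolation. The following is a minimal, self-contained sketch rather than the project's code: `ToolSpec` is rebuilt here from the four fields listed above, and `echo`, `whoami`, and `dispatch` are hypothetical names used only for illustration. MCP dispatch is deliberately left out.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolSpec:
    # Fields follow the description above; the real dataclass may carry more.
    name: str
    description: str
    parameters: dict[str, Any]
    handler: Optional[Callable[..., Awaitable[tuple[str, bool]]]] = None


async def echo(args: dict[str, Any]) -> tuple[str, bool]:
    # Hypothetical handler that only needs the arguments.
    return f"echo: {args['text']}", True


async def whoami(args: dict[str, Any], session: Any) -> tuple[str, bool]:
    # Hypothetical handler that declares `session`, so the dispatcher supplies it.
    return f"session={session}", True


async def dispatch(spec: ToolSpec, args: dict[str, Any], session: Any = None,
                   tool_call_id: Optional[str] = None) -> tuple[str, bool]:
    if spec.handler is None:
        raise NotImplementedError("MCP dispatch is out of scope for this sketch")
    params = inspect.signature(spec.handler).parameters
    kwargs: dict[str, Any] = {"args": args}
    # Pass optional context only to handlers that ask for it by name.
    if "session" in params:
        kwargs["session"] = session
    if "tool_call_id" in params:
        kwargs["tool_call_id"] = tool_call_id
    return await spec.handler(**kwargs)
```

Inspecting the signature lets simple handlers stay one-argument functions while handlers that need session state opt in by adding a parameter.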

Handler Convention

All tool handlers return tuple[str, bool] -- (output text, success flag). The agent loop adds the output as a tool result message, and the success flag determines display styling.

Tool Creation Order (`tools.py:282`)

Registration order matters: it determines each tool's position in the tool list sent to the LLM.

Sandbox/local tools (prepended): sandbox_create, bash, read, write, edit
research (sub-agent)
explore_hf_docs, fetch_hf_docs
hf_papers
hf_inspect_dataset
plan_tool
hf_jobs
hf_repo_files, hf_repo_git
github_find_examples, github_list_repos, github_read_file
find_hf_api (registered asynchronously, built from the OpenAPI spec)
MCP tools from HF MCP server

Blocked MCP Tools (`tools.py:65`)

Some MCP tools are blocked to avoid conflicts with built-in implementations:

NOT_ALLOWED_TOOL_NAMES = {"hf_jobs", "hf_doc_search", "hf_doc_fetch", "hf_whoami"}
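
As a rough illustration of how such a blocklist might be applied during MCP discovery, the sketch below filters a list of discovered tool names. The function `filter_mcp_tools` and the name `model_search` are hypothetical; only the set itself comes from the source.

```python
NOT_ALLOWED_TOOL_NAMES = {"hf_jobs", "hf_doc_search", "hf_doc_fetch", "hf_whoami"}


def filter_mcp_tools(discovered: list[str]) -> list[str]:
    # Keep discovery order, dropping names that would shadow built-in tools.
    return [name for name in discovered if name not in NOT_ALLOWED_TOOL_NAMES]
```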

Complete Tool Catalog

1. Sandbox / Code Execution

`sandbox_create` (`agent/tools/sandbox_tool.py:203`)

Purpose: Creates a remote code execution sandbox as an HF Space
How it works: Duplicates a template Space (burtenshaw/sandbox), uploads a FastAPI server + Dockerfile, waits for it to come online
Hardware options: CPU Basic to A100 GPU
Requires approval: Always (creates billable infrastructure)

`bash` (`sandbox_tool.py:237` or `local_tools.py:315`)

Purpose: Execute shell commands
Sandbox mode: Proxies to the sandbox Space's /api/bash endpoint
Local mode: Runs via asyncio.create_subprocess_shell with timeout
Limits: Output capped at 25,000 chars (sandbox) or similar (local). Timeout default 240s (sandbox), 36,000s (local)
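
The local mode can be approximated with the standard library alone. This is a sketch under assumptions, not the code in `local_tools.py`: the function name `run_bash`, the merged stdout/stderr stream, and the plain slice used for the output cap are all illustrative choices.

```python
import asyncio


async def run_bash(command: str, timeout: float = 36_000.0,
                   max_chars: int = 25_000) -> tuple[str, bool]:
    # Run the command through the shell, merging stderr into stdout.
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Kill the shell and reap it so no zombie process is left behind.
        proc.kill()
        await proc.wait()
        return f"Timed out after {timeout}s", False
    text = out.decode("utf-8", errors="replace")[:max_chars]
    # Success flag mirrors the (output, success) handler convention.
    return text, proc.returncode == 0
```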

`read` (`sandbox_tool.py:237` or `local_tools.py:352`)

Purpose: Read file contents with line numbers
Default limit: 2,000 lines
Tracks read files: Maintains _files_read set for write/edit safety

`write` (`sandbox_tool.py:237` or `local_tools.py:371`)

Purpose: Create or overwrite files atomically
Safety: Refuses to write to existing files that haven't been read first (_files_read enforcement)
Atomic writes: Uses tempfile + rename pattern

`edit` (`sandbox_tool.py:237` or `local_tools.py:394`)

Purpose: Find-and-replace edits with fuzzy matching
Modes: replace, append_after, prepend_before, plus replace_all flag
Fuzzy matching (edit_utils.py:35-89): 4-pass strategy:
1. Exact match
2. Right-trim whitespace
3. Both-sides trim
4. Unicode normalization (em-dashes, smart quotes, zero-width spaces)
Python validation: AST-based syntax check after edit (edit_utils.py:233)
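
The four-pass strategy can be sketched as a loop over progressively looser line normalizers. This is a simplified illustration, not the code in `edit_utils.py`: it matches whole lines only, and the exact character substitutions in `_UNICODE_MAP` are assumptions.

```python
from typing import Callable, Optional

# Assumed substitutions for pass 4; the real table may differ.
_UNICODE_MAP = str.maketrans({
    "\u2014": "-",                  # em dash
    "\u2013": "-",                  # en dash
    "\u2018": "'", "\u2019": "'",   # smart single quotes
    "\u201c": '"', "\u201d": '"',   # smart double quotes
    "\u200b": None,                 # zero-width space (deleted)
})


def _normalize_unicode(line: str) -> str:
    return line.translate(_UNICODE_MAP).strip()


PASSES: list[tuple[str, Callable[[str], str]]] = [
    ("exact", lambda s: s),
    ("rstrip", str.rstrip),
    ("strip", str.strip),
    ("unicode", _normalize_unicode),
]


def find_block(content: str, needle: str) -> Optional[tuple[str, int]]:
    """Return (pass_name, first_line_index) for the first pass that matches."""
    hay = content.splitlines()
    pat = needle.splitlines()
    if not pat:
        return None
    for name, norm in PASSES:
        want = [norm(p) for p in pat]
        # Slide a window of len(pat) lines over the file.
        for i in range(len(hay) - len(pat) + 1):
            if [norm(h) for h in hay[i:i + len(pat)]] == want:
                return name, i
    return None
```

Trying the strictest pass first means a looser match is only accepted when no stricter one exists anywhere in the file.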

The Sandbox Server (Embedded in `sandbox_client.py:100-456`)

A complete FastAPI application stored as a string literal and uploaded to the HF Space. Endpoints: /api/bash, /api/read, /api/write, /api/edit, /api/exists, /api/kill, /api/health. Runs on ghcr.io/astral-sh/uv:python3.12-bookworm-slim with development tools pre-installed.

2. Research Sub-Agent

`research` (`agent/tools/research_tool.py`)

Purpose: Spawns an independent LLM sub-agent with its own context window for deep research
Model: Uses a cheaper model -- anthropic/claude-sonnet-4-6 when the main agent is on Anthropic (lines 217-221)
Own tool set (lines 29-41): read, bash, explore_hf_docs, fetch_hf_docs, find_hf_api, hf_papers, github_find_examples, github_list_repos, github_read_file, hf_inspect_dataset, hf_repo_files
Context budget: Warns at 170k tokens, hard-stops at 190k tokens (lines 25-27)
Max iterations: 60 (line 296)
Doom loop detection: Uses the same check_for_doom_loop() as the main agent
Tool output truncation: Sub-agent tool outputs capped at 8,000 chars (line 419)
Progress events: Sends tool_log events with per-agent agent_id and label for live UI tracking
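
The context budget amounts to a two-threshold check run before each sub-agent iteration. A minimal sketch follows; the constant and function names are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
WARN_TOKENS = 170_000
STOP_TOKENS = 190_000
MAX_ITERATIONS = 60


def budget_state(tokens_used: int, iteration: int) -> str:
    # "stop" ends the loop; "warn" would prompt the sub-agent to wrap up.
    if tokens_used >= STOP_TOKENS or iteration >= MAX_ITERATIONS:
        return "stop"
    if tokens_used >= WARN_TOKENS:
        return "warn"
    return "ok"
```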

The research tool has its own system prompt (research_tool.py:43-169) focused on literature-first methodology:

"You are a research specialist agent. Your job is to thoroughly investigate a specific question..."

3. HuggingFace Documentation

`explore_hf_docs` (`agent/tools/docs_tools.py:879`)

Purpose: Browse and search HF documentation structure
Search engine: Whoosh full-text search with stemming analyzer
Coverage: 37 documentation endpoints (transformers, datasets, peft, trl, accelerate, diffusers, gradio, etc.)
Gradio special case: Uses Gradio's own embedding search API instead of Whoosh

`fetch_hf_docs` (`docs_tools.py:957`)

Purpose: Fetch full markdown content of a specific doc page
How: Appends .md to the doc URL and fetches raw content
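
The URL rewrite is small enough to show directly. This is a sketch, not the tool's code: `to_markdown_url` is a hypothetical name, and dropping the fragment and any trailing slash are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def to_markdown_url(doc_url: str) -> str:
    parts = urlsplit(doc_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    # Fragments (#section) are meaningless for a raw markdown fetch, so drop them.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```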

`find_hf_api` (`docs_tools.py:786`, registered dynamically)

Purpose: Search HuggingFace's REST API endpoints
How: Builds a Whoosh index over the live OpenAPI spec, generates curl examples, formats parameters

4. Academic Papers

`hf_papers` (`agent/tools/papers_tool.py`)

11 operations: trending, search, paper_details, read_paper, citation_graph, snippet_search, recommend, find_datasets, find_models, find_collections, find_all_resources
External APIs:
- HuggingFace Papers API (https://huggingface.co/api)
- Semantic Scholar API (https://api.semanticscholar.org) -- for citations, recommendations, snippet search
- arXiv HTML / ar5iv -- for full paper reading
Paper reading: HTML parsing with BeautifulSoup, section-level extraction with fuzzy section lookup
Rate limiting: 1 req/s for Semantic Scholar (S2) search, 0.1s spacing between other S2 calls, with retry on 429/5xx
Caching: Response cache (max 500 entries) for deduplication
Parallel fetching: find_all_resources uses asyncio.gather for datasets+models+collections simultaneously
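
The parallel fetch in `find_all_resources` follows the usual `asyncio.gather` fan-out pattern. In the sketch below `_fetch` is a hypothetical stub standing in for the real HTTP calls, and the returned dictionary shape is an assumption.

```python
import asyncio


async def _fetch(kind: str, arxiv_id: str) -> list[str]:
    # Stub standing in for an HTTP request to the Papers API.
    await asyncio.sleep(0)
    return [f"{kind}-for-{arxiv_id}"]


async def find_all_resources(arxiv_id: str) -> dict[str, list[str]]:
    kinds = ("datasets", "models", "collections")
    # All three lookups run concurrently; gather preserves argument order.
    results = await asyncio.gather(*(_fetch(kind, arxiv_id) for kind in kinds))
    return dict(zip(kinds, results))
```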

5. Training Jobs

`hf_jobs` (`agent/tools/jobs_tool.py`)

12 operations: run, ps, logs, inspect, cancel, plus scheduled variants (scheduled run/ps/inspect/delete/suspend/resume)
Two execution modes (line 492):
- Python mode: Script + dependencies, runs via UV in a Docker container
- Docker mode: Custom Docker image + command
Log streaming (line 382): Real-time log streaming using asyncio Queue bridge between sync generator and async consumer, with retry logic (100 retries, 5s delay) for connection drops
Job tracking (line 544): Adds job IDs to session._running_job_ids for cleanup on cancel
Hardware specs: CPU Basic (2 vCPU / 16GB), L4 (24GB VRAM), L40S (48GB VRAM), A100 (80GB VRAM), 8xH100 (640GB VRAM)
Script resolution: Can read scripts from sandbox when given a file path
Events: Sends tool_state_change events with job URL and final status for UI display
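
The queue bridge used for log streaming can be sketched as follows. This illustrates the general pattern only: `stream_sync_generator` is a hypothetical name, the producer runs in a plain thread, and the retry logic described above is omitted.

```python
import asyncio
import threading
from typing import AsyncIterator, Callable, Iterator

_DONE = object()  # sentinel marking the end of the stream


async def stream_sync_generator(
    make_gen: Callable[[], Iterator[str]],
) -> AsyncIterator[str]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def producer() -> None:
        # Runs in a worker thread; hands items to the loop thread-safely.
        try:
            for line in make_gen():
                loop.call_soon_threadsafe(queue.put_nowait, line)
        except Exception as exc:
            loop.call_soon_threadsafe(queue.put_nowait, exc)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = await queue.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

Because the blocking generator never runs on the event loop thread, the agent stays responsive to cancellation while logs arrive.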

6. Datasets

`hf_inspect_dataset` (`agent/tools/dataset_tools.py`)

Purpose: Inspect HuggingFace dataset structure, splits, and sample data
Two-phase parallel fetching (lines 51-134):
- Phase 1: /is-valid, /splits, /parquet in parallel
- Phase 2: /info, /first-rows (depend on auto-detected config/split)
Messages column analysis (lines 250-350): Detects chat/instruction format and analyzes roles, message keys, and tool call support. This is critical for validating SFT/DPO/GRPO training compatibility.
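
A rough idea of what the messages column analysis checks, as a sketch over sample rows. The function name, the default column name `messages`, the output keys, and the rule for detecting tool call support are all assumptions made for illustration.

```python
from collections import Counter
from typing import Any


def analyze_messages(rows: list[dict[str, Any]],
                     column: str = "messages") -> dict[str, Any]:
    roles = Counter()
    keys: set[str] = set()
    for row in rows:
        for msg in row.get(column) or []:
            if not isinstance(msg, dict):
                continue
            keys.update(msg.keys())
            if "role" in msg:
                roles[str(msg["role"])] += 1
    return {
        # A column counts as chat-formatted if any message carries a role.
        "is_chat_format": bool(roles),
        "roles": dict(roles),
        "message_keys": sorted(keys),
        "has_tool_calls": "tool_calls" in keys or "tool" in roles,
    }
```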

7. Repository Management

`hf_repo_files` (`agent/tools/hf_repo_files_tool.py`)

4 operations: list, read, upload, delete
Read: Downloads via hf_hub_download, reads as UTF-8, truncates at 50k chars
Upload: Supports create_pr flag for PR-based changes
Delete: Supports wildcard patterns
Requires approval: upload and delete operations

`hf_repo_git` (`agent/tools/hf_repo_git_tool.py`)

14 operations: create_branch, delete_branch, create_tag, delete_tag, list_refs, create_pr, list_prs, get_pr, merge_pr, close_pr, comment_pr, change_pr_status, create_repo, update_repo
Requires approval: delete_branch, delete_tag, merge_pr, create_repo, update_repo
All sync HF Hub API calls wrapped with asyncio.to_thread()

8. GitHub Integration

`github_find_examples` (`agent/tools/github_find_examples.py`)

Purpose: Find example scripts/notebooks in GitHub repos
Algorithm:
1. Fetches full repo tree via /git/trees/{branch}?recursive=1
2. Filters to files matching example patterns (scripts/, examples/, notebooks/, tutorials/, etc.)
3. Fuzzy scores using thefuzz.fuzz: token_set_ratio for pattern matching, partial_ratio + token_set_ratio for keywords
4. Falls back to similar repo search if repo not found
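
Steps 2 and 3 can be approximated without third-party packages. In this sketch the standard library's `difflib.SequenceMatcher` stands in for `thefuzz.fuzz`, so the scores will differ from the real tool; `EXAMPLE_DIRS`, `is_example_path`, and `rank_examples` are hypothetical names.

```python
from difflib import SequenceMatcher

EXAMPLE_DIRS = ("scripts", "examples", "notebooks", "tutorials")


def is_example_path(path: str) -> bool:
    # A file qualifies if any of its parent directories is an example directory.
    parts = path.lower().split("/")
    return any(part in EXAMPLE_DIRS for part in parts[:-1])


def rank_examples(paths: list[str], keyword: str) -> list[str]:
    candidates = [p for p in paths if is_example_path(p)]

    def score(path: str) -> float:
        # Stand-in similarity score; the real tool combines thefuzz ratios.
        return SequenceMatcher(None, keyword.lower(), path.lower()).ratio()

    return sorted(candidates, key=score, reverse=True)
```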

`github_read_file` (`agent/tools/github_read_file.py`)

Purpose: Read file contents from GitHub
Auto-converts .ipynb to markdown using nbconvert
Default truncation at 300 lines, with line_start/line_end support
Falls back to raw download for large files
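
The line-window behaviour might look like the sketch below. It assumes `line_start` and `line_end` are 1-indexed and inclusive, and that the 300-line default applies only when neither is given; `slice_lines` is a hypothetical name.

```python
from typing import Optional

DEFAULT_MAX_LINES = 300


def slice_lines(text: str, line_start: Optional[int] = None,
                line_end: Optional[int] = None) -> tuple[str, bool]:
    lines = text.splitlines()
    if line_start is None and line_end is None:
        start, end = 1, min(len(lines), DEFAULT_MAX_LINES)
    else:
        start = max(1, line_start or 1)
        end = min(len(lines), line_end or len(lines))
    body = "\n".join(lines[start - 1:end])
    # The flag tells the caller that more lines exist past the window.
    return body, end < len(lines)
```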

`github_list_repos` (`agent/tools/github_list_repos.py`)

Purpose: List repos in a GitHub org/user
Client-side sorting for stars/forks (GitHub list API doesn't support these sort fields)
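
Client-side sorting reduces to a sort over the fetched repo objects. The sketch below uses the `stargazers_count` and `forks_count` fields that GitHub returns for repositories; `sort_repos` and its parameters are hypothetical.

```python
from typing import Any

_SORT_FIELDS = {"stars": "stargazers_count", "forks": "forks_count"}


def sort_repos(repos: list[dict[str, Any]], sort_by: str = "stars",
               descending: bool = True) -> list[dict[str, Any]]:
    field = _SORT_FIELDS[sort_by]
    # Missing counts are treated as zero so partial records still sort.
    return sorted(repos, key=lambda repo: repo.get(field, 0), reverse=descending)
```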

9. Planning

`plan_tool` (`agent/tools/plan_tool.py`)

Purpose: Todo list with status tracking (pending/in_progress/completed)
Each call replaces the entire plan (not incremental updates)
Emits plan_update event for UI display
Module-level state (_current_plan list)

Tool Integration Summary

External Service	Tools Using It
HuggingFace Hub API	`hf_jobs`, `hf_repo_files`, `hf_repo_git`, `sandbox_create`
HuggingFace Dataset Server	`hf_inspect_dataset`
HuggingFace Papers API	`hf_papers`
HuggingFace Docs (raw MD)	`explore_hf_docs`, `fetch_hf_docs`
HuggingFace OpenAPI spec	`find_hf_api`
HuggingFace MCP Server	Various MCP tools (dynamic)
Semantic Scholar API	`hf_papers` (citations, snippets, recommendations)
arXiv / ar5iv HTML	`hf_papers` (paper reading)
GitHub REST API	`github_find_examples`, `github_read_file`, `github_list_repos`
Gradio API	`explore_hf_docs` (Gradio-specific search)
HF Spaces (sandbox)	`bash`, `read`, `write`, `edit`

Output Truncation Limits

Sandbox bash: 25,000 chars
Sandbox read: 4,000 lines
Local bash: similar with temp file spillover
GitHub read: 300 lines default
HF repo read: 50,000 chars
Research tool outputs (within sub-agent): 8,000 chars
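
A character cap of this kind is usually a one-function helper. The sketch below is illustrative: the `LIMITS` keys, the function name, and the truncation notice format are hypothetical, while the numeric limits are the character-based ones listed above.

```python
LIMITS = {
    "sandbox_bash": 25_000,
    "hf_repo_read": 50_000,
    "research_tool_output": 8_000,
}


def truncate_chars(text: str, tool: str) -> str:
    limit = LIMITS[tool]
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    # Tell the model how much was cut so it can request a narrower read.
    return f"{text[:limit]}\n... [{omitted} chars truncated]"
```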