04 - Agent Tools
Tool System Architecture
Registration and Dispatch (agent/core/tools.py)
Tools are registered via two mechanisms:
- Built-in tools: Python async functions wrapped in ToolSpec dataclasses (line 116). Each has a name, description, parameters (JSON Schema), and handler callable.
- MCP tools: Discovered dynamically from the HF MCP server via fastmcp.Client. Registered with handler=None and dispatched through the MCP client.
The ToolRouter (line 126) is the central dispatch hub:

    # Simplified (tools.py:234-270)
    async def call_tool(self, name, args, session=None, tool_call_id=None):
        spec = self._tool_map[name]
        if spec.handler:
            # Built-in: inspect signature to pass optional session/tool_call_id
            sig = inspect.signature(spec.handler)
            kwargs = {"args": args}
            if "session" in sig.parameters:
                kwargs["session"] = session
            if "tool_call_id" in sig.parameters:
                kwargs["tool_call_id"] = tool_call_id
            return await spec.handler(**kwargs)
        else:
            # MCP: delegate to MCP client
            result = await self.mcp_client.call_tool(name, args)
            return convert_mcp_content_to_string(result), True

Handler Convention
All tool handlers return tuple[str, bool] -- (output text, success flag). The agent loop adds the output as a tool result message, and the success flag determines display styling.
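A minimal sketch of a handler that follows this convention, together with the signature-based dispatch described above. The ToolSpec here is a reduced stand-in, and echo_handler and dispatch are hypothetical names used only for illustration; the real definitions in tools.py may differ.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolSpec:
    """Reduced stand-in for the real ToolSpec (assumed fields only)."""
    name: str
    description: str
    parameters: dict[str, Any]
    handler: Optional[Callable[..., Awaitable[tuple[str, bool]]]] = None


async def echo_handler(args: dict[str, Any], session: Any = None) -> tuple[str, bool]:
    """Hypothetical built-in handler: returns (output text, success flag)."""
    text = args.get("text")
    if text is None:
        return "Error: missing required argument 'text'", False
    return f"echo: {text}", True


async def dispatch(spec: ToolSpec, args: dict[str, Any], session: Any = None,
                   tool_call_id: Optional[str] = None) -> tuple[str, bool]:
    """Pass session/tool_call_id only when the handler declares them."""
    if spec.handler is None:
        return f"Error: {spec.name} has no local handler", False
    params = inspect.signature(spec.handler).parameters
    kwargs: dict[str, Any] = {"args": args}
    if "session" in params:
        kwargs["session"] = session
    if "tool_call_id" in params:
        kwargs["tool_call_id"] = tool_call_id
    return await spec.handler(**kwargs)
```

Because the dispatcher inspects the signature, a handler that does not need session or tool_call_id can simply omit those parameters.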
Tool Creation Order (tools.py:282)
Registration order matters: it determines each tool's position in the LLM's tool list. Tools are created in this order:
- Sandbox/local tools (prepended): sandbox_create, bash, read, write, edit
- research (sub-agent)
- explore_hf_docs, fetch_hf_docs
- hf_papers
- hf_inspect_dataset
- plan_tool
- hf_jobs
- hf_repo_files, hf_repo_git
- github_find_examples, github_list_repos, github_read_file
- find_hf_api (registered async, from OpenAPI spec)
- MCP tools from HF MCP server
Blocked MCP Tools (tools.py:65)
Some MCP tools are blocked to avoid conflicts with built-in implementations:

    NOT_ALLOWED_TOOL_NAMES = {"hf_jobs", "hf_doc_search", "hf_doc_fetch", "hf_whoami"}

Complete Tool Catalog
1. Sandbox / Code Execution
sandbox_create (agent/tools/sandbox_tool.py:203)
- Purpose: Creates a remote code execution sandbox as an HF Space
- How it works: Duplicates a template Space (burtenshaw/sandbox), uploads a FastAPI server + Dockerfile, waits for it to come online
- Hardware options: CPU Basic to A100 GPU
- Requires approval: Always (creates billable infrastructure)
bash (sandbox_tool.py:237 or local_tools.py:315)
- Purpose: Execute shell commands
- Sandbox mode: Proxies to the sandbox Space's /api/bash endpoint
- Local mode: Runs via asyncio.create_subprocess_shell with timeout
- Limits: Output capped at 25,000 chars (sandbox) or similar (local). Timeout default 240s (sandbox), 36,000s (local)
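The local mode can be sketched roughly as follows. This is an illustration under assumptions, not the code in local_tools.py: run_shell is a hypothetical name, the 240-second default mirrors the sandbox default quoted above, and the temp-file spillover used by the real local tool is left out.

```python
import asyncio

MAX_OUTPUT_CHARS = 25_000  # cap taken from the sandbox limit above


async def run_shell(command: str, timeout: float = 240.0) -> tuple[str, bool]:
    """Run a shell command, returning (combined output, success flag)."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    try:
        raw, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Stop the shell and reap it so no zombie process is left behind.
        proc.kill()
        await proc.wait()
        return f"Error: command timed out after {timeout}s", False
    output = raw.decode("utf-8", errors="replace")
    if len(output) > MAX_OUTPUT_CHARS:
        output = output[:MAX_OUTPUT_CHARS] + "\n[output truncated]"
    return output, proc.returncode == 0
```

A non-zero exit code is reported through the success flag rather than an exception, which matches the handler convention of returning (output, success).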
read (sandbox_tool.py:237 or local_tools.py:352)
- Purpose: Read file contents with line numbers
- Default limit: 2,000 lines
- Tracks read files: Maintains _files_read set for write/edit safety
write (sandbox_tool.py:237 or local_tools.py:371)
- Purpose: Create or overwrite files atomically
- Safety: Refuses to write to existing files that haven't been read first (_files_read enforcement)
- Atomic writes: Uses tempfile + rename pattern
edit (sandbox_tool.py:237 or local_tools.py:394)
- Purpose: Find-and-replace edits with fuzzy matching
- Modes: replace, append_after, prepend_before, plus replace_all flag
- Fuzzy matching (edit_utils.py:35-89): 4-pass strategy:
  - Exact match
  - Right-trim whitespace
  - Both-sides trim
  - Unicode normalization (em-dashes, smart quotes, zero-width spaces)
- Python validation: AST-based syntax check after edit (edit_utils.py:233)
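The four passes can be sketched as progressively looser line comparisons. This is a simplified illustration, not the code in edit_utils.py: it matches whole lines only, find_block and python_syntax_ok are hypothetical names, and the character table for the Unicode pass is an assumption based on the examples listed above.

```python
import ast

# Assumed replacements for pass 4; the real table may cover more characters.
_UNICODE_MAP = str.maketrans({
    "\u2014": "-", "\u2013": "-",   # em and en dashes
    "\u201c": '"', "\u201d": '"',   # smart double quotes
    "\u2018": "'", "\u2019": "'",   # smart single quotes
    "\u200b": "",                   # zero-width space
})

# Passes are tried in order, from strictest to loosest.
_PASSES = [
    ("exact", lambda s: s),
    ("rstrip", lambda s: s.rstrip()),
    ("strip", lambda s: s.strip()),
    ("unicode", lambda s: s.translate(_UNICODE_MAP).strip()),
]


def find_block(content: str, needle: str):
    """Return (first_line, last_line, pass_name) for the first pass that matches."""
    lines, wanted = content.splitlines(), needle.splitlines()
    if not wanted:
        return None
    for name, norm in _PASSES:
        target = [norm(w) for w in wanted]
        for i in range(len(lines) - len(wanted) + 1):
            window = [norm(line) for line in lines[i:i + len(wanted)]]
            if window == target:
                return i, i + len(wanted) - 1, name
    return None


def python_syntax_ok(source: str) -> bool:
    """AST-based check in the spirit of the post-edit validation."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```

Trying the strict passes first keeps edits precise: a looser pass is used only when nothing matches exactly.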
The Sandbox Server (Embedded in sandbox_client.py:100-456)
A complete FastAPI application stored as a string literal and uploaded to the HF Space. Endpoints: /api/bash, /api/read, /api/write, /api/edit, /api/exists, /api/kill, /api/health. Runs on ghcr.io/astral-sh/uv:python3.12-bookworm-slim with development tools pre-installed.
2. Research Sub-Agent
research (agent/tools/research_tool.py)
- Purpose: Spawns an independent LLM sub-agent with its own context window for deep research
- Model: Uses a cheaper model -- anthropic/claude-sonnet-4-6 when main agent is on Anthropic (line 217-221)
- Own tool set (line 29-41): read, bash, explore_hf_docs, fetch_hf_docs, find_hf_api, hf_papers, github_find_examples, github_list_repos, github_read_file, hf_inspect_dataset, hf_repo_files
- Context budget: Warns at 170k tokens, hard-stops at 190k tokens (lines 25-27)
- Max iterations: 60 (line 296)
- Doom loop detection: Uses the same check_for_doom_loop() as the main agent
- Tool output truncation: Sub-agent tool outputs capped at 8,000 chars (line 419)
- Progress events: Sends tool_log events with per-agent agent_id and label for live UI tracking
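The budget rules above amount to a small decision function plus a truncation helper. The sketch below uses the thresholds quoted in the list; the function names are hypothetical, and how the real sub-agent counts tokens or reacts to the warning is not shown here.

```python
WARN_TOKENS = 170_000
STOP_TOKENS = 190_000
MAX_ITERATIONS = 60
MAX_TOOL_OUTPUT_CHARS = 8_000


def budget_action(tokens_used: int, iteration: int) -> str:
    """Return 'stop', 'warn', or 'continue' for the next sub-agent step."""
    if iteration >= MAX_ITERATIONS or tokens_used >= STOP_TOKENS:
        return "stop"
    if tokens_used >= WARN_TOKENS:
        return "warn"
    return "continue"


def truncate_tool_output(text: str, limit: int = MAX_TOOL_OUTPUT_CHARS) -> str:
    """Cap a tool result before it is added to the sub-agent's context."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} chars]"
```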
The research tool has its own system prompt (research_tool.py:43-169) focused on literature-first methodology:
"You are a research specialist agent. Your job is to thoroughly investigate a specific question..."
3. HuggingFace Documentation
explore_hf_docs (agent/tools/docs_tools.py:879)
- Purpose: Browse and search HF documentation structure
- Search engine: Whoosh full-text search with stemming analyzer
- Coverage: 37 documentation endpoints (transformers, datasets, peft, trl, accelerate, diffusers, gradio, etc.)
- Gradio special case: Uses Gradio's own embedding search API instead of Whoosh
fetch_hf_docs (docs_tools.py:957)
- Purpose: Fetch full markdown content of a specific doc page
- How: Appends .md to the doc URL and fetches raw content
find_hf_api (docs_tools.py:786, registered dynamically)
- Purpose: Search HuggingFace's REST API endpoints
- How: Builds a Whoosh index over the live OpenAPI spec, generates curl examples, formats parameters
4. Academic Papers
hf_papers (agent/tools/papers_tool.py)
- 11 operations: trending, search, paper_details, read_paper, citation_graph, snippet_search, recommend, find_datasets, find_models, find_collections, find_all_resources
- External APIs:
  - HuggingFace Papers API (https://huggingface.co/api)
  - Semantic Scholar API (https://api.semanticscholar.org) -- for citations, recommendations, snippet search
  - arXiv HTML / ar5iv -- for full paper reading
- Paper reading: HTML parsing with BeautifulSoup, section-level extraction with fuzzy section lookup
- Rate limiting: 1 req/s for S2 search, 0.1s for other S2 calls, with retry on 429/5xx
- Caching: Response cache (max 500 entries) for deduplication
- Parallel fetching: find_all_resources uses asyncio.gather for datasets+models+collections simultaneously
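The rate limiting and retry behaviour can be sketched with a minimum-interval limiter wrapped around a retry loop. The class and function names are hypothetical, the backoff schedule is an assumption, and send stands for any coroutine that performs the actual HTTP request. Per the list above, the interval would be 1.0 for S2 search and 0.1 for other S2 calls.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Allow at most one call per `interval` seconds."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = asyncio.Lock()
        self._last = float("-inf")  # no previous call yet

    async def wait(self) -> None:
        async with self._lock:
            delay = self._last + self.interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()


async def call_with_retry(send, limiter, max_attempts=3, backoff=0.5):
    """`send` is any coroutine function returning (status_code, body)."""
    for attempt in range(1, max_attempts + 1):
        await limiter.wait()
        status, body = await send()
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts:
            return status, body
        await asyncio.sleep(backoff * attempt)  # linear backoff (assumed)
```

Every retry passes through the limiter again, so a burst of 429 responses cannot push the client above the configured request rate.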
5. Training Jobs
hf_jobs (agent/tools/jobs_tool.py)
- 12 operations: run, ps, logs, inspect, cancel, plus scheduled variants (scheduled run/ps/inspect/delete/suspend/resume)
- Two execution modes (line 492):
  - Python mode: Script + dependencies, runs via UV in a Docker container
  - Docker mode: Custom Docker image + command
- Log streaming (line 382): Real-time log streaming using asyncio Queue bridge between sync generator and async consumer, with retry logic (100 retries, 5s delay) for connection drops
- Job tracking (line 544): Adds job IDs to session._running_job_ids for cleanup on cancel
- Hardware specs: CPU Basic (2 vCPU / 16GB), L4 (24GB VRAM), L40S (48GB VRAM), A100 (80GB VRAM), 8xH100 (640GB VRAM)
- Script resolution: Can read scripts from sandbox when given a file path
- Events: Sends tool_state_change events with job URL and final status for UI display
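The queue bridge used for log streaming can be sketched as below: a blocking generator runs in a worker thread and hands each item to the event loop through an asyncio.Queue. The name stream_from_sync is hypothetical, and the retry logic for dropped connections is omitted.

```python
import asyncio
import threading
from typing import AsyncIterator, Callable, Iterator

_DONE = object()  # sentinel marking the end of the stream


async def stream_from_sync(make_gen: Callable[[], Iterator[str]]) -> AsyncIterator[str]:
    """Run a blocking generator in a thread and yield its items asynchronously."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def producer() -> None:
        try:
            for item in make_gen():
                # asyncio.Queue is not thread-safe, so hop onto the loop thread.
                loop.call_soon_threadsafe(queue.put_nowait, item)
        except Exception as exc:  # forward errors to the consumer
            loop.call_soon_threadsafe(queue.put_nowait, exc)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = await queue.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

The consumer can then use `async for line in stream_from_sync(...)` and emit each line to the UI without blocking other tasks.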
6. Datasets
hf_inspect_dataset (agent/tools/dataset_tools.py)
- Purpose: Inspect HuggingFace dataset structure, splits, and sample data
- Two-phase parallel fetching (line 51-134):
  - Phase 1: /is-valid, /splits, /parquet in parallel
  - Phase 2: /info, /first-rows (depend on auto-detected config/split)
- Messages column analysis (line 250-350): Detects chat/instruction format, analyzes roles, message keys, tool call support. Critical for validating SFT/DPO/GRPO training compatibility
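The two-phase fetch can be sketched with asyncio.gather. To keep the example self-contained, the HTTP layer is injected as a fetch coroutine; inspect_dataset is a hypothetical name, and the shape of the /splits response (a "splits" list with "config" and "split" keys) is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable

Fetch = Callable[[str, dict[str, str]], Awaitable[dict[str, Any]]]


async def inspect_dataset(fetch: Fetch, dataset: str) -> dict[str, Any]:
    """Fetch dataset metadata in two phases; `fetch(path, params)` does the I/O."""
    base = {"dataset": dataset}
    # Phase 1: independent endpoints run concurrently.
    valid, splits, parquet = await asyncio.gather(
        fetch("/is-valid", base),
        fetch("/splits", base),
        fetch("/parquet", base),
    )
    # Auto-detect config/split from the first entry (assumed response shape).
    first = splits["splits"][0]
    config, split = first["config"], first["split"]
    # Phase 2: these calls need the config/split detected above.
    info, rows = await asyncio.gather(
        fetch("/info", {**base, "config": config}),
        fetch("/first-rows", {**base, "config": config, "split": split}),
    )
    return {"valid": valid, "splits": splits, "parquet": parquet,
            "info": info, "first_rows": rows}
```

Grouping the calls this way costs two round-trip latencies instead of five sequential ones.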
7. Repository Management
hf_repo_files (agent/tools/hf_repo_files_tool.py)
- 4 operations: list, read, upload, delete
- Read: Downloads via hf_hub_download, reads as UTF-8, truncates at 50k chars
- Upload: Supports create_pr flag for PR-based changes
- Delete: Supports wildcard patterns
- Requires approval: upload and delete operations
hf_repo_git (agent/tools/hf_repo_git_tool.py)
- 14 operations: create_branch, delete_branch, create_tag, delete_tag, list_refs, create_pr, list_prs, get_pr, merge_pr, close_pr, comment_pr, change_pr_status, create_repo, update_repo
- Requires approval: delete_branch, delete_tag, merge_pr, create_repo, update_repo
- All sync HF Hub API calls wrapped with asyncio.to_thread()
8. GitHub Integration
github_find_examples (agent/tools/github_find_examples.py)
- Purpose: Find example scripts/notebooks in GitHub repos
- Algorithm:
  - Fetches full repo tree via /git/trees/{branch}?recursive=1
  - Filters to files matching example patterns (scripts/, examples/, notebooks/, tutorials/, etc.)
  - Fuzzy scores using thefuzz.fuzz: token_set_ratio for pattern matching, partial_ratio + token_set_ratio for keywords
  - Falls back to similar repo search if repo not found
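The filter-then-score steps can be sketched as follows. To stay dependency-free, this illustration uses difflib.SequenceMatcher as a stand-in for the thefuzz scorers named above, so the scores will not match the real tool; the function names and the directory list (a subset of the patterns mentioned) are also assumptions.

```python
from difflib import SequenceMatcher

# Subset of the example patterns named above.
EXAMPLE_DIRS = ("scripts/", "examples/", "notebooks/", "tutorials/")


def is_example_path(path: str) -> bool:
    """True when the path sits in (or under) a directory that usually holds examples."""
    lowered = path.lower()
    return any(lowered.startswith(d) or f"/{d}" in lowered for d in EXAMPLE_DIRS)


def score(path: str, keyword: str) -> float:
    """Similarity in [0, 100]; difflib stands in for thefuzz here."""
    return 100 * SequenceMatcher(None, keyword.lower(), path.lower()).ratio()


def rank_examples(tree_paths: list[str], keyword: str, top_k: int = 5) -> list[str]:
    """Keep example-like files, then order them by similarity to the keyword."""
    candidates = [p for p in tree_paths if is_example_path(p)]
    return sorted(candidates, key=lambda p: score(p, keyword), reverse=True)[:top_k]
```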
github_read_file (agent/tools/github_read_file.py)
- Purpose: Read file contents from GitHub
- Auto-converts .ipynb to markdown using nbconvert
- Default truncation at 300 lines, with line_start/line_end support
- Falls back to raw download for large files
github_list_repos (agent/tools/github_list_repos.py)
- Purpose: List repos in a GitHub org/user
- Client-side sorting for stars/forks (GitHub list API doesn't support these sort fields)
9. Planning
plan_tool (agent/tools/plan_tool.py)
- Purpose: Todo list with status tracking (pending/in_progress/completed)
- Each call replaces the entire plan (not incremental updates)
- Emits plan_update event for UI display
- Module-level state (_current_plan list)
Tool Integration Summary
| External Service | Tools Using It |
|---|---|
| HuggingFace Hub API | hf_jobs, hf_repo_files, hf_repo_git, sandbox_create |
| HuggingFace Dataset Server | hf_inspect_dataset |
| HuggingFace Papers API | hf_papers |
| HuggingFace Docs (raw MD) | explore_hf_docs, fetch_hf_docs |
| HuggingFace OpenAPI spec | find_hf_api |
| HuggingFace MCP Server | Various MCP tools (dynamic) |
| Semantic Scholar API | hf_papers (citations, snippets, recommendations) |
| arXiv / ar5iv HTML | hf_papers (paper reading) |
| GitHub REST API | github_find_examples, github_read_file, github_list_repos |
| Gradio API | explore_hf_docs (Gradio-specific search) |
| HF Spaces (sandbox) | bash, read, write, edit |
Cross-Cutting Patterns
Async Wrapping
Synchronous HuggingFace Hub API calls are wrapped with asyncio.to_thread() via local _async_call() helper functions, so they do not block the event loop.
Error Handling Convention
All tools use try/except with specific exception types (HfHubHTTPError, RepositoryNotFoundError, httpx.HTTPStatusError) and return formatted error strings with success=False.
Sub-Agent LLM Usage
Only the research tool uses LLM calls internally. All other tools are pure API/computation tools.
Tool Output Limits
- Sandbox bash: 25,000 chars
- Sandbox read: 4,000 lines
- Local bash: similar with temp file spillover
- GitHub read: 300 lines default
- HF repo read: 50,000 chars
- Research tool outputs (within sub-agent): 8,000 chars