07 - Key Files Reference
File Map: Responsibility Index
Tier 1: Essential (Must Read to Understand the System)
| File | Lines | Responsibility |
|---|---|---|
| agent/core/agent_loop.py | 1198 | The heart. Think-act cycle, streaming LLM calls, tool execution, approval flow, error recovery, retry logic |
| agent/main.py | 1359 | CLI entry point. Interactive REPL, headless mode, event rendering, approval prompts, slash commands, signal handling |
| agent/core/tools.py | 376 | Tool system. ToolSpec/ToolRouter, built-in + MCP dispatch, schema generation, introspection-based handler calling |
| agent/core/session.py | 306 | Session state. Context manager, tool router, cancellation, event dual-logging, model switching, auto-save/upload |
| agent/context_manager/manager.py | 330 | Context management. System prompt loading, message history, compaction (LLM summarization), dangling tool call repair |
| agent/prompts/system_prompt_v3.yaml | 165 | Active system prompt. Agent persona, literature-first approach, anti-hallucination rules, autonomous mode directives |
| backend/routes/agent.py | 504 | API surface. All REST + SSE endpoints, model config, SSE streaming, event generator |
| backend/session_manager.py | 457 | Multi-session orchestration. Session lifecycle, EventBroadcaster fan-out, capacity management |
| frontend/src/hooks/useAgentChat.ts | 743 | Frontend orchestrator. Per-session chat state, side-channel events, research tracking, reconnection |
| frontend/src/lib/sse-chat-transport.ts | 397 | SSE transport. Custom ChatTransport bridging backend events to Vercel AI SDK |
Tier 2: Important (Understand Key Features)
| File | Lines | Responsibility |
|---|---|---|
| agent/tools/research_tool.py | 460 | Research sub-agent. Independent LLM context, cheaper model, read-only tools, 60-iteration cap |
| agent/tools/sandbox_client.py | 1054 | Sandbox infrastructure. HF Space creation, embedded FastAPI server, embedded Dockerfile, HTTP client with retry |
| agent/tools/sandbox_tool.py | 292 | Sandbox tools. Exposes sandbox_create/bash/read/write/edit as agent tools, auto-creation on first use |
| agent/tools/jobs_tool.py | 1095 | Training jobs. Python/Docker mode, log streaming, job tracking, hardware selection, scheduled jobs |
| agent/tools/papers_tool.py | 1295 | Paper research. 11 operations, HF + Semantic Scholar + arXiv APIs, citation graphs, full paper reading |
| agent/tools/docs_tools.py | 979 | Documentation. Whoosh search over 37 doc endpoints, OpenAPI spec indexing, doc page fetching |
| agent/core/doom_loop.py | 135 | Loop detection. Identical consecutive + repeating sequence detection, corrective prompt injection |
| agent/core/llm_params.py | 77 | LLM routing. Direct API vs HF Router, token resolution, reasoning effort, billing headers |
| frontend/src/store/agentStore.ts | 469 | UI state. Per-session state map, active session mirroring, research agents, panels, plans |
| frontend/src/components/Chat/ToolCallGroup.tsx | 1117 | Tool approval UI. Batch/individual approval, research visualization, script preview, hardware pricing |
Tier 3: Supporting (Understand Specific Subsystems)
| File | Lines | Responsibility |
|---|---|---|
| agent/tools/local_tools.py | 426 | Local-mode replacement for sandbox tools (subprocess, pathlib) |
| agent/tools/dataset_tools.py | 439 | Dataset inspection with parallel fetching, chat format analysis |
| agent/tools/hf_repo_git_tool.py | 664 | 14 git operations on HF repos (branches, tags, PRs) |
| agent/tools/hf_repo_files_tool.py | 323 | File CRUD on HF repos (list, read, upload, delete) |
| agent/tools/github_find_examples.py | 460 | Fuzzy example discovery in GitHub repos |
| agent/tools/github_read_file.py | 302 | GitHub file reading with notebook conversion |
| agent/tools/github_list_repos.py | 287 | GitHub org/user repo listing |
| agent/tools/edit_utils.py | 268 | 4-pass fuzzy matching for code edits |
| agent/tools/plan_tool.py | 131 | Todo list with status tracking |
| agent/tools/utilities.py | 142 | Job formatting helpers (tables, details) |
| agent/config.py | 99 | Pydantic config with env var substitution |
| agent/core/hf_router_catalog.py | 129 | HF model catalog cache, fuzzy suggestions |
| agent/core/session_uploader.py | 202 | Detached subprocess for HF dataset upload |
| agent/utils/terminal_display.py | 516 | CLI rendering: Rich markdown, typewriter, sub-agent live display |
| agent/utils/particle_logo.py | 228 | Particle animation (spring physics, braille rendering) |
| agent/utils/braille.py | 120 | Braille pixel canvas (2x4 resolution per cell) |
| agent/utils/crt_boot.py | 113 | CRT boot animation (glitch, noise, scanlines) |
| agent/utils/boot_timing.py | 16 | Shared math (settle_curve, color interpolation) |
| agent/utils/reliability_checks.py | 14 | Training script save pattern check |
| backend/main.py | 82 | FastAPI app setup, CORS, routers, static files |
| backend/models.py | 105 | Pydantic request/response models |
| backend/routes/auth.py | 188 | HuggingFace OAuth 2.0 flow |
| backend/dependencies.py | 143 | Auth middleware, token validation, dev mode |
| frontend/src/components/CodePanel/CodePanel.tsx | 585 | Script preview, editing, output display, plans |
| frontend/src/components/WelcomeScreen/WelcomeScreen.tsx | 458 | 3-step onboarding checklist |
| frontend/src/components/Layout/AppLayout.tsx | 436 | Main layout with sidebar + resizable panel |
| frontend/src/lib/convert-llm-messages.ts | 139 | Backend -> AI SDK message format conversion |
| frontend/src/store/sessionStore.ts | 96 | Session list persistence |
| frontend/src/store/layoutStore.ts | 41 | Layout preferences |
Configuration Files
| File | Purpose |
|---|---|
| configs/main_agent_config.json | Default agent config (model, MCP servers, session settings) |
| agent/prompts/system_prompt.yaml | V1 system prompt (historical) |
| agent/prompts/system_prompt_v2.yaml | V2 system prompt (historical) |
| agent/prompts/system_prompt_v3.yaml | V3 system prompt (ACTIVE) |
| pyproject.toml | Python project config, dependencies, entry point |
| frontend/package.json | Frontend dependencies |
| frontend/vite.config.ts | Vite build config, proxy, aliases |
| frontend/tsconfig.json | TypeScript config |
| Dockerfile | Multi-stage Docker build for HF Spaces |
| backend/start.sh | Production entrypoint script |
| .devcontainer/devcontainer.json | Dev container (Claude Code sandbox) |
| .gitignore | Git ignore patterns |
Important Prompts and Instructions
System Prompts
| File | Status | Key Innovation |
|---|---|---|
| agent/prompts/system_prompt_v3.yaml | ACTIVE | Literature-first research mandate, explicit failure mode catalog, scope-change prohibition, autonomous mode directives |
| agent/prompts/system_prompt_v2.yaml | Historical | Three-phase workflow (Research -> Plan -> Implement), training job checklists, verification checklist |
| agent/prompts/system_prompt.yaml | Historical | Original persona, 8 worked examples |
Research Sub-Agent Prompt
- Location: agent/tools/research_tool.py:43-169 (inline string)
- Key focus: Literature-first methodology, tool-specific usage guidance, concrete over theoretical
Compaction Prompt
- Location: agent/context_manager/manager.py:302-306 (inline string)
- Key focus: "Preserve key decisions, the 'why' behind decisions, problems solved"
Doom Loop Corrective Prompts
- Location: agent/core/doom_loop.py:110-130 (inline strings)
- Key focus: "STOP repeating" + suggest fundamentally different strategies
Title Generation Prompt
- Location: backend/routes/agent.py:174-177 (inline string)
- Key focus: Max 6 words, no quotes, captures topic
Design Patterns Summary
| Pattern | Where | Purpose |
|---|---|---|
| Queue-based producer-consumer | agent_loop.py, session_manager.py | Decouples UI from agent logic |
| Event broadcasting (fan-out) | session_manager.py:41 | One source queue -> N SSE subscribers |
| Strategy pattern | tools.py | Built-in handlers vs MCP dispatch via uniform interface |
| Introspection-based dispatch | tools.py:253 | inspect.signature() to detect session/tool_call_id params |
| Async context manager | tools.py:211 | ToolRouter lifecycle (MCP init/cleanup) |
| Fire-and-forget subprocess | session_uploader.py:248 | Detached upload process that survives agent exit |
| Read-before-write guard | sandbox_client.py:818, local_tools.py:386 | Prevents blind file overwrites |
| Fuzzy matching cascade | edit_utils.py:35 | 4-pass matching (exact -> trim -> normalize) |
| Spring-damping physics | particle_logo.py:36 | Particle animation convergence |
| Per-session state with active mirroring | agentStore.ts:265 | Session-specific state mirrored to flat fields for active session |
| Custom transport bridge | sse-chat-transport.ts | Backend SSE -> Vercel AI SDK UIMessageChunk |
| Side-channel callbacks | useAgentChat.ts:44 | Non-chat events (research, panels) processed alongside chat |
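
To make the introspection-based dispatch row concrete, here is a minimal sketch of the idea, not the code in tools.py. It assumes handlers are async callables that take an arguments dict; the names call_handler, echo, and echo_with_id are hypothetical. Only the use of inspect.signature() to detect session and tool_call_id parameters comes from the table above.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable

# Assumed handler shape: an async callable taking an arguments dict.
Handler = Callable[..., Awaitable[str]]


async def call_handler(
    handler: Handler,
    arguments: dict[str, Any],
    *,
    session: Any,
    tool_call_id: str,
) -> str:
    """Pass session / tool_call_id only to handlers that declare them."""
    params = inspect.signature(handler).parameters
    extra: dict[str, Any] = {}
    if "session" in params:
        extra["session"] = session
    if "tool_call_id" in params:
        extra["tool_call_id"] = tool_call_id
    return await handler(arguments, **extra)


# Hypothetical handler that needs no extra context.
async def echo(arguments: dict[str, Any]) -> str:
    return f"echo:{arguments['text']}"


# Hypothetical handler that opts in to tool_call_id by naming it.
async def echo_with_id(arguments: dict[str, Any], tool_call_id: str) -> str:
    return f"{tool_call_id}:{arguments['text']}"


if __name__ == "__main__":
    print(asyncio.run(call_handler(echo, {"text": "hi"}, session=None, tool_call_id="t1")))
    print(asyncio.run(call_handler(echo_with_id, {"text": "hi"}, session=None, tool_call_id="t1")))
```

The appeal of this approach is that simple handlers stay simple: a handler opts in to extra context just by naming the parameter, and the dispatcher needs no per-tool registration flags.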
Notable Design Tradeoffs
In-Memory Sessions (No Database)
Sessions live only in server memory, so a server restart loses them. Session trajectories are uploaded to HF datasets on a fire-and-forget basis, but the server has no way to resume a session. The frontend persists messages to localStorage as a partial mitigation.
Tradeoff: Simplicity and low latency vs. durability. Acceptable for a tool where sessions are typically short-lived.
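
The fire-and-forget upload mentioned above could be launched roughly as follows. This is a sketch under assumptions, not the code in session_uploader.py: the helper name spawn_detached is hypothetical, and the detachment relies on start_new_session, which only has an effect on POSIX hosts.

```python
import subprocess
import sys


def spawn_detached(args: list[str]) -> subprocess.Popen:
    """Start a child process that is not tied to the parent's session."""
    return subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,   # no inherited terminal input
        stdout=subprocess.DEVNULL,  # output is discarded, nobody reads it
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # POSIX: child calls setsid() before exec
    )


if __name__ == "__main__":
    # Stand-in command; a real caller would pass its own uploader command.
    child = spawn_detached([sys.executable, "-c", "pass"])
    print("started child with pid", child.pid)
```

The caller returns immediately and never inspects the result, which is what makes the upload cheap for the agent and also why a failed upload goes unnoticed.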
Embedded Sandbox Server
The sandbox server code (FastAPI app + Dockerfile) is stored as a string literal inside sandbox_client.py and uploaded to the HF Space on creation. This means the sandbox server code can't be independently versioned or tested.
Tradeoff: Self-contained deployment (no external dependencies) vs. maintainability.
Downgraded Research Model
The research sub-agent uses a cheaper model (claude-sonnet-4-6 vs claude-opus-4-6). This saves cost on what can be many LLM calls per research session.
Tradeoff: Cost savings vs. research quality. Mitigated by the research tool having its own focused system prompt and access to the same information tools.
No Parallel Tool Execution for Approval Tools
Tools requiring approval are batched and presented to the user, but execution of approved tools happens sequentially within the approval handler. Non-approval tools execute in parallel.
Tradeoff: Simpler approval UX (user sees all pending tools at once) vs. potential latency when multiple approved tools could run concurrently.
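
The difference between the two execution paths can be sketched with asyncio. This is an illustration only: run_auto_tools, run_approved_tools, and make_tool are hypothetical names, and tools are modeled as zero-argument async callables rather than the real handler interface.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed shape for illustration: a tool is a zero-argument async callable.
ToolFn = Callable[[], Awaitable[str]]


async def run_auto_tools(tools: list[ToolFn]) -> list[str]:
    """Tools that need no approval run concurrently; results keep input order."""
    return list(await asyncio.gather(*(tool() for tool in tools)))


async def run_approved_tools(tools: list[ToolFn]) -> list[str]:
    """Approved tools run one at a time, in the order given."""
    results: list[str] = []
    for tool in tools:
        results.append(await tool())
    return results


def make_tool(name: str, log: list[str]) -> ToolFn:
    """Build a fake tool that records when it starts and ends."""
    async def tool() -> str:
        log.append(f"start:{name}")
        await asyncio.sleep(0)  # yield to the event loop, standing in for I/O
        log.append(f"end:{name}")
        return name
    return tool


if __name__ == "__main__":
    parallel_log: list[str] = []
    asyncio.run(run_auto_tools([make_tool("a", parallel_log), make_tool("b", parallel_log)]))
    print(parallel_log)  # starts interleave before either tool ends

    serial_log: list[str] = []
    asyncio.run(run_approved_tools([make_tool("a", serial_log), make_tool("b", serial_log)]))
    print(serial_log)  # each tool finishes before the next starts
```

With real tools the sequential path costs the sum of the individual latencies, while the concurrent path costs roughly the slowest one.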
Doom Loop Detection Heuristics
The detection uses MD5 hashing of argument strings and fixed thresholds (3 identical consecutive calls, repeating sequences of 2-5 calls). This can miss subtle loops and may trigger on legitimate repeated calls.
Tradeoff: Simple implementation that catches common cases vs. sophisticated analysis that might be more accurate but harder to maintain.
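
A minimal sketch of this kind of heuristic follows. It is not the code in doom_loop.py: the function names are hypothetical, serializing arguments with json.dumps is an assumption, and so is the choice that a sequence counts as a loop once it appears twice back to back. The MD5 hashing, the threshold of 3, and the 2-5 sequence lengths come from the description above.

```python
import hashlib
import json


def call_signature(tool_name: str, arguments: dict) -> str:
    """Hash a tool call so that identical calls compare equal."""
    payload = json.dumps(arguments, sort_keys=True)  # assumed serialization
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return f"{tool_name}:{digest}"


def has_identical_run(signatures: list[str], threshold: int = 3) -> bool:
    """True if the last `threshold` calls are all the same call."""
    if len(signatures) < threshold:
        return False
    tail = signatures[-threshold:]
    return all(sig == tail[0] for sig in tail)


def has_repeating_sequence(
    signatures: list[str],
    min_len: int = 2,
    max_len: int = 5,
    repeats: int = 2,  # assumption: two back-to-back copies count as a loop
) -> bool:
    """True if the history ends with a block of calls repeated back to back."""
    for length in range(min_len, max_len + 1):
        needed = length * repeats
        if len(signatures) < needed:
            continue
        tail = signatures[-needed:]
        pattern = tail[:length]
        if all(tail[i:i + length] == pattern for i in range(0, needed, length)):
            return True
    return False


if __name__ == "__main__":
    ls = call_signature("bash", {"cmd": "ls"})
    read = call_signature("read", {"path": "x"})
    print(has_identical_run([ls, ls, ls]))
    print(has_repeating_sequence([ls, read, ls, read]))
```

Because only exact hashes are compared, a call that differs by a single character in its arguments resets the count, which is one way the subtle loops mentioned above slip through.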