05 - LLM Usage, Prompting, and Guardrails
How LLMs Are Leveraged
The project uses LLMs in four distinct contexts:
1. Main Agent Loop (Primary)
- File: `agent/core/agent_loop.py:245-350`
- Library: `litellm.acompletion()` -- unified interface across providers
- Default model: `anthropic/claude-opus-4-6` (configurable)
- Streaming: Yes, via SSE chunks with `stream=True`
- Function calling: Standard OpenAI-compatible tool calling format
- Reasoning effort: Defaults to "high" (`agent/config.py:37`)
2. Research Sub-Agent (Secondary)
- File: `agent/tools/research_tool.py:345`
- Library: Same `litellm.acompletion()`
- Model: Downgraded -- uses `anthropic/claude-sonnet-4-6` when the main agent is Anthropic (`research_tool.py:217-221`)
- Streaming: No (non-streaming for simplicity)
- Function calling: Same tool format, subset of tools
- Context budget: Independent window, warns at 170k, stops at 190k tokens
3. Context Compaction (Utility)
- File: `agent/context_manager/manager.py:312`
- Library: Same `litellm.acompletion()`
- Purpose: Summarize middle conversation messages to fit within the context window
- Model: Same as main agent
- Reasoning effort: Always "high" for quality summaries
4. Session Title Generation (Cosmetic)
- File: `backend/routes/agent.py:160-195`
- Library: Same `litellm.acompletion()`
- Purpose: Generate short session titles (max 6 words) from the first user message (see the sketch after this list)
- Model: Same as session model
- Prompt: "Generate a very short title (max 6 words) for a chat session that starts with this message..."
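A minimal sketch of what this call might look like. The helper name and the `max_tokens` cap are illustrative, not from the source; only the prompt wording and the use of `litellm.acompletion()` are documented above.

```python
import litellm

async def generate_session_title(first_message: str, model: str) -> str:
    # Cosmetic one-shot call: no tools, no streaming, tiny output.
    prompt = (
        "Generate a very short title (max 6 words) for a chat session "
        f"that starts with this message: {first_message}"
    )
    response = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=30,  # illustrative cap; titles are tiny
    )
    return response.choices[0].message.content.strip()
```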
Model Routing Architecture
```
   model_name
        |
 +------v-------+
 | Starts with  |
 | "anthropic/" |
 | or "openai/" |
 +--+---------+-+
    |         |
 Yes|         |No
    |         |
+---v-----+ +-v------------+
| Direct  | | HF Inference |
| API call| | Router       |
| via     | | via OpenAI   |
| LiteLLM | | adapter      |
+---------+ +------+-------+
                   |
      router.huggingface.co/v1
                   |
            +------v-------+
            | Provider     |
            | selection    |
            | (e.g. :novita|
            |  :cheapest)  |
            +--------------+
```
Implementation: `agent/core/llm_params.py:18-76`
For HF Router models, litellm's OpenAI adapter is used (`model="openai/{hf_model}"`) with `api_base` pointing to the HF router. Token precedence: `INFERENCE_TOKEN` env > `session.hf_token` > `HF_TOKEN` env.
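A condensed sketch of this routing decision. The function name and return shape are illustrative; the real logic lives in `agent/core/llm_params.py`.

```python
import os

def build_llm_params(model_name: str, session_hf_token: str | None = None) -> dict:
    # Anthropic/OpenAI models go straight through LiteLLM's native providers.
    if model_name.startswith(("anthropic/", "openai/")):
        return {"model": model_name}

    # Everything else is routed through the HF Inference Router via
    # litellm's OpenAI-compatible adapter.
    token = (
        os.environ.get("INFERENCE_TOKEN")   # 1st: INFERENCE_TOKEN env
        or session_hf_token                 # 2nd: session.hf_token
        or os.environ.get("HF_TOKEN")       # 3rd: HF_TOKEN env
    )
    return {
        "model": f"openai/{model_name}",  # e.g. "openai/some-org/some-model:novita"
        "api_base": "https://router.huggingface.co/v1",
        "api_key": token,
    }
```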
System Prompts: Evolution and Design
The project contains three prompt versions, showing a clear evolution:
V1 (agent/prompts/system_prompt.yaml) -- Foundation
- 171 lines
- Establishes the "Hugging Face Agent" persona
- Key principle: "Research first, then implement"
- Includes 8 worked examples covering common ML tasks
- Communication style: "concise, no emojis, no exclamation points, no flattery"
V2 (agent/prompts/system_prompt_v2.yaml) -- Expansion
- 490 lines (3x V1)
- Adds "ZERO ERRORS" success criteria
- Mandates three-phase workflow: RESEARCH -> PLAN & VALIDATE -> IMPLEMENT
- Detailed training job checklists (push_to_hub, hub_model_id, hardware sizing, timeouts)
- Task completion verification checklist
- Introduces the `research` sub-agent tool prominently
V3 (agent/prompts/system_prompt_v3.yaml) -- Active, Refined
- 166 lines (compressed back down, but more prescriptive)
- Literature-first approach (lines 12-24):
"Find the landmark paper(s), crawl citation graphs, read methodology sections (not abstracts), extract the recipe."
- Explicit anti-hallucination section (lines 29-47) listing 8 specific failure modes
- Scope-change prohibition (line 47)
- Sandbox-first development (lines 93-96)
- Autonomous mode directives (lines 127-151): "NEVER respond with only text. Every response MUST include at least one tool call."
Key Design Decisions in the Active Prompt (V3)
Literature-First Research
The prompt instructs the agent to treat published papers as the source of truth for ML recipes, not the LLM's parametric knowledge:
"Find the landmark paper(s), crawl citation graphs, read methodology sections (not abstracts), extract the recipe. If you can't find a reliable source, tell the user -- don't guess."
This is a deliberate countermeasure against LLM hallucination in technical domains.
Explicit Failure Mode Catalog
Rather than generic "be careful" instructions, the prompt lists 8 specific mistakes with fixes:
| Mistake | Fix |
|---|---|
| Hallucinated imports | Research the actual API first |
| Wrong Trainer arguments | Read the library docs |
| Wrong dataset format | Audit the dataset before using |
| Default timeout kills jobs | Set explicit timeouts based on model size |
| Lost models (no save) | Always include push_to_hub |
| Batch failures | Test in sandbox first |
| Silent dataset substitution | Verify the exact dataset name |
| Scope-changing fixes | Don't silently change what the user asked for |
This pattern of naming the failure modes you want to prevent is more effective than generic instructions because it gives the LLM concrete patterns to match against.
Scope-Change Prohibition
"Avoid at all costs! When you hit an error, you will try 'creative' workarounds that change what the user asked for... Do not do this."
This addresses a common LLM agent failure mode: when something goes wrong, the agent "helpfully" changes the approach in a way that no longer solves the original problem.
Autonomous Mode Section
For headless/autonomous operation:
- "NEVER respond with only text. Every response MUST include at least one tool call."
- "NEVER STOP WORKING."
- "Your workflow is a loop, not a checklist."
This prevents the agent from "giving up" or entering a passive state during long-running autonomous tasks.
Research Sub-Agent Prompt (research_tool.py:43-169)
The research tool has its own detailed system prompt establishing a specialist persona:
"You are a research specialist agent. Your job is to thoroughly investigate a specific question..."
Key instructions:
- "Start with HuggingFace documentation" (tool-aware research ordering)
- "Look at real code examples" (prefer concrete over theoretical)
- "Read actual paper content, not just abstracts"
- "Check dataset compatibility" (verify data format matches intended use)
- Tool-specific usage guidance for each available tool
Guardrails and Safety Mechanisms
1. Tool Approval System
- File: `agent/core/agent_loop.py:48-62`
- What: Certain tools require explicit user consent before execution (see the sketch after this list)
- Why: Prevents accidental GPU spending, repo modifications, infrastructure creation
- Override: YOLO mode (`/yolo` or `--yolo`) auto-approves everything
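A sketch of such a gate. The tool names and callback signatures are hypothetical; only the consent-before-execution behavior and YOLO override come from the source.

```python
from typing import Awaitable, Callable

SENSITIVE_TOOLS = {"run_training_job", "create_space", "push_to_repo"}  # hypothetical names

async def maybe_execute(
    tool_name: str,
    args: dict,
    yolo_mode: bool,
    ask_user: Callable[[str], Awaitable[bool]],
    execute: Callable[[str, dict], Awaitable[str]],
) -> str:
    # Gate sensitive tools behind explicit consent unless YOLO mode is on.
    if tool_name in SENSITIVE_TOOLS and not yolo_mode:
        if not await ask_user(f"Allow tool call {tool_name}({args})?"):
            return "Tool call rejected by user."
    return await execute(tool_name, args)
```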
2. Doom Loop Detection
- File: `agent/core/doom_loop.py`
- What: Detects 3+ identical consecutive tool calls or repeating sequences of length 2-5 (sketched below)
- Action: Injects corrective user message: "STOP repeating this approach"
- Scope: Applied to both main agent and research sub-agent
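A sketch of both checks described above; the function is illustrative, not the actual `doom_loop.py` code.

```python
def is_doom_loop(history: list[tuple[str, str]]) -> bool:
    """history holds (tool_name, serialized_args) pairs, oldest first."""
    # 3+ identical consecutive calls.
    if len(history) >= 3 and len(set(history[-3:])) == 1:
        return True
    # A repeating sequence of length k (2-5): the last 2k calls are the
    # same k-call pattern twice in a row.
    for k in range(2, 6):
        if len(history) >= 2 * k and history[-k:] == history[-2 * k : -k]:
            return True
    return False

# On detection, a corrective *user* message is injected, e.g.:
CORRECTIVE = {"role": "user", "content": "STOP repeating this approach."}
```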
3. Context Window Management
- File: `agent/context_manager/manager.py:265`
- What: Auto-compacts the conversation when approaching the context limit (sketched below)
- Safety margin: 10,000 tokens below model limit
- Preserves: System prompt, first user message (original task), last 5 messages
- Emergency: Catches `ContextWindowExceededError` and forces compaction
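A sketch of the trigger and the keep/summarize split. Token counting and the summarization call are elided; the two-message head is an assumption based on "system prompt + first user message".

```python
SAFETY_MARGIN = 10_000  # tokens below the model limit

def needs_compaction(token_count: int, model_limit: int) -> bool:
    return token_count > model_limit - SAFETY_MARGIN

def split_for_compaction(messages: list[dict]) -> tuple[list[dict], list[dict], list[dict]]:
    head = messages[:2]      # system prompt + first user message (original task)
    middle = messages[2:-5]  # this span gets summarized by an LLM call
    tail = messages[-5:]     # last 5 messages kept verbatim
    return head, middle, tail
```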
4. Truncation Recovery
- File: `agent/core/agent_loop.py:512-551`
- What: When the LLM response hits the max output tokens (`finish_reason=length`), drops any partial tool calls and injects a hint to use smaller content (sketched below)
- Why: Partial tool-call JSON is invalid and would cause errors
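A sketch of this recovery path. Message shapes follow the OpenAI format; the function name and hint wording are illustrative.

```python
def recover_from_truncation(assistant_msg: dict, finish_reason: str) -> list[dict]:
    if finish_reason != "length":
        return [assistant_msg]
    # A truncated tool call carries incomplete JSON arguments, so drop it
    # rather than sending an invalid call back to the API.
    assistant_msg.pop("tool_calls", None)
    hint = {
        "role": "user",
        "content": "Your response was cut off at the output limit. "
                   "Retry with smaller content, e.g. write the file in chunks.",
    }
    return [assistant_msg, hint]
```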
5. Malformed JSON Recovery
- File: `agent/core/agent_loop.py:598-641`
- What: When tool arguments can't be parsed as JSON, returns a descriptive error to the LLM (sketched below)
- Why: Allows the LLM to self-correct rather than crashing
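A sketch of turning a parse failure into a self-correction opportunity; the error wording and return shape are illustrative.

```python
import json

def parse_tool_args(tool_call_id: str, raw_args: str) -> dict:
    try:
        return {"ok": True, "args": json.loads(raw_args)}
    except json.JSONDecodeError as exc:
        # Feed the problem back to the model as a tool result instead of crashing.
        return {
            "ok": False,
            "error_message": {
                "role": "tool",
                "tool_call_id": tool_call_id,
                "content": f"Error: arguments were not valid JSON ({exc}). "
                           "Re-issue the call with corrected arguments.",
            },
        }
```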
6. Dangling Tool Call Repair
- File: `agent/context_manager/manager.py:185`
- What: Injects stub results for tool calls that were never executed due to interruption (sketched below)
- Why: LLM APIs require every tool call to have a matching result message
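A sketch of the repair pass, pairing every unanswered `tool_call` id with a stub result; names are illustrative.

```python
def repair_dangling_tool_calls(messages: list[dict]) -> list[dict]:
    answered = {m.get("tool_call_id") for m in messages if m.get("role") == "tool"}
    repaired: list[dict] = []
    for msg in messages:
        repaired.append(msg)
        # Place a stub result right after any assistant tool call that
        # never got a matching tool message.
        for call in msg.get("tool_calls") or []:
            if call["id"] not in answered:
                repaired.append({
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": "Tool call was interrupted and never executed.",
                })
    return repaired
```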
7. Transient Error Retries
- File: `agent/core/agent_loop.py:118-136`
- What: 3 retries with [5s, 15s, 30s] delays for timeouts, rate limits, and 5xx errors (sketched below)
- Pattern matching: String-based on error messages via `_is_transient_error()`
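A sketch of the retry loop with those delays; the transient-error predicate is simplified string matching, in the spirit the source describes.

```python
import asyncio

RETRY_DELAYS = [5, 15, 30]  # seconds

def _is_transient_error(exc: Exception) -> bool:
    msg = str(exc).lower()
    return any(s in msg for s in ("timeout", "rate limit", "overloaded", "502", "503"))

async def call_with_retries(make_call):
    # One initial attempt plus len(RETRY_DELAYS) retries.
    for attempt in range(len(RETRY_DELAYS) + 1):
        try:
            return await make_call()
        except Exception as exc:
            if attempt == len(RETRY_DELAYS) or not _is_transient_error(exc):
                raise
            await asyncio.sleep(RETRY_DELAYS[attempt])
```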
8. Read-Before-Write Enforcement
- File: `agent/tools/sandbox_client.py:818-827`, `agent/tools/local_tools.py:386-389`
- What: Tracks which files have been read; refuses to write/edit files that haven't been read (sketched below)
- Why: Prevents the LLM from blindly overwriting files it hasn't examined
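A sketch of the tracking gate. The new-file exemption is an assumption, not confirmed by the source.

```python
import os

class ReadBeforeWriteGate:
    def __init__(self) -> None:
        self._read_files: set[str] = set()

    def mark_read(self, path: str) -> None:
        self._read_files.add(path)

    def check_write(self, path: str) -> None:
        # Assumption: creating a brand-new file is allowed; editing an
        # existing file requires a prior read.
        if os.path.exists(path) and path not in self._read_files:
            raise PermissionError(
                f"Refusing to write {path}: read it first so you know what "
                "you are overwriting."
            )
```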
9. Training Script Safety Checks
- File: `agent/utils/reliability_checks.py:4-14`
- What: Scans training scripts for `from_pretrained` without `push_to_hub` (sketched below)
- Why: Prevents training runs that load a model but never save the result
10. Anti-Hallucination via System Prompt
- File: `agent/prompts/system_prompt_v3.yaml:29-47`
- What: Lists 8 specific mistakes the LLM tends to make without proper research
- Why: Concrete failure examples are more effective than generic caution
11. Cancellation with Cleanup
- File: `agent/core/agent_loop.py:210`
- What: On cancel, kills sandbox processes and cancels running HF jobs
- Why: Prevents zombie jobs consuming GPU resources
12. Research Sub-Agent Token Budget
- File: `agent/tools/research_tool.py:25-27`
- What: Warns at 170k tokens, hard-stops at 190k by forcing a summary with no tools (sketched below)
- Why: Prevents runaway research sessions that exhaust the context window
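A sketch of the budget enforcement. The thresholds are from the source; the message wording and parameter handling are illustrative.

```python
WARN_TOKENS = 170_000
HARD_STOP_TOKENS = 190_000

def enforce_research_budget(token_count: int, params: dict) -> dict:
    params = dict(params)  # shallow copy; don't mutate the caller's dict
    if token_count >= HARD_STOP_TOKENS:
        params["tools"] = None  # tool-free turn: the model can only summarize
        nudge = "Token budget exhausted. Summarize your findings now."
    elif token_count >= WARN_TOKENS:
        nudge = "You are close to the token budget; wrap up soon."
    else:
        return params
    params["messages"] = params["messages"] + [{"role": "user", "content": nudge}]
    return params
```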
Prompting Techniques Used
1. Role-Based Persona
The system prompt establishes a specific role ("ML engineering expert on the HuggingFace ecosystem") rather than a generic assistant. This focuses the model's behavior.
2. Negative Examples (What NOT to Do)
Rather than only describing desired behavior, the prompt explicitly catalogs failure modes. This is a well-known prompting technique -- models respond well to "don't do X" instructions.
3. Workflow Prescription
The prompt prescribes a specific workflow (Research -> Plan -> Implement) rather than letting the model decide. This reduces the chance of the model skipping research and jumping straight to implementation.
4. Tool-Aware Instructions
The prompt references specific tools by name and describes when to use each one. This is possible because tools are injected via Jinja2 templating:
```jinja
# system_prompt_v3.yaml (simplified)
You have {{ num_tools }} tools available:
{% for tool in tools %}
- {{ tool.name }}: {{ tool.description }}
{% endfor %}
```
5. Injected Corrective Prompts (Doom Loop)
When the agent gets stuck, a corrective message is injected as a user message (not system). User messages typically get higher attention from the LLM than system messages mid-conversation.
6. Compaction Summary Prompt
The compaction prompt specifically asks for "the 'why' behind decisions" -- not just what happened but the reasoning. This preserves the agent's ability to make informed decisions after compaction.
7. Research-First Mandate
The literature-first approach is a form of chain-of-retrieval -- forcing the model to gather evidence before synthesizing. This significantly reduces hallucination in technical domains.
LLM Call Parameters
Main Agent
```python
{
    "model": "anthropic/claude-opus-4-6",  # or HF model
    "messages": [...],                     # full conversation
    "tools": [...],                        # all registered tools
    "stream": True,                        # SSE streaming
    "reasoning_effort": "high",            # configurable
    "drop_params": True,                   # litellm: silently drop unsupported params
}
```
Research Sub-Agent
```python
{
    "model": "anthropic/claude-sonnet-4-6",  # cheaper model
    "messages": [...],                       # independent context
    "tools": [...],                          # subset of tools
    "stream": False,                         # non-streaming
    "max_tokens": 16000,                     # output cap
}
```
Compaction
```python
{
    "model": same_as_main,
    "messages": [{"role": "user", "content": summary_prompt}],
    "tools": None,                 # no tools
    "reasoning_effort": "high",    # always high for quality
    "max_tokens": compact_size,    # 10% of max context
}
```