CodeDocs Vault

05 - LLM Usage, Prompting, and Guardrails

How LLMs Are Leveraged

The project uses LLMs in four distinct contexts:

1. Main Agent Loop (Primary)

2. Research Sub-Agent (Secondary)

3. Context Compaction (Utility)

4. Session Title Generation (Cosmetic)


Model Routing Architecture

                     model_name
                         |
                 +-------v--------+
                 | Starts with    |
                 | "anthropic/"   |
                 | or "openai/" ? |
                 +---+--------+---+
                     |        |
                  Yes|        |No
                     |        |
             +-------v--+  +--v-----------+
             | Direct   |  | HF Inference |
             | API call |  | Router       |
             | via      |  | via          |
             | LiteLLM  |  | OpenAI       |
             |          |  | adapter      |
             +----------+  +------+-------+
                                  |
                      router.huggingface.co/v1
                                  |
                        +---------v---------+
                        | Provider          |
                        | selection         |
                        | (e.g. :novita,    |
                        |  :cheapest)       |
                        +-------------------+

Implementation: agent/core/llm_params.py:18-76

For HF Router models, litellm's OpenAI adapter is used (model="openai/{hf_model}") with api_base pointing to the HF router. Token precedence: INFERENCE_TOKEN env > session.hf_token > HF_TOKEN env.
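Sketched in Python under those rules; the helper name and return shape are assumptions, not the actual agent/core/llm_params.py code:

import os

# Sketch of the routing rule described above. Helper name and structure
# are illustrative, not the real code in agent/core/llm_params.py.
def build_llm_params(model_name: str, session_hf_token: str | None = None) -> dict:
    if model_name.startswith(("anthropic/", "openai/")):
        # Direct API call: litellm resolves the provider from the prefix.
        return {"model": model_name}

    # Everything else goes through the HF Inference Router via litellm's
    # OpenAI-compatible adapter, honoring the token precedence above.
    token = (
        os.environ.get("INFERENCE_TOKEN")   # 1. INFERENCE_TOKEN env
        or session_hf_token                 # 2. session.hf_token
        or os.environ.get("HF_TOKEN")       # 3. HF_TOKEN env
    )
    return {
        "model": f"openai/{model_name}",
        "api_base": "https://router.huggingface.co/v1",
        "api_key": token,
    }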


System Prompts: Evolution and Design

The project contains three prompt versions, showing a clear evolution:

V1 (agent/prompts/system_prompt.yaml) -- Foundation

V2 (agent/prompts/system_prompt_v2.yaml) -- Expansion

V3 (agent/prompts/system_prompt_v3.yaml) -- Active, Refined

Key Design Decisions in the Active Prompt (V3)

Literature-First Research

The prompt instructs the agent to treat published papers as the source of truth for ML recipes, not the LLM's parametric knowledge:

"Find the landmark paper(s), crawl citation graphs, read methodology sections (not abstracts), extract the recipe. If you can't find a reliable source, tell the user -- don't guess."

This is a deliberate countermeasure against LLM hallucination in technical domains.

Explicit Failure Mode Catalog

Rather than generic "be careful" instructions, the prompt lists 8 specific mistakes with fixes:

Mistake                        Fix
Hallucinated imports           Research the actual API first
Wrong Trainer arguments        Read the library docs
Wrong dataset format           Audit the dataset before using it
Default timeout kills jobs     Set explicit timeouts based on model size
Lost models (no save)          Always include push_to_hub
Batch failures                 Test in sandbox first
Silent dataset substitution    Verify the exact dataset name
Scope-changing fixes           Don't silently change what the user asked for

This pattern of naming the failure modes you want to prevent is more effective than generic instructions because it gives the LLM concrete patterns to match against.

Scope-Change Prohibition

"Avoid at all costs! When you hit an error, you will try 'creative' workarounds that change what the user asked for... Do not do this."

This addresses a common LLM agent failure mode: when something goes wrong, the agent "helpfully" changes the approach in a way that no longer solves the original problem.

Autonomous Mode Section

For headless/autonomous operation, the prompt includes a dedicated section instructing the agent to keep making progress on its own rather than pause and wait for user input.

This prevents the agent from "giving up" or entering a passive state during long-running autonomous tasks.


Research Sub-Agent Prompt (research_tool.py:43-169)

The research tool has its own detailed system prompt establishing a specialist persona:

"You are a research specialist agent. Your job is to thoroughly investigate a specific question..."

It runs with an independent context, a subset of the main agent's tools, and a hard output cap (see LLM Call Parameters below).
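
A rough sketch of how such a sub-agent call differs from the main loop; the function name is an assumption and the tool-execution loop is omitted (the actual code lives in research_tool.py):

import litellm

RESEARCH_SYSTEM_PROMPT = "You are a research specialist agent. ..."  # abridged

def run_research(question: str, research_tools: list[dict]) -> str:
    # Fresh message list: the sub-agent never sees the main conversation.
    messages = [
        {"role": "system", "content": RESEARCH_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    response = litellm.completion(
        model="anthropic/claude-sonnet-4-6",  # cheaper model than the main agent
        messages=messages,
        tools=research_tools,  # subset of the main agent's tools
        stream=False,          # non-streaming
        max_tokens=16000,      # hard output cap
    )
    return response.choices[0].message.content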


Guardrails and Safety Mechanisms

1. Tool Approval System

2. Doom Loop Detection (see the sketch after this list)

3. Context Window Management

4. Truncation Recovery

5. Malformed JSON Recovery

6. Dangling Tool Call Repair

7. Transient Error Retries

8. Read-Before-Write Enforcement

9. Training Script Safety Checks

10. Anti-Hallucination via System Prompt

11. Cancellation with Cleanup

12. Research Sub-Agent Token Budget
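
As a concrete illustration of item 2, here is a minimal doom-loop detector, assuming the agent keeps a running list of (tool_name, arguments) pairs; this is a sketch, not the project's actual implementation:

import json

def is_doom_loop(tool_calls: list[tuple[str, dict]], window: int = 3) -> bool:
    """True if the last `window` tool calls are identical (same name and args)."""
    if len(tool_calls) < window:
        return False
    signatures = {
        (name, json.dumps(args, sort_keys=True))
        for name, args in tool_calls[-window:]
    }
    return len(signatures) == 1  # one unique signature = same call repeated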


Prompting Techniques Used

1. Role-Based Persona

The system prompt establishes a specific role ("ML engineering expert on the HuggingFace ecosystem") rather than a generic assistant. This focuses the model's behavior.

2. Negative Examples (What NOT to Do)

Rather than only describing desired behavior, the prompt explicitly catalogs failure modes. Naming concrete anti-patterns gives the model specific negative examples to match against, which works better than generic "be careful" guidance.

3. Workflow Prescription

The prompt prescribes a specific workflow (Research -> Plan -> Implement) rather than letting the model decide. This reduces the chance of the model skipping research and jumping straight to implementation.

4. Tool-Aware Instructions

The prompt references specific tools by name and describes when to use each one. This is possible because tools are injected via Jinja2 templating:

# system_prompt_v3.yaml (simplified)
You have {{ num_tools }} tools available:
{% for tool in tools %}
- {{ tool.name }}: {{ tool.description }}
{% endfor %}
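
Rendering such a template takes a few lines with jinja2; a minimal, self-contained sketch (the Tool class and tool entries are illustrative, not the project's registry):

from dataclasses import dataclass
from jinja2 import Template

@dataclass
class Tool:
    name: str
    description: str

template = Template(
    "You have {{ num_tools }} tools available:\n"
    "{% for tool in tools %}"
    "- {{ tool.name }}: {{ tool.description }}\n"
    "{% endfor %}"
)

tools = [
    Tool("bash", "Run a shell command"),        # illustrative tools,
    Tool("read_file", "Read a file from disk"), # not the real registry
]
print(template.render(num_tools=len(tools), tools=tools))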

5. Injected Corrective Prompts (Doom Loop)

When the agent gets stuck, a corrective message is injected as a user message (not system). User messages typically get higher attention from the LLM than system messages mid-conversation.
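
Sketched as code, reusing the is_doom_loop helper from the guardrails section above; the variable names and corrective wording are illustrative:

if is_doom_loop(tool_call_history):
    messages.append({
        "role": "user",  # deliberately a user message, not system
        "content": (
            "You appear to be repeating the same action without making progress. "
            "Stop, summarize what you have learned so far, and try a different approach."
        ),
    })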

6. Compaction Summary Prompt

The compaction prompt specifically asks for "the 'why' behind decisions" -- not just what happened but the reasoning. This preserves the agent's ability to make informed decisions after compaction.
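
A paraphrased sketch of the shape this prompt takes (not the actual prompt text):

# Illustrative paraphrase of the compaction instruction, not the real prompt.
summary_prompt = (
    "Summarize this conversation for an agent that will continue the task. "
    "Record not just what was done but the 'why' behind each decision, "
    "plus open questions and any constraints the user imposed."
)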

7. Research-First Mandate

The literature-first approach is a form of chain-of-retrieval -- forcing the model to gather evidence before synthesizing rather than answering from parametric memory. This reduces hallucination in technical domains.


LLM Call Parameters

Main Agent

{
    "model": "anthropic/claude-opus-4-6",  # or HF model
    "messages": [...],                       # full conversation
    "tools": [...],                          # all registered tools
    "stream": True,                          # SSE streaming
    "reasoning_effort": "high",              # configurable
    "drop_params": True,                     # litellm: silently drop unsupported params
}

Research Sub-Agent

{
    "model": "anthropic/claude-sonnet-4-6",  # cheaper model
    "messages": [...],                        # independent context
    "tools": [...],                           # subset of tools
    "stream": False,                          # non-streaming
    "max_tokens": 16000,                     # output cap
}

Compaction

{
    "model": same_as_main,
    "messages": [{"role": "user", "content": summary_prompt}],
    "tools": None,                            # no tools
    "reasoning_effort": "high",              # always high for quality
    "max_tokens": compact_size,              # 10% of max context
}