CodeDocs Vault

05 - LLM Usage, Prompting, and Guardrails

How LLMs Are Leveraged

The project uses LLMs in four distinct contexts:

1. Main Agent Loop (Primary)

2. Research Sub-Agent (Secondary)

3. Context Compaction (Utility)

4. Session Title Generation (Cosmetic)


Model Routing Architecture

                     model_name
                         |
                 +-------v--------+
                 | Starts with    |
                 | "anthropic/"   |
                 | or "openai/" ? |
                 +---+--------+---+
                     |        |
                  Yes|        |No
                     |        |
             +-------v--+  +--v-----------+
             | Direct   |  | HF Inference |
             | API call |  | Router       |
             | via      |  | via          |
             | LiteLLM  |  | OpenAI       |
             |          |  | adapter      |
             +----------+  +------+-------+
                                  |
                      router.huggingface.co/v1
                                  |
                        +---------v---------+
                        | Provider          |
                        | selection         |
                        | (e.g. :novita,    |
                        |  :cheapest)       |
                        +-------------------+

Implementation: agent/core/llm_params.py:18-76

For HF Router models, litellm's OpenAI adapter is used (model="openai/{hf_model}") with api_base pointing to the HF router. Token precedence: INFERENCE_TOKEN env > session.hf_token > HF_TOKEN env.
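Sketched in Python under those rules; the helper name and return shape are assumptions, not the actual agent/core/llm_params.py code:

import os

# Sketch of the routing rule described above. Helper name and structure
# are illustrative, not the real code in agent/core/llm_params.py.
def build_llm_params(model_name: str, session_hf_token: str | None = None) -> dict:
    if model_name.startswith(("anthropic/", "openai/")):
        # Direct API call: litellm resolves the provider from the prefix.
        return {"model": model_name}

    # Everything else goes through the HF Inference Router via litellm's
    # OpenAI-compatible adapter, honoring the token precedence above.
    token = (
        os.environ.get("INFERENCE_TOKEN")   # 1. INFERENCE_TOKEN env
        or session_hf_token                 # 2. session.hf_token
        or os.environ.get("HF_TOKEN")       # 3. HF_TOKEN env
    )
    return {
        "model": f"openai/{model_name}",
        "api_base": "https://router.huggingface.co/v1",
        "api_key": token,
    }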


System Prompts: Evolution and Design

The project contains three prompt versions, showing a clear evolution:

V1 (agent/prompts/system_prompt.yaml) -- Foundation

V2 (agent/prompts/system_prompt_v2.yaml) -- Expansion

V3 (agent/prompts/system_prompt_v3.yaml) -- Active, Refined

Key Design Decisions in the Active Prompt (V3)

Literature-First Research

The prompt instructs the agent to treat published papers as the source of truth for ML recipes, not the LLM's parametric knowledge:

"Find the landmark paper(s), crawl citation graphs, read methodology sections (not abstracts), extract the recipe. If you can't find a reliable source, tell the user -- don't guess."

This is a deliberate countermeasure against LLM hallucination in technical domains.

Explicit Failure Mode Catalog

Rather than generic "be careful" instructions, the prompt lists 8 specific mistakes with fixes:

Mistake                        Fix
Hallucinated imports           Research the actual API first
Wrong Trainer arguments        Read the library docs
Wrong dataset format           Audit the dataset before using it
Default timeout kills jobs     Set explicit timeouts based on model size
Lost models (no save)          Always include push_to_hub
Batch failures                 Test in sandbox first
Silent dataset substitution    Verify the exact dataset name
Scope-changing fixes           Don't silently change what the user asked for

This pattern of naming the failure modes you want to prevent is more effective than generic instructions because it gives the LLM concrete patterns to match against.

Scope-Change Prohibition

"Avoid at all costs! When you hit an error, you will try 'creative' workarounds that change what the user asked for... Do not do this."

This addresses a common LLM agent failure mode: when something goes wrong, the agent "helpfully" changes the approach in a way that no longer solves the original problem.

Autonomous Mode Section

For headless/autonomous operation, the prompt includes a dedicated section instructing the agent to keep making progress on its own rather than pause and wait for user input.

This prevents the agent from "giving up" or entering a passive state during long-running autonomous tasks.


Research Sub-Agent Prompt (research_tool.py:43-169)

The research tool has its own detailed system prompt establishing a specialist persona:

"You are a research specialist agent. Your job is to thoroughly investigate a specific question..."

It runs with an independent context, a subset of the main agent's tools, and a hard output cap (see LLM Call Parameters below).
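
A rough sketch of how such a sub-agent call differs from the main loop; the function name is an assumption and the tool-execution loop is omitted (the actual code lives in research_tool.py):

import litellm

RESEARCH_SYSTEM_PROMPT = "You are a research specialist agent. ..."  # abridged

def run_research(question: str, research_tools: list[dict]) -> str:
    # Fresh message list: the sub-agent never sees the main conversation.
    messages = [
        {"role": "system", "content": RESEARCH_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    response = litellm.completion(
        model="anthropic/claude-sonnet-4-6",  # cheaper model than the main agent
        messages=messages,
        tools=research_tools,  # subset of the main agent's tools
        stream=False,          # non-streaming
        max_tokens=16000,      # hard output cap
    )
    return response.choices[0].message.content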


Guardrails and Safety Mechanisms

1. Tool Approval System

2. Doom Loop Detection (see the sketch after this list)

3. Context Window Management

4. Truncation Recovery

5. Malformed JSON Recovery

6. Dangling Tool Call Repair

7. Transient Error Retries

8. Read-Before-Write Enforcement

9. Training Script Safety Checks

10. Anti-Hallucination via System Prompt

11. Cancellation with Cleanup

12. Research Sub-Agent Token Budget
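
As a concrete illustration of item 2, here is a minimal doom-loop detector, assuming the agent keeps a running list of (tool_name, arguments) pairs; this is a sketch, not the project's actual implementation:

import json

def is_doom_loop(tool_calls: list[tuple[str, dict]], window: int = 3) -> bool:
    """True if the last `window` tool calls are identical (same name and args)."""
    if len(tool_calls) < window:
        return False
    signatures = {
        (name, json.dumps(args, sort_keys=True))
        for name, args in tool_calls[-window:]
    }
    return len(signatures) == 1  # one unique signature = same call repeated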


Prompting Techniques Used

1. Role-Based Persona

The system prompt establishes a specific role ("ML engineering expert on the HuggingFace ecosystem") rather than a generic assistant. This focuses the model's behavior.

2. Negative Examples (What NOT to Do)

Rather than only describing desired behavior, the prompt explicitly catalogs failure modes. Naming concrete anti-patterns gives the model specific negative examples to match against, which works better than generic "be careful" guidance.

3. Workflow Prescription

The prompt prescribes a specific workflow (Research -> Plan -> Implement) rather than letting the model decide. This reduces the chance of the model skipping research and jumping straight to implementation.

4. Tool-Aware Instructions

The prompt references specific tools by name and describes when to use each one. This is possible because tools are injected via Jinja2 templating:

# system_prompt_v3.yaml (simplified)
You have {{ num_tools }} tools available:
{% for tool in tools %}
- {{ tool.name }}: {{ tool.description }}
{% endfor %}
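
Rendering such a template takes a few lines with jinja2; a minimal, self-contained sketch (the Tool class and tool entries are illustrative, not the project's registry):

from dataclasses import dataclass
from jinja2 import Template

@dataclass
class Tool:
    name: str
    description: str

template = Template(
    "You have {{ num_tools }} tools available:\n"
    "{% for tool in tools %}"
    "- {{ tool.name }}: {{ tool.description }}\n"
    "{% endfor %}"
)

tools = [
    Tool("bash", "Run a shell command"),        # illustrative tools,
    Tool("read_file", "Read a file from disk"), # not the real registry
]
print(template.render(num_tools=len(tools), tools=tools))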

5. Injected Corrective Prompts (Doom Loop)

When the agent gets stuck, a corrective message is injected as a user message (not system). User messages typically get higher attention from the LLM than system messages mid-conversation.
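
Sketched as code, reusing the is_doom_loop helper from the guardrails section above; the variable names and corrective wording are illustrative:

if is_doom_loop(tool_call_history):
    messages.append({
        "role": "user",  # deliberately a user message, not system
        "content": (
            "You appear to be repeating the same action without making progress. "
            "Stop, summarize what you have learned so far, and try a different approach."
        ),
    })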

6. Compaction Summary Prompt

The compaction prompt specifically asks for "the 'why' behind decisions" -- not just what happened but the reasoning. This preserves the agent's ability to make informed decisions after compaction.
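
A paraphrased sketch of the shape this prompt takes (not the actual prompt text):

# Illustrative paraphrase of the compaction instruction, not the real prompt.
summary_prompt = (
    "Summarize this conversation for an agent that will continue the task. "
    "Record not just what was done but the 'why' behind each decision, "
    "plus open questions and any constraints the user imposed."
)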

7. Research-First Mandate

The literature-first approach is a form of chain-of-retrieval -- forcing the model to gather evidence before synthesizing rather than answering from parametric memory. This reduces hallucination in technical domains.


LLM Call Parameters

Main Agent

{
    "model": "anthropic/claude-opus-4-6",  # or HF model
    "messages": [...],                       # full conversation
    "tools": [...],                          # all registered tools
    "stream": True,                          # SSE streaming
    "reasoning_effort": "high",              # configurable
    "drop_params": True,                     # litellm: silently drop unsupported params
}

Research Sub-Agent

{
    "model": "anthropic/claude-sonnet-4-6",  # cheaper model
    "messages": [...],                        # independent context
    "tools": [...],                           # subset of tools
    "stream": False,                          # non-streaming
    "max_tokens": 16000,                     # output cap
}

Compaction

{
    "model": same_as_main,
    "messages": [{"role": "user", "content": summary_prompt}],
    "tools": None,                            # no tools
    "reasoning_effort": "high",              # always high for quality
    "max_tokens": compact_size,              # 10% of max context
}