CodeDocs Vault

7. LLM Integration Deep Dive

How LLMs Are Leveraged

OpenHands uses LLMs as the reasoning engine in an agentic control loop. The LLM doesn't just answer questions -- it plans, reasons, selects tools, executes code, and self-corrects. Here's how this works at every level.

Architecture Overview

┌─────────────────────────────────────────────────────┐
│                   CodeActAgent                       │
│                                                     │
│  1. Build conversation from event history            │
│  2. Apply condensation if needed                     │
│  3. Format messages for specific model               │
│  4. Call LLM with tools                              │
│  5. Parse response into Actions                      │
│  6. Queue multiple actions if multi-tool response    │
│                                                     │
│  ┌─────────────────────────────────────────────────┐ │
│  │              LLM Wrapper (llm.py)               │ │
│  │                                                 │ │
│  │  ┌──────────┐ ┌──────────┐ ┌───────────────┐  │ │
│  │  │ RetryMix │ │ DebugMix │ │ ModelFeatures │  │ │
│  │  │ (backoff)│ │ (logging)│ │ (capabilities)│  │ │
│  │  └──────────┘ └──────────┘ └───────────────┘  │ │
│  │                                                 │ │
│  │  ┌──────────────────────────────────────────┐  │ │
│  │  │           FnCallConverter               │  │ │
│  │  │  (native ↔ text function calling)       │  │ │
│  │  └──────────────────────────────────────────┘  │ │
│  │                                                 │ │
│  │  ┌──────────────────────────────────────────┐  │ │
│  │  │           LiteLLM                       │  │ │
│  │  │  (100+ provider abstraction)            │  │ │
│  │  └──────────────────────────────────────────┘  │ │
│  │                                                 │ │
│  │  ┌──────────────────────────────────────────┐  │ │
│  │  │           Metrics                       │  │ │
│  │  │  (cost, tokens, latency, cache)         │  │ │
│  │  └──────────────────────────────────────────┘  │ │
│  └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘

1. Multi-Provider Support

Provider Abstraction via LiteLLM

File: openhands/llm/llm.py:221-233

All LLM calls go through litellm.completion(), which provides a unified interface for:

Provider        Models                                   Config Key
Anthropic       Claude 3.x, 4.x (Opus, Sonnet, Haiku)    anthropic/claude-...
OpenAI          GPT-4o, GPT-4.1, GPT-5, o1/o3/o4         gpt-4o, o3-...
Google          Gemini 2.5, 3.x                          gemini/gemini-...
AWS Bedrock     Claude, Titan                            bedrock/anthropic.claude-...
Azure OpenAI    GPT variants                             azure/gpt-...
DeepSeek        DeepSeek Chat, R1                        deepseek/deepseek-...
Groq            Llama, Mixtral                           groq/llama-...
Mistral         Mistral, Codestral                       mistral/...
OpenRouter      Any model                                openrouter/...
Local (Ollama)  Any GGUF model                           ollama/...

Model-Specific Parameter Handling

File: openhands/llm/llm.py:143-214

# Claude models cannot accept both temperature and top_p
# Lines 204-214
if 'claude' in model_name:
    if temperature is not None:
        top_p = None  # temperature takes precedence
 
# Reasoning effort mapping
# Lines 143-175
# Gemini 2.5-pro: 'low'→'none', 'medium'→'none', 'high'→'high'
# Claude Sonnet 4.5: maps to budgets
# Claude Opus 4.6: maps to budgets

Custom Model Name Rewriting

File: openhands/llm/llm.py:135-141

# Models prefixed with 'openhands/' are rewritten to litellm_proxy
# with a custom base URL pointing to the OpenHands LLM proxy
if model.startswith('openhands/'):
    model = model.replace('openhands/', 'litellm_proxy/')
    base_url = 'https://llm.openhands.ai'

2. Function Calling (Tool Use)

Native Function Calling

File: openhands/llm/model_features.py:69-104

Models with native function calling support use the standard OpenAI-compatible tools parameter:

FUNCTION_CALLING_SUPPORT_PATTERNS = [
    'claude-3.7-sonnet*', 'claude-3.5-sonnet*', 'claude-3.5-haiku*',
    'claude-sonnet-4*', 'claude-opus-4*',
    'gpt-4o*', 'gpt-4.1*', 'gpt-5*',
    'o1*', 'o3*', 'o4*',
    'gemini-2.5-pro*', 'gemini-3*',
    'groq/*',
    'kimi-k2*', 'qwen3-coder*', 'deepseek-chat*',
    ...
]
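
These entries are wildcard patterns tested against the model name. A minimal sketch of how such matching could work, using Python's fnmatch; the helper name and normalization here are illustrative, not the actual model_features.py code:

from fnmatch import fnmatch

def supports_function_calling(model: str, patterns: list[str]) -> bool:
    # Match each wildcard pattern against both the full model string
    # (e.g., 'groq/llama-3.3-70b') and its last path component
    # (e.g., 'claude-sonnet-4-20250514').
    candidates = {model.lower(), model.lower().split('/')[-1]}
    return any(
        fnmatch(candidate, pattern)
        for pattern in patterns
        for candidate in candidates
    )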

Mock Function Calling (The Adapter)

File: openhands/llm/fn_call_converter.py (979 lines)

For models without native function calling, OpenHands injects a custom format into the system prompt:

<!-- Injected system prompt suffix (lines 36-60) -->
You have access to a set of functions. To call a function:
<function=example_function_name>
<parameter=example_parameter_1>value_1</parameter>
<parameter=example_parameter_2>value_2</parameter>
</function>
 
IMPORTANT:
- ONLY call ONE function at a time
- NEVER use placeholders
- Required parameters MUST be specified
- Put reasoning BEFORE function calls, not after

Conversion pipeline:

Agent has tools defined as OpenAI-format schemas
    │
    ├── If model supports native function calling:
    │   └── Pass tools directly to litellm.completion()
    │
    └── If model does NOT support native function calling:
        │
        ├── Forward conversion (fn_call → text):
        │   1. Append tool descriptions to system prompt
        │   2. Inject in-context learning examples in first user message
        │   3. Convert assistant tool_calls to XML format in content
        │   4. Convert tool messages to user messages with
        │      "EXECUTION RESULT of [tool_name]:" prefix
        │   5. Add stop words: ['</function']
        │
        ├── Call LLM (no tools param, just text)
        │
        └── Reverse conversion (text → fn_call):
            1. Parse response with FN_REGEX_PATTERN
            2. Extract function name and parameters
            3. Validate against tool schemas
            4. Type-convert parameters (int, array, string)
            5. Check required params and enum values
            6. Construct tool_call object with ID: toolu_{counter:02d}

In-Context Learning Examples

File: openhands/llm/fn_call_converter.py:326-392

The system dynamically generates examples based on available tools:

def get_example_for_tools(tools):
    """Generate example tool usage based on which tools are enabled."""
    # Only includes examples for tools that are actually available
    # Example flow: bash → create file → edit file → run server → check browser
    # Wrapped with START OF EXAMPLE / END OF EXAMPLE delimiters

Robustness Fixes

File: openhands/llm/fn_call_converter.py:701-727

def _fix_stopword(content):
    """Fix incomplete function calls (missing </function>)."""
    # If we find <function= without matching </function>,
    # append </function> to complete it
 
def _normalize_parameter_tags(content):
    """Fix malformed parameter tags."""
    # <parameter=command=str_replace> → <parameter=command>str_replace</parameter>
    # Handles LLM formatting errors in XML-like syntax

3. Prompt Engineering Techniques

Multi-Layer Prompt Architecture

The system prompt is composed from multiple Jinja2 templates at runtime:

System Message
├── Base prompt (system_prompt.j2)
│   ├── ROLE: "You are OpenHands agent..."
│   ├── EFFICIENCY: Combine operations
│   ├── FILE_SYSTEM_GUIDELINES: 7 specific rules
│   ├── CODE_QUALITY: 5 practices
│   ├── VERSION_CONTROL: Git safety rules
│   ├── PULL_REQUESTS: One PR per session
│   ├── PROBLEM_SOLVING_WORKFLOW: 5-step process
│   ├── SECURITY: Credential handling
│   ├── SECURITY_RISK_ASSESSMENT: ← included from template
│   ├── EXTERNAL_SERVICES: API-first approach
│   ├── ENVIRONMENT_SETUP: Install missing deps
│   ├── TROUBLESHOOTING: 5-7 source reflection
│   ├── DOCUMENTATION: In-conversation preferred
│   └── PROCESS_MANAGEMENT: Targeted process killing
│
├── Variant overlay (optional, selected via config)
│   ├── system_prompt_interactive.j2: + INTERACTION_RULES
│   ├── system_prompt_long_horizon.j2: + TASK_MANAGEMENT
│   └── system_prompt_tech_philosophy.j2: + TECHNICAL_PHILOSOPHY
│
├── Additional info (additional_info.j2)
│   ├── REPOSITORY_INFO: repo name, dir, branch
│   ├── REPOSITORY_INSTRUCTIONS: from .openhands/microagents/repo.md
│   ├── RUNTIME_INFORMATION: working dir, hosts, date
│   └── CONVERSATION_INSTRUCTIONS: per-conversation rules
│
└── Microagent knowledge (microagent_info.j2)
    └── Triggered skills/knowledge based on keyword matching

Security Risk Self-Assessment

File: openhands/agenthub/codeact_agent/prompts/security_risk_assessment.j2

Every tool requiring execution has a mandatory security_risk parameter:

# CLI Mode:
LOW:    Read-only (viewing content, reading files, calculations)
MEDIUM: Project-scoped (modify project files, run tests, install local packages)
HIGH:   System-level (system settings, global installs, sudo, delete critical files,
        download & execute untrusted code, send secrets out)

# Sandbox Mode:
LOW:    Read-only inside sandbox
MEDIUM: Container-scoped edits (modify workspace, install in container, run code)
HIGH:   Data exfiltration or privilege breaks (send secrets out, connect to host,
        privileged ops, unverified binaries with network)

The definitions change based on execution context (cli_mode flag), recognizing that the same action has different risk profiles in different environments.

Microagent Trigger System

File: openhands/microagent/microagent.py:183-212

Knowledge microagents are activated by keyword matching in user messages:

class KnowledgeMicroagent(BaseMicroagent):
    triggers: list[str]  # e.g., ["github", "git push", "pull request"]

    def match_trigger(self, message: str) -> bool:
        # Simplified sketch of the fuzzy matching: case-insensitive
        # substring search; returns True if any trigger matches the message.
        message = message.lower()
        return any(trigger.lower() in message for trigger in self.triggers)

When triggered, the microagent's content is injected into the conversation via RecallObservation. This enables domain-specific expertise without bloating the base system prompt.

Example skills:

Skill           Triggers                Content
github.md       github, git push, PR    GitHub API usage, token handling, branch rules
code-review.md  /codereview             Code review persona with structured feedback
fix_test.md     /fix_test               Test fixing workflow (never modify tests)

Conversation Context Injection

File: openhands/agenthub/codeact_agent/prompts/additional_info.j2

Runtime context is injected dynamically:

{% if repository_info %}
<REPOSITORY_INFO>
You are working in the repository: {{ repo_name }}
Located at: {{ repo_directory }}
Current branch: {{ repo_branch }}
</REPOSITORY_INFO>
{% endif %}
 
{% if runtime_info %}
<RUNTIME_INFORMATION>
Working directory: {{ working_dir }}
Available hosts: {{ hosts }}  {# for accessing web apps in sandbox #}
Current date: {{ current_date }}
</RUNTIME_INFORMATION>
{% endif %}
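
A minimal sketch of rendering the snippet above with Jinja2, assuming it is saved as additional_info.j2 and given the variables it references (values are illustrative):

from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader(
    'openhands/agenthub/codeact_agent/prompts'))
template = env.get_template('additional_info.j2')

rendered = template.render(
    repository_info=True,              # truthy -> include REPOSITORY_INFO block
    repo_name='acme/webapp',           # illustrative values
    repo_directory='/workspace/webapp',
    repo_branch='main',
    runtime_info=True,                 # truthy -> include RUNTIME_INFORMATION block
    working_dir='/workspace/webapp',
    hosts={'localhost': 3000},
    current_date='2025-01-01',
)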

4. Guardrails and Safety

Layer 1: Prompt-Level Guardrails

The system prompt contains explicit behavioral constraints:

FILE_SYSTEM_GUIDELINES:
- Don't create multiple file versions (e.g., file_test.py, file_fix.py)
- Edit files directly, don't create new versions
- Delete temporary files after confirming solution

VERSION_CONTROL:
- Don't push to main, delete repos without explicit request
- Don't commit node_modules, .env, build dirs

SECURITY:
- Only use credentials as explicitly requested
- Use APIs instead of browser for platform interactions

PROCESS_MANAGEMENT:
- Don't use generic pkill (e.g., pkill -f server)
- Find specific PID first, then kill that PID

Layer 2: Tool-Level Guardrails

Every modifying tool requires a security_risk parameter:

# From tools/bash.py
{
    "name": "execute_bash",
    "parameters": {
        "command": {"type": "string", "required": True},
        "security_risk": {
            "type": "string",
            "enum": ["LOW", "MEDIUM", "HIGH"],
            "required": True,
            "description": "The LLM's assessment of the safety risk..."
        }
    }
}

Layer 3: Controller-Level Guardrails

File: openhands/controller/agent_controller.py:978-1038

# Security analysis pipeline
action_risk = await security_analyzer.security_risk(action)
action.security_risk = action_risk
 
# Confirmation mode: block HIGH risk actions
if confirmation_mode and action_risk in [ActionSecurityRisk.HIGH, ActionSecurityRisk.UNKNOWN]:
    action.confirmation_state = ActionConfirmationStatus.AWAITING_CONFIRMATION
    await self.set_agent_state_to(AgentState.AWAITING_USER_CONFIRMATION)
    # User must explicitly approve before execution

Layer 4: Runtime-Level Guardrails

Layer 5: Budget & Iteration Limits

File: openhands/controller/state/control_flags.py

IterationControlFlag:
    max_value: int  # default from OH_MAX_ITERATIONS env var
    # Raises after max iterations reached
 
BudgetControlFlag:
    max_value: float  # from config.max_budget_per_task
    # Raises after accumulated LLM cost exceeds budget
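
A minimal sketch of the limit-checking pattern these flags implement; the class and exception names here are illustrative, not the actual control_flags.py API:

from dataclasses import dataclass

class LimitExceeded(Exception):
    pass

@dataclass
class SimpleControlFlag:
    # Generic limit tracker: iterations use int counts, budget uses float dollars.
    max_value: float
    current_value: float = 0.0

    def increase(self, amount: float = 1.0) -> None:
        # Accumulate usage and raise once the configured limit is crossed.
        self.current_value += amount
        if self.current_value > self.max_value:
            raise LimitExceeded(
                f'{self.current_value} exceeds limit of {self.max_value}'
            )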

Layer 6: Stuck Detection

File: openhands/controller/stuck.py

Five loop detection heuristics prevent infinite loops:

Pattern                                  Threshold   Action
Same action-observation pair repeated    4x          Error/Recovery
Same action causing same error           3x          Error/Recovery
Agent monologue (talking to itself)      Variable    Error/Recovery
6-step repeating pattern                 6 events    Error/Recovery
Repeated context window errors           10 events   Error/Recovery
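
A minimal sketch of the first heuristic, with the history simplified to a list of (action, observation) string pairs; the real stuck.py works on Event objects:

def is_repeating(pairs: list[tuple[str, str]], threshold: int = 4) -> bool:
    # The agent is considered stuck if the last `threshold`
    # (action, observation) pairs are all identical.
    if len(pairs) < threshold:
        return False
    last = pairs[-threshold:]
    return all(pair == last[0] for pair in last)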

Layer 7: Input Validation

File: openhands/agenthub/readonly_agent/function_calling.py:51-103

# Shell injection prevention for search operations
pattern_escaped = shlex.quote(pattern)
path_escaped = shlex.quote(path)
include_escaped = shlex.quote(include_glob)
 
# Constructs safe ripgrep command
cmd = f"rg {pattern_escaped} {path_escaped} --glob {include_escaped}"

File: openhands/agenthub/browsing_agent/utils.py:15-29

# Safe YAML parsing
parsed = yaml.safe_load(content)  # Prevents arbitrary code execution

5. Token Management & Cost Optimization

Prompt Caching

File: openhands/llm/llm.py:597-612

def is_caching_prompt_active(self):
    return self.config.caching_prompt and self._supports_caching()
 
# Applied in CodeActAgent.step():
if self.llm.is_caching_prompt_active():
    messages[-1].cache_enabled = True
    # Adds cache_control: {'type': 'ephemeral'} to content

Supported models (from model_features.py:132-149):

Claude 3.x, 3.5.x, 3.7.x
Claude Sonnet 4.x, Opus 4.x
Gemini 3.1-pro
Kimi K2.5
GLM 4/5

Cache metrics tracking (metrics.py):

# Tracks cache effectiveness
cache_read_tokens   # Tokens served from cache (cost savings)
cache_write_tokens  # Tokens written to cache (one-time cost)

Memory Condensation

When context grows too large, the condenser compresses old events:

File: openhands/memory/condenser/impl/llm_summarizing_condenser.py

Condensation flow:
1. Check: len(view) > max_size OR unhandled condensation request
2. Keep first N events (keep_first) as anchor
3. Target compression: max_size / 2
4. Identify events to forget (not in head or tail)
5. Call LLM to summarize forgotten events
6. Replace forgotten events with summary
7. Critical: PRESERVE task tracker IDs and statuses

Summarization prompt template:

USER_CONTEXT: (requirements, goals)
TASK_TRACKING: {Active tasks, IDs, statuses - MUST PRESERVE}
COMPLETED: (done tasks with results)
PENDING: (remaining tasks)
CURRENT_STATE: (variables, state)
CODE_STATE: {files, functions, structures}
TESTS: {failing cases, errors}
VERSION_CONTROL_STATUS: {branch, PR, commits}
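
A minimal sketch of the head/tail selection described above; event handling is simplified and `summarize` stands in for the LLM summarization call (the real condenser also emits a Condensation event and preserves task-tracker state):

def condense(events: list, max_size: int, keep_first: int, summarize) -> list:
    # Nothing to do until the view outgrows the configured maximum.
    if len(events) <= max_size:
        return events

    head = events[:keep_first]                 # anchor: initial context
    target = max_size // 2                     # compress down to half the limit
    tail_len = max(target - keep_first, 1)
    tail = events[-tail_len:]                  # most recent events stay verbatim
    forgotten = events[keep_first:-tail_len]   # middle chunk to be summarized

    summary_event = summarize(forgotten)       # one LLM call over forgotten events
    return head + [summary_event] + tail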

Observation Masking

File: openhands/memory/condenser/impl/observation_masking_condenser.py

Old observations outside the attention window are replaced with <MASKED>, preserving the event structure but reducing token count.
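
A minimal sketch of that masking rule, with events simplified to dicts:

def mask_old_observations(events: list[dict], attention_window: int) -> list[dict]:
    # Observations older than the attention window keep their place in the
    # history but have their content replaced with '<MASKED>'.
    cutoff = len(events) - attention_window
    return [
        {**event, 'content': '<MASKED>'}
        if i < cutoff and event.get('kind') == 'observation'
        else event
        for i, event in enumerate(events)
    ]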

Token Counting

File: openhands/llm/llm.py:699-747

def get_token_count(self, messages):
    # Uses litellm.token_counter() with optional custom tokenizer.
    # Falls back to 0 on errors (graceful degradation).
    try:
        return litellm.token_counter(
            model=self.model_name,
            messages=messages,
            custom_tokenizer=self.config.custom_tokenizer,
        )
    except Exception:
        return 0

6. Retry & Resilience Patterns

Exponential Backoff

File: openhands/llm/retry_mixin.py:24-76

@tenacity.retry(
    wait=wait_exponential(
        multiplier=config.retry_multiplier,  # 8x
        min=config.retry_min_wait,           # 8s
        max=config.retry_max_wait,           # 64s
    ),
    stop=stop_after_attempt(config.num_retries),  # 5
    retry=retry_if_exception_type(LLM_RETRY_EXCEPTIONS),
    reraise=True,
)

Retry-eligible exceptions:

LLM_RETRY_EXCEPTIONS = (
    APIConnectionError,
    RateLimitError,
    ServiceUnavailableError,
    BadGatewayError,
    litellm.Timeout,
    litellm.InternalServerError,
    LLMNoResponseError,
)

Temperature Perturbation

File: openhands/llm/retry_mixin.py:46-60

# When LLM returns empty response with temperature=0:
if isinstance(exception, LLMNoResponseError):
    if completion_fn.keywords.get('temperature', None) == 0:
        completion_fn.keywords['temperature'] = 1.0
        # "Adding randomness to avoid repeated empty responses"

This is a pragmatic workaround: at temperature 0 the model is deterministic, so an empty response tends to repeat on every retry. Temporarily raising the temperature adds enough randomness to break out of that failure mode.

Context Window Recovery

File: openhands/controller/agent_controller.py:345-355

# When context window is exceeded:
if isinstance(exception, ContextWindowExceededError):
    if condensation_enabled:
        # Trigger condensation to compress context
        trigger_condensation_request()
    else:
        raise LLMContextWindowExceedError()

7. Response Processing Pipeline

LLM Response → Actions

File: openhands/agenthub/codeact_agent/function_calling.py:79-344

def response_to_actions(response, mcp_tool_names):
    actions = []
    for tool_call in response.choices[0].message.tool_calls:
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
 
        if name == 'execute_bash':
            action = CmdRunAction(command=args['command'])
        elif name == 'str_replace_editor':
            if args['command'] == 'view':
                action = FileReadAction(path=args['path'])
            else:
                action = FileEditAction(...)
        elif name == 'browser':
            action = BrowseInteractiveAction(browser_actions=args['code'])
        elif name == 'finish':
            action = AgentFinishAction(message=args['message'])
        elif name == 'think':
            action = AgentThinkAction(thought=args['thought'])
        elif name in mcp_tool_names:
            action = MCPAction(name=name, arguments=args)
        # ... etc
 
        set_security_risk(action, args)  # Extract LLM's risk assessment
        action.tool_call_metadata = ToolCallMetadata(
            tool_call_id=tool_call.id,
            function_name=name,
            model_response=response,
        )
        actions.append(action)
 
    return actions

Multi-Tool Response Handling

When the LLM returns multiple tool calls in one response:

# In CodeActAgent.step():
actions = response_to_actions(response, self.mcp_tool_names)
self.pending_actions.extend(actions[1:])  # Queue all but first
return actions[0]  # Return first immediately
 
# On next step() call:
if self.pending_actions:
    return self.pending_actions.popleft()  # Return queued without LLM call

This amortizes LLM latency: one LLM call can produce multiple sequential actions.

8. Metrics & Observability

File: openhands/llm/metrics.py

Every LLM call tracks:

class TokenUsage:
    prompt_tokens: int          # Input tokens
    completion_tokens: int      # Output tokens
    cache_read_tokens: int      # Cache hits (cost savings)
    cache_write_tokens: int     # Cache writes (one-time cost)
    context_window: int         # Model's max context
    per_turn_token: int         # Total tokens this turn
    response_id: str            # Unique response ID
 
class ResponseLatency:
    latency: float              # Round-trip time (seconds)
    response_id: str            # Links to token usage
    model: str                  # Model used
 
class Cost:
    cost: float                 # Dollar cost
    timestamp: str              # When incurred

Cost calculation (llm.py:764-822):

  1. Check response headers for cost (some providers include it)
  2. Fall back to litellm.completion_cost()
  3. Fall back to custom input_cost_per_token / output_cost_per_token config
  4. If all fail, log warning and disable future cost tracking
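
A minimal sketch of that fallback chain; the metadata lookup and config field usage are illustrative, see llm.py for the actual logic:

import litellm

def compute_cost(self, response) -> float | None:
    # 1. Some providers report the cost directly in the response metadata.
    cost = getattr(response, '_hidden_params', {}).get('response_cost')
    if cost is not None:
        return float(cost)
    # 2. Fall back to litellm's built-in pricing table.
    try:
        return litellm.completion_cost(completion_response=response)
    except Exception:
        pass
    # 3. Fall back to per-token prices from the LLM config, if set.
    usage = response.usage
    if self.config.input_cost_per_token and self.config.output_cost_per_token:
        return (
            usage.prompt_tokens * self.config.input_cost_per_token
            + usage.completion_tokens * self.config.output_cost_per_token
        )
    # 4. Give up: caller logs a warning and disables future cost tracking.
    return None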

9. Vision & Multimodal Support

File: openhands/llm/llm.py:561-594

def vision_is_active(self):
    return not self.config.disable_vision and self._supports_vision()
 
def _supports_vision(self):
    # Check litellm's supports_vision() on model name
    # Check model_info lookup
    # Environment override: OPENHANDS_FORCE_VISION=1

When vision is active, image content (for example, browser screenshots) is included as image blocks in the messages sent to the model.

10. MCP (Model Context Protocol) Integration

File: openhands/mcp/client.py:20-178

MCP extends the agent's tool set with external servers:

class MCPClient:
    async def connect_http(self, url, headers=None):
        # Connect to HTTP/SSE MCP server
        # Discover available tools
        # Add to agent's tool list
 
    async def connect_stdio(self, command, args, env=None):
        # Launch MCP server as subprocess
        # Connect via stdio transport
        # Discover and register tools
 
    async def call_tool(self, name, arguments):
        # Execute MCP tool with timeout
        # Return CallToolResult

Default MCP servers (from config):

MCP actions flow through the same event system:

MCPAction → EventStream → Runtime → MCP Proxy → MCP Server → MCPObservation
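
A minimal usage sketch of the client methods described above; the server URL, tool name, and arguments are illustrative:

import asyncio

async def main():
    client = MCPClient()
    # Connect to an HTTP/SSE MCP server and discover its tools.
    await client.connect_http('https://mcp.example.com/sse')
    # Invoke one of the discovered tools; in the agent loop the result
    # comes back to the LLM as an MCPObservation.
    result = await client.call_tool('search_docs', {'query': 'retry policy'})
    print(result)

asyncio.run(main())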