7. LLM Integration Deep Dive
How LLMs Are Leveraged
OpenHands uses LLMs as the reasoning engine in an agentic control loop. The LLM doesn't just answer questions -- it plans, reasons, selects tools, executes code, and self-corrects. Here's how this works at every level.
Architecture Overview
┌─────────────────────────────────────────────────────┐
│ CodeActAgent │
│ │
│ 1. Build conversation from event history │
│ 2. Apply condensation if needed │
│ 3. Format messages for specific model │
│ 4. Call LLM with tools │
│ 5. Parse response into Actions │
│ 6. Queue multiple actions if multi-tool response │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ LLM Wrapper (llm.py) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │ │
│ │ │ RetryMix │ │ DebugMix │ │ ModelFeatures │ │ │
│ │ │ (backoff)│ │ (logging)│ │ (capabilities)│ │ │
│ │ └──────────┘ └──────────┘ └───────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ FnCallConverter │ │ │
│ │ │ (native ↔ text function calling) │ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ LiteLLM │ │ │
│ │ │ (100+ provider abstraction) │ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ Metrics │ │ │
│ │ │ (cost, tokens, latency, cache) │ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
1. Multi-Provider Support
Provider Abstraction via LiteLLM
File: openhands/llm/llm.py:221-233
All LLM calls go through litellm.completion(), which provides a unified interface for:
| Provider | Models | Config Key |
|---|---|---|
| Anthropic | Claude 3.x, 4.x (Opus, Sonnet, Haiku) | anthropic/claude-... |
| OpenAI | GPT-4o, GPT-4.1, GPT-5, o1/o3/o4 | gpt-4o, o3-... |
| Google | Gemini 2.5, 3.x | gemini/gemini-... |
| AWS Bedrock | Claude, Titan | bedrock/anthropic.claude-... |
| Azure OpenAI | GPT variants | azure/gpt-... |
| DeepSeek | DeepSeek Chat, R1 | deepseek/deepseek-... |
| Groq | Llama, Mixtral | groq/llama-... |
| Mistral | Mistral, Codestral | mistral/... |
| OpenRouter | Any model | openrouter/... |
| Local (Ollama) | Any GGUF model | ollama/... |
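The practical upshot is that switching providers is a configuration change, not a code change. A minimal sketch of this abstraction, assuming litellm is installed and the relevant provider API keys are set in the environment (the model strings below are illustrative, not OpenHands defaults):

import litellm

# The same call shape works across providers; only the model string
# (and its corresponding credentials) changes.
messages = [{"role": "user", "content": "Summarize this repo's README."}]

for model in ("anthropic/claude-3-5-sonnet-20241022", "gpt-4o", "ollama/llama3"):
    response = litellm.completion(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content[:80])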
Model-Specific Parameter Handling
File: openhands/llm/llm.py:143-214
# Claude models cannot accept both temperature and top_p
# Lines 204-214
if 'claude' in model_name:
    if temperature is not None:
        top_p = None  # temperature takes precedence
# Reasoning effort mapping
# Lines 143-175
# Gemini 2.5-pro: 'low'→'none', 'medium'→'none', 'high'→'high'
# Claude Sonnet 4.5: maps to budgets
# Claude Opus 4.6: maps to budgets

Custom Model Name Rewriting
File: openhands/llm/llm.py:135-141
# Models prefixed with 'openhands/' are rewritten to litellm_proxy
# with a custom base URL pointing to the OpenHands LLM proxy
if model.startswith('openhands/'):
    model = model.replace('openhands/', 'litellm_proxy/')
    base_url = f'https://llm.openhands.ai'

2. Function Calling (Tool Use)
Native Function Calling
File: openhands/llm/model_features.py:69-104
Models with native function calling support use the standard OpenAI-compatible tools parameter:
FUNCTION_CALLING_SUPPORT_PATTERNS = [
    'claude-3.7-sonnet*', 'claude-3.5-sonnet*', 'claude-3.5-haiku*',
    'claude-sonnet-4*', 'claude-opus-4*',
    'gpt-4o*', 'gpt-4.1*', 'gpt-5*',
    'o1*', 'o3*', 'o4*',
    'gemini-2.5-pro*', 'gemini-3*',
    'groq/*',
    'kimi-k2*', 'qwen3-coder*', 'deepseek-chat*',
    ...
]

Mock Function Calling (The Adapter)
File: openhands/llm/fn_call_converter.py (979 lines)
For models without native function calling, OpenHands injects a custom format into the system prompt:
<!-- Injected system prompt suffix (lines 36-60) -->
You have access to a set of functions. To call a function:
<function=example_function_name>
<parameter=example_parameter_1>value_1</parameter>
<parameter=example_parameter_2>value_2</parameter>
</function>
IMPORTANT:
- ONLY call ONE function at a time
- NEVER use placeholders
- Required parameters MUST be specified
- Put reasoning BEFORE function calls, not after

Conversion pipeline:
Agent has tools defined as OpenAI-format schemas
│
├── If model supports native function calling:
│ └── Pass tools directly to litellm.completion()
│
└── If model does NOT support native function calling:
│
├── Forward conversion (fn_call → text):
│ 1. Append tool descriptions to system prompt
│ 2. Inject in-context learning examples in first user message
│ 3. Convert assistant tool_calls to XML format in content
│ 4. Convert tool messages to user messages with
│ "EXECUTION RESULT of [tool_name]:" prefix
│ 5. Add stop words: ['</function']
│
├── Call LLM (no tools param, just text)
│
└── Reverse conversion (text → fn_call):
1. Parse response with FN_REGEX_PATTERN
2. Extract function name and parameters
3. Validate against tool schemas
4. Type-convert parameters (int, array, string)
5. Check required params and enum values
6. Construct tool_call object with ID: toolu_{counter:02d}
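A minimal sketch of the reverse step (text → tool_call), in the spirit of fn_call_converter.py; the regex and helper names here are illustrative and do not reproduce the module's actual FN_REGEX_PATTERN:

import re

# Illustrative patterns: capture the function name, then each parameter.
FN_PATTERN = re.compile(r'<function=([^>]+)>(.*?)</function>', re.DOTALL)
PARAM_PATTERN = re.compile(r'<parameter=([^>]+)>(.*?)</parameter>', re.DOTALL)

def parse_text_tool_call(content: str, counter: int = 1):
    """Extract a single XML-style tool call from free-form model text."""
    match = FN_PATTERN.search(content)
    if match is None:
        return None  # plain message, no tool call
    name, body = match.group(1), match.group(2)
    arguments = {key: value.strip() for key, value in PARAM_PATTERN.findall(body)}
    return {
        'id': f'toolu_{counter:02d}',  # mirrors the ID scheme described above
        'type': 'function',
        'function': {'name': name, 'arguments': arguments},
    }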
In-Context Learning Examples
File: openhands/llm/fn_call_converter.py:326-392
The system dynamically generates examples based on available tools:
def get_example_for_tools(tools):
    """Generate example tool usage based on which tools are enabled."""
    # Only includes examples for tools that are actually available
    # Example flow: bash → create file → edit file → run server → check browser
    # Wrapped with START OF EXAMPLE / END OF EXAMPLE delimiters

Robustness Fixes
File: openhands/llm/fn_call_converter.py:701-727
def _fix_stopword(content):
    """Fix incomplete function calls (missing </function>)."""
    # If we find <function= without matching </function>,
    # append </function> to complete it

def _normalize_parameter_tags(content):
    """Fix malformed parameter tags."""
    # <parameter=command=str_replace> → <parameter=command>str_replace</parameter>
    # Handles LLM formatting errors in XML-like syntax

3. Prompt Engineering Techniques
Multi-Layer Prompt Architecture
The system prompt is composed from multiple Jinja2 templates at runtime:
System Message
├── Base prompt (system_prompt.j2)
│ ├── ROLE: "You are OpenHands agent..."
│ ├── EFFICIENCY: Combine operations
│ ├── FILE_SYSTEM_GUIDELINES: 7 specific rules
│ ├── CODE_QUALITY: 5 practices
│ ├── VERSION_CONTROL: Git safety rules
│ ├── PULL_REQUESTS: One PR per session
│ ├── PROBLEM_SOLVING_WORKFLOW: 5-step process
│ ├── SECURITY: Credential handling
│ ├── SECURITY_RISK_ASSESSMENT: ← included from template
│ ├── EXTERNAL_SERVICES: API-first approach
│ ├── ENVIRONMENT_SETUP: Install missing deps
│ ├── TROUBLESHOOTING: 5-7 source reflection
│ ├── DOCUMENTATION: In-conversation preferred
│ └── PROCESS_MANAGEMENT: Targeted process killing
│
├── Variant overlay (optional, selected via config)
│ ├── system_prompt_interactive.j2: + INTERACTION_RULES
│ ├── system_prompt_long_horizon.j2: + TASK_MANAGEMENT
│ └── system_prompt_tech_philosophy.j2: + TECHNICAL_PHILOSOPHY
│
├── Additional info (additional_info.j2)
│ ├── REPOSITORY_INFO: repo name, dir, branch
│ ├── REPOSITORY_INSTRUCTIONS: from .openhands/microagents/repo.md
│ ├── RUNTIME_INFORMATION: working dir, hosts, date
│ └── CONVERSATION_INSTRUCTIONS: per-conversation rules
│
└── Microagent knowledge (microagent_info.j2)
└── Triggered skills/knowledge based on keyword matching
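As a rough illustration of how such a layered prompt can be assembled with Jinja2 (the template names follow the tree above; the loader path, context keys, and helper name are assumptions, not the actual PromptManager code):

from jinja2 import Environment, FileSystemLoader

# Hypothetical prompts/ directory mirroring the tree above.
env = Environment(loader=FileSystemLoader('prompts'))

def build_system_message(context: dict) -> str:
    """Concatenate base prompt, optional variant, runtime info, and microagent knowledge."""
    parts = [env.get_template('system_prompt.j2').render(**context)]
    if context.get('variant'):  # e.g. 'interactive', 'long_horizon', 'tech_philosophy'
        parts.append(env.get_template(f"system_prompt_{context['variant']}.j2").render(**context))
    parts.append(env.get_template('additional_info.j2').render(**context))
    if context.get('triggered_microagents'):
        parts.append(env.get_template('microagent_info.j2').render(**context))
    return '\n\n'.join(part for part in parts if part.strip())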
Security Risk Self-Assessment
File: openhands/agenthub/codeact_agent/prompts/security_risk_assessment.j2
Every tool requiring execution has a mandatory security_risk parameter:
# CLI Mode:
LOW: Read-only (viewing content, reading files, calculations)
MEDIUM: Project-scoped (modify project files, run tests, install local packages)
HIGH: System-level (system settings, global installs, sudo, delete critical files,
download & execute untrusted code, send secrets out)
# Sandbox Mode:
LOW: Read-only inside sandbox
MEDIUM: Container-scoped edits (modify workspace, install in container, run code)
HIGH: Data exfiltration or privilege breaks (send secrets out, connect to host,
privileged ops, unverified binaries with network)
The definitions change based on execution context (cli_mode flag), recognizing that the same action has different risk profiles in different environments.
Microagent Trigger System
File: openhands/microagent/microagent.py:183-212
Knowledge microagents are activated by keyword matching in user messages:
class KnowledgeMicroagent(BaseMicroagent):
    triggers: list[str]  # e.g., ["github", "git push", "pull request"]

    def match_trigger(self, message: str) -> bool:
        # Fuzzy matching against triggers
        # Returns True if any trigger matches the message

When triggered, the microagent's content is injected into the conversation via RecallObservation. This enables domain-specific expertise without bloating the base system prompt.
Example skills:
| Skill | Triggers | Content |
|---|---|---|
| github.md | github, git push, PR | GitHub API usage, token handling, branch rules |
| code-review.md | /codereview | Code review persona with structured feedback |
| fix_test.md | /fix_test | Test fixing workflow (never modify tests) |
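A minimal sketch of trigger matching, assuming a simple case-insensitive substring check (the real match_trigger in microagent.py may be fuzzier):

class KnowledgeMicroagent:
    def __init__(self, name: str, triggers: list[str], content: str):
        self.name = name
        self.triggers = triggers
        self.content = content

    def match_trigger(self, message: str) -> str | None:
        """Return the trigger that fired, or None if nothing matched."""
        lowered = message.lower()
        for trigger in self.triggers:
            if trigger.lower() in lowered:
                return trigger
        return None

github_agent = KnowledgeMicroagent(
    name='github',
    triggers=['github', 'git push', 'pull request'],
    content='Use the GitHub API with the provided token; never force-push to main...',
)
assert github_agent.match_trigger('Please open a pull request for this fix')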
Conversation Context Injection
File: openhands/agenthub/codeact_agent/prompts/additional_info.j2
Runtime context is injected dynamically:
{% if repository_info %}
<REPOSITORY_INFO>
You are working in the repository: {{ repo_name }}
Located at: {{ repo_directory }}
Current branch: {{ repo_branch }}
</REPOSITORY_INFO>
{% endif %}
{% if runtime_info %}
<RUNTIME_INFORMATION>
Working directory: {{ working_dir }}
Available hosts: {{ hosts }} {# for accessing web apps in sandbox #}
Current date: {{ current_date }}
</RUNTIME_INFORMATION>
{% endif %}

4. Guardrails and Safety
Layer 1: Prompt-Level Guardrails
The system prompt contains explicit behavioral constraints:
FILE_SYSTEM_GUIDELINES:
- Don't create multiple file versions (e.g., file_test.py, file_fix.py)
- Edit files directly, don't create new versions
- Delete temporary files after confirming solution
VERSION_CONTROL:
- Don't push to main, delete repos without explicit request
- Don't commit node_modules, .env, build dirs
SECURITY:
- Only use credentials as explicitly requested
- Use APIs instead of browser for platform interactions
PROCESS_MANAGEMENT:
- Don't use generic pkill (e.g., pkill -f server)
- Find specific PID first, then kill that PID
Layer 2: Tool-Level Guardrails
Every modifying tool requires a security_risk parameter:
# From tools/bash.py
{
    "name": "execute_bash",
    "parameters": {
        "command": {"type": "string", "required": True},
        "security_risk": {
            "type": "string",
            "enum": ["LOW", "MEDIUM", "HIGH"],
            "required": True,
            "description": "The LLM's assessment of the safety risk..."
        }
    }
}

Layer 3: Controller-Level Guardrails
File: openhands/controller/agent_controller.py:978-1038
# Security analysis pipeline
action_risk = await security_analyzer.security_risk(action)
action.security_risk = action_risk
# Confirmation mode: block HIGH risk actions
if confirmation_mode and action_risk in [ActionSecurityRisk.HIGH, ActionSecurityRisk.UNKNOWN]:
    action.confirmation_state = ActionConfirmationStatus.AWAITING_CONFIRMATION
    await self.set_agent_state_to(AgentState.AWAITING_USER_CONFIRMATION)
    # User must explicitly approve before execution

Layer 4: Runtime-Level Guardrails
- Docker isolation: Agent code runs in a separate container
- Port isolation: Execution (30000-39999), VSCode (40000-49999), App (50000-59999)
- Timeout: Default 120s per action, configurable
- Network: Configurable host network mode (default: isolated)
Layer 5: Budget & Iteration Limits
File: openhands/controller/state/control_flags.py
IterationControlFlag:
    max_value: int  # default from OH_MAX_ITERATIONS env var
    # Raises after max iterations reached

BudgetControlFlag:
    max_value: float  # from config.max_budget_per_task
    # Raises after accumulated LLM cost exceeds budget

Layer 6: Stuck Detection
File: openhands/controller/stuck.py
Five loop detection heuristics prevent infinite loops:
| Pattern | Threshold | Action |
|---|---|---|
| Same action-observation pair repeated | 4x | Error/Recovery |
| Same action causing same error | 3x | Error/Recovery |
| Agent monologue (talking to itself) | Variable | Error/Recovery |
| 6-step repeating pattern | 6 events | Error/Recovery |
| Repeated context window errors | 10 events | Error/Recovery |
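A minimal sketch of the first heuristic (same action-observation pair repeated), assuming events can be reduced to hashable string summaries; the actual detector in stuck.py compares richer event objects:

from collections import deque

class StuckDetector:
    """Flags a loop when the last N (action, observation) pairs are identical."""

    def __init__(self, window: int = 4):
        self.window = window
        self.history: deque[tuple[str, str]] = deque(maxlen=window)

    def record(self, action_repr: str, observation_repr: str) -> bool:
        """Record the latest pair; return True when the agent looks stuck."""
        self.history.append((action_repr, observation_repr))
        if len(self.history) < self.window:
            return False
        return len(set(self.history)) == 1  # all recent pairs identical -> stuck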
Layer 7: Input Validation
File: openhands/agenthub/readonly_agent/function_calling.py:51-103
# Shell injection prevention for search operations
pattern_escaped = shlex.quote(pattern)
path_escaped = shlex.quote(path)
include_escaped = shlex.quote(include_glob)
# Constructs safe ripgrep command
cmd = f"rg {pattern_escaped} {path_escaped} --glob {include_escaped}"File: openhands/agenthub/browsing_agent/utils.py:15-29
# Safe YAML parsing
parsed = yaml.safe_load(content)  # Prevents arbitrary code execution

5. Token Management & Cost Optimization
Prompt Caching
File: openhands/llm/llm.py:597-612
def is_caching_prompt_active(self):
    return self.config.caching_prompt and self._supports_caching()

# Applied in CodeActAgent.step():
if self.llm.is_caching_prompt_active():
    messages[-1].cache_enabled = True
    # Adds cache_control: {'type': 'ephemeral'} to content

Supported models (from model_features.py:132-149):
- Claude 3.x, 3.5.x, 3.7.x
- Claude Sonnet 4.x, Opus 4.x
- Gemini 3.1-pro
- Kimi K2.5
- GLM 4/5
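A minimal sketch of what the serialized message looks like once the cache marker is applied, following the Anthropic-style cache_control field that LiteLLM passes through (the text and exact field placement are illustrative):

# Marking a message's content block as cacheable before sending it through litellm.
message = {
    'role': 'user',
    'content': [
        {
            'type': 'text',
            'text': 'Long, stable context such as repo instructions...',
            'cache_control': {'type': 'ephemeral'},  # provider caches this prefix
        }
    ],
}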
Cache metrics tracking (metrics.py):
# Tracks cache effectiveness
cache_read_tokens # Tokens served from cache (cost savings)
cache_write_tokens  # Tokens written to cache (one-time cost)

Memory Condensation
When context grows too large, the condenser compresses old events:
File: openhands/memory/condenser/impl/llm_summarizing_condenser.py
Condensation flow:
1. Check: len(view) > max_size OR unhandled condensation request
2. Keep first N events (keep_first) as anchor
3. Target compression: max_size / 2
4. Identify events to forget (not in head or tail)
5. Call LLM to summarize forgotten events
6. Replace forgotten events with summary
7. Critical: PRESERVE task tracker IDs and statuses
Summarization prompt template:
USER_CONTEXT: (requirements, goals)
TASK_TRACKING: {Active tasks, IDs, statuses - MUST PRESERVE}
COMPLETED: (done tasks with results)
PENDING: (remaining tasks)
CURRENT_STATE: (variables, state)
CODE_STATE: {files, functions, structures}
TESTS: {failing cases, errors}
VERSION_CONTROL_STATUS: {branch, PR, commits}
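A condensed sketch of the head/tail selection described above, assuming events are plain strings and a hypothetical summarize() callable that wraps the LLM (the real condenser works on typed events and a View object):

def condense(events: list[str], max_size: int, keep_first: int, summarize) -> list[str]:
    """Replace the middle of an oversized history with one LLM-written summary."""
    if len(events) <= max_size:
        return events                       # nothing to do
    target = max_size // 2                  # aim for roughly half the limit
    head = events[:keep_first]              # anchor: initial task/user context
    tail_len = max(target - keep_first - 1, 1)
    tail = events[-tail_len:]               # most recent events stay verbatim
    forgotten = events[keep_first:-tail_len]
    summary = summarize(forgotten)          # LLM call; must preserve task tracker IDs
    return head + [summary] + tail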
Observation Masking
File: openhands/memory/condenser/impl/observation_masking_condenser.py
Old observations outside the attention window are replaced with <MASKED>, preserving the event structure but reducing token count.
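A minimal sketch of this masking, assuming events are dicts with a kind field; the real condenser operates on typed Observation events:

def mask_old_observations(events: list[dict], attention_window: int = 10) -> list[dict]:
    """Keep event structure but blank out observation content older than the window."""
    cutoff = len(events) - attention_window
    masked = []
    for index, event in enumerate(events):
        if event.get('kind') == 'observation' and index < cutoff:
            masked.append({**event, 'content': '<MASKED>'})
        else:
            masked.append(event)
    return masked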
Token Counting
File: openhands/llm/llm.py:699-747
def get_token_count(self, messages):
    # Uses litellm.token_counter() with optional custom tokenizer
    # Falls back to 0 on errors (graceful degradation)
    return litellm.token_counter(
        model=self.model_name,
        messages=messages,
        custom_tokenizer=self.config.custom_tokenizer,
    )

6. Retry & Resilience Patterns
Exponential Backoff
File: openhands/llm/retry_mixin.py:24-76
@tenacity.retry(
    wait=wait_exponential(
        multiplier=config.retry_multiplier,  # 8x
        min=config.retry_min_wait,           # 8s
        max=config.retry_max_wait,           # 64s
    ),
    stop=stop_after_attempt(config.num_retries),  # 5
    retry=retry_if_exception_type(LLM_RETRY_EXCEPTIONS),
    reraise=True,
)

Retry-eligible exceptions:
LLM_RETRY_EXCEPTIONS = (
    APIConnectionError,
    RateLimitError,
    ServiceUnavailableError,
    BadGatewayError,
    litellm.Timeout,
    litellm.InternalServerError,
    LLMNoResponseError,
)

Temperature Perturbation
)Temperature Perturbation
File: openhands/llm/retry_mixin.py:46-60
# When LLM returns empty response with temperature=0:
if isinstance(exception, LLMNoResponseError):
    if completion_fn.keywords.get('temperature', None) == 0:
        completion_fn.keywords['temperature'] = 1.0
        # "Adding randomness to avoid repeated empty responses"

This is a clever trick: deterministic temperature (0) can cause the model to consistently return empty responses. Temporarily increasing temperature breaks out of this local minimum.
Context Window Recovery
File: openhands/controller/agent_controller.py:345-355
# When context window is exceeded:
if isinstance(exception, ContextWindowExceededError):
    if condensation_enabled:
        # Trigger condensation to compress context
        trigger_condensation_request()
    else:
        raise LLMContextWindowExceedError()

7. Response Processing Pipeline
LLM Response → Actions
File: openhands/agenthub/codeact_agent/function_calling.py:79-344
def response_to_actions(response, mcp_tool_names):
    actions = []
    for tool_call in response.choices[0].message.tool_calls:
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)

        if name == 'execute_bash':
            action = CmdRunAction(command=args['command'])
        elif name == 'str_replace_editor':
            if args['command'] == 'view':
                action = FileReadAction(path=args['path'])
            else:
                action = FileEditAction(...)
        elif name == 'browser':
            action = BrowseInteractiveAction(browser_actions=args['code'])
        elif name == 'finish':
            action = AgentFinishAction(message=args['message'])
        elif name == 'think':
            action = AgentThinkAction(thought=args['thought'])
        elif name in mcp_tool_names:
            action = MCPAction(name=name, arguments=args)
        # ... etc

        set_security_risk(action, args)  # Extract LLM's risk assessment
        action.tool_call_metadata = ToolCallMetadata(
            tool_call_id=tool_call.id,
            function_name=name,
            model_response=response,
        )
        actions.append(action)
    return actions

Multi-Tool Response Handling
When the LLM returns multiple tool calls in one response:
# In CodeActAgent.step():
actions = response_to_actions(response, self.mcp_tool_names)
self.pending_actions.extend(actions[1:]) # Queue all but first
return actions[0] # Return first immediately
# On next step() call:
if self.pending_actions:
    return self.pending_actions.popleft()  # Return queued without LLM call

This amortizes LLM latency: one LLM call can produce multiple sequential actions.
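A minimal sketch of the queueing pattern, assuming pending_actions is a collections.deque as the popleft() call suggests (class and parameter names here are illustrative):

from collections import deque

class MultiToolQueue:
    """Drains one action per step; only refills with an LLM call when empty."""

    def __init__(self, call_llm, to_actions):
        self.pending = deque()
        self.call_llm = call_llm      # () -> model response
        self.to_actions = to_actions  # response -> list of actions

    def step(self):
        if self.pending:
            return self.pending.popleft()   # no LLM round-trip needed
        actions = self.to_actions(self.call_llm())
        self.pending.extend(actions[1:])    # queue the rest
        return actions[0]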
8. Metrics & Observability
File: openhands/llm/metrics.py
Every LLM call tracks:
class TokenUsage:
    prompt_tokens: int        # Input tokens
    completion_tokens: int    # Output tokens
    cache_read_tokens: int    # Cache hits (cost savings)
    cache_write_tokens: int   # Cache writes (one-time cost)
    context_window: int       # Model's max context
    per_turn_token: int       # Total tokens this turn
    response_id: str          # Unique response ID

class ResponseLatency:
    latency: float            # Round-trip time (seconds)
    response_id: str          # Links to token usage
    model: str                # Model used

class Cost:
    cost: float               # Dollar cost
    timestamp: str            # When incurred

Cost calculation (llm.py:764-822):
- Check response headers for cost (some providers include it)
- Fall back to litellm.completion_cost()
- Fall back to custom input_cost_per_token / output_cost_per_token config
- If all fail, log a warning and disable future cost tracking
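A hedged sketch of that fallback chain (the helper, config attributes, and header access are illustrative; litellm.completion_cost is a real LiteLLM helper):

import litellm

def compute_cost(response, config, logger) -> float | None:
    """Best-effort cost: provider-reported value, then LiteLLM pricing, then per-token config."""
    # 1. Some providers report cost directly alongside the response.
    hidden = getattr(response, '_hidden_params', {}) or {}
    if hidden.get('response_cost') is not None:
        return float(hidden['response_cost'])
    # 2. LiteLLM's built-in pricing table.
    try:
        return litellm.completion_cost(completion_response=response)
    except Exception:
        pass
    # 3. Custom per-token rates from config (assumed attributes).
    usage = getattr(response, 'usage', None)
    if usage and config.input_cost_per_token is not None:
        return (usage.prompt_tokens * config.input_cost_per_token
                + usage.completion_tokens * config.output_cost_per_token)
    logger.warning('Cost calculation failed; disabling future cost tracking')
    return None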
9. Vision & Multimodal Support
File: openhands/llm/llm.py:561-594
def vision_is_active(self):
    return not self.config.disable_vision and self._supports_vision()

def _supports_vision(self):
    # Check litellm's supports_vision() on model name
    # Check model_info lookup
    # Environment override: OPENHANDS_FORCE_VISION=1

When vision is active:
- ConversationMemory includes image URLs in messages
- Browser screenshots are sent as image content
- Message serialization uses list format (supports ImageContent)
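A minimal sketch of the list-format serialization with an image block, following the OpenAI-compatible content schema that LiteLLM accepts (the text and screenshot data are placeholders):

vision_message = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'Here is the current state of the page:'},
        {
            'type': 'image_url',
            'image_url': {'url': 'data:image/png;base64,<screenshot-bytes>'},
        },
    ],
}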
10. MCP (Model Context Protocol) Integration
File: openhands/mcp/client.py:20-178
MCP extends the agent's tool set with external servers:
class MCPClient:
    async def connect_http(self, url, headers=None):
        # Connect to HTTP/SSE MCP server
        # Discover available tools
        # Add to agent's tool list

    async def connect_stdio(self, command, args, env=None):
        # Launch MCP server as subprocess
        # Connect via stdio transport
        # Discover and register tools

    async def call_tool(self, name, arguments):
        # Execute MCP tool with timeout
        # Return CallToolResult

Default MCP servers (from config):
- Tavily search (if API key configured) -- web search capability
- OpenHands MCP at /mcp/mcp -- internal tools
MCP actions flow through the same event system:
MCPAction → EventStream → Runtime → MCP Proxy → MCP Server → MCPObservation
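A hedged usage sketch of the client API outlined above (method names follow the snippet; the server URL, tool name, and arguments are placeholders, not OpenHands defaults):

import asyncio

async def main():
    client = MCPClient()
    # Register tools from an HTTP MCP server (URL is a placeholder).
    await client.connect_http('https://example.com/mcp/mcp')
    # Invoke one of the discovered tools; name and arguments are illustrative.
    result = await client.call_tool('tavily_search', {'query': 'OpenHands MCP'})
    print(result)

asyncio.run(main())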