05 — LLM Integration
LiteLLM Provider Abstraction
OpenHands uses LiteLLM as a unified interface to 100+ LLM providers. This means any model accessible via OpenAI, Anthropic, Google, Azure, AWS Bedrock, Ollama, or other compatible APIs can be used without changing agent code.
Agent code
│
▼
LLM class (openhands/llm/llm.py)
│
▼
LiteLLM (litellm.completion)
│
├── OpenAI API
├── Anthropic API
├── Google Gemini
├── Azure OpenAI
├── AWS Bedrock
├── Ollama (local)
├── OpenRouter
├── LiteLLM Proxy (openhands/ prefix)
└── Any OpenAI-compatible endpoint
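Because the provider is selected entirely by the model string, swapping backends does not change the call site. A minimal illustration using LiteLLM directly (model names and the local Ollama URL are examples, not OpenHands defaults):
```python
# Illustration only (not OpenHands code): LiteLLM exposes one completion() call
# for many providers; the model string selects the backend.
import litellm

messages = [{"role": "user", "content": "Say hello"}]

# Same call shape across providers; model names below are just examples.
litellm.completion(model="gpt-4o-mini", messages=messages)
litellm.completion(model="anthropic/claude-3-5-sonnet-20241022", messages=messages)
litellm.completion(model="ollama/llama3", messages=messages, api_base="http://localhost:11434")
```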
LLM Class
File: openhands/llm/llm.py:64-868
The LLM class is the core wrapper around LiteLLM. Key aspects:
Initialization (line 71)
```python
class LLM(RetryMixin, DebugMixin):
    def __init__(self, config: LLMConfig, service_id: str, ...):
        self.config = copy.deepcopy(config)
        self.metrics = metrics or Metrics(model_name=config.model)
        self.init_model_info()  # Discover model capabilities
        self._completion = partial(
            litellm_completion,
            model=config.model,
            api_key=config.api_key,
            base_url=config.base_url,
            temperature=config.temperature,
            max_completion_tokens=config.max_output_tokens,
            ...
        )
```
Completion Flow
The completion property (line 432) returns a decorated wrapper (sketched below) that:
- Formats messages: Converts Message objects to dicts for LiteLLM
- Handles non-native function calling: If the model doesn't support native tool calling, converts to XML-based prompting via fn_call_converter
- Calls LiteLLM: litellm_completion() with all configured parameters
- Converts response back: If using emulated function calling, parses the XML response back into function call format
- Tracks metrics: Cost, tokens, latency
- Logs completions: Optionally saves full request/response to disk
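A compressed sketch of what such a wrapper has to do (not the actual OpenHands code; the tool-injection helper and the latency callback are invented for illustration):
```python
# Sketch of the decorated wrapper's responsibilities; the real wrapper in
# openhands/llm/llm.py also handles retries, cost tracking, and logging.
import json
import time

def inject_tools_as_prompt(messages: list[dict], tools: list[dict]) -> list[dict]:
    # Stand-in for the fn_call_converter step: describe the tools in the system prompt.
    suffix = "\n\nAvailable tools:\n" + json.dumps(tools, indent=2)
    return [
        {**m, "content": (m.get("content") or "") + suffix} if m["role"] == "system" else m
        for m in messages
    ]

def wrap_completion(raw_completion, supports_native_tools: bool, record_latency):
    def wrapper(messages: list[dict], tools: list[dict] | None = None, **kwargs):
        if tools and not supports_native_tools:
            # Emulated path: tools go into the prompt instead of the API's tools field.
            messages, tools = inject_tools_as_prompt(messages, tools), None
        start = time.time()
        response = raw_completion(messages=messages, tools=tools, **kwargs)
        record_latency(time.time() - start)
        return response
    return wrapper
```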
Retry Logic
Via RetryMixin, the completion wrapper retries on transient failures:
```python
LLM_RETRY_EXCEPTIONS = (
    APIConnectionError,
    RateLimitError,
    ServiceUnavailableError,
    BadGatewayError,
    litellm.Timeout,
    litellm.InternalServerError,
    LLMNoResponseError,
)
```
Configurable via LLMConfig: num_retries (default 8), retry_min_wait (15s), retry_max_wait (120s), retry_multiplier (exponential backoff).
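For example, retry behaviour can be tightened when the config is constructed (a sketch; the import path is an assumption, field names and defaults are taken from the description above):
```python
# Sketch: overriding retry behaviour on LLMConfig.
from openhands.core.config import LLMConfig  # import path is an assumption

config = LLMConfig(
    model="gpt-4o",
    num_retries=4,       # default 8
    retry_min_wait=5,    # default 15 seconds
    retry_max_wait=60,   # default 120 seconds
    retry_multiplier=2,  # exponential backoff factor
)
```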
Model Info & Capabilities
init_model_info() (line 439) queries LiteLLM for model metadata (illustrated below):
- max_input_tokens / max_output_tokens — context window limits
- Vision support — auto-detected via litellm.supports_vision()
- Function calling support — via model_features.py
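The same metadata can be queried from LiteLLM directly, which is roughly what the initialization consumes (illustrative, not OpenHands code):
```python
# Illustrative queries against LiteLLM's model metadata.
import litellm

info = litellm.get_model_info("claude-3-5-sonnet-20241022")
print(info["max_input_tokens"], info["max_output_tokens"])  # context window limits

print(litellm.supports_vision(model="gpt-4o"))               # vision capability
print(litellm.supports_function_calling(model="gpt-4o"))     # native tool calling
```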
Function Calling: Native vs Emulated
File: openhands/llm/fn_call_converter.py
Not all LLM providers support OpenAI-style function calling natively. The
fn_call_converter module provides transparent conversion:
Native Path (models with tool calling support)
Agent tools → OpenAI tool format → LLM API → tool_calls in response
Emulated Path (models without native support)
Agent tools → XML prompt injection → LLM text response → XML parsing → tool_calls
The emulated format uses XML tags:
```
<function=execute_bash>
<parameter=command>ls -la</parameter>
</function>
```
The system prompt suffix (SYSTEM_PROMPT_SUFFIX_TEMPLATE, line 37) documents the format for the LLM. Stop words (STOP_WORDS) tell the model when to stop generating after a function call.
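A toy version of the reverse conversion, turning that text back into an OpenAI-style tool call (the real converter is far more robust and serializes arguments as a JSON string):
```python
# Minimal sketch of the parse-back step; the real fn_call_converter handles
# multiple calls, malformed output, and stricter argument handling.
import re

def parse_emulated_call(text: str) -> dict | None:
    match = re.search(r"<function=(\w+)>(.*?)</function>", text, re.DOTALL)
    if not match:
        return None
    name, body = match.group(1), match.group(2)
    params = dict(re.findall(r"<parameter=(\w+)>(.*?)</parameter>", body, re.DOTALL))
    return {"type": "function", "function": {"name": name, "arguments": params}}

text = "<function=execute_bash>\n<parameter=command>ls -la</parameter>\n</function>"
print(parse_emulated_call(text))
# {'type': 'function', 'function': {'name': 'execute_bash', 'arguments': {'command': 'ls -la'}}}
```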
Detection
Whether to use native or emulated function calling is determined by:
- LLMConfig.native_tool_calling — explicit override (if set)
- model_features.get_features(model) — auto-detection based on model name (openhands/llm/model_features.py)
Model Features Detection
File: openhands/llm/model_features.py
The get_features() function returns a ModelFeatures object for any model:
```python
class ModelFeatures:
    supports_function_calling: bool
    supports_stop_words: bool
    supports_prompt_cache: bool
    supports_reasoning_effort: bool
```
This is used to automatically configure LLM behavior — e.g., whether to send stop words, whether to use native tool calling, and whether prompt caching breakpoints are needed (Anthropic).
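Detection is driven by model-name patterns; a simplified sketch of the idea (the pattern lists and rules shown are illustrative, not the actual ones in model_features.py):
```python
# Simplified sketch of name-pattern feature detection.
from dataclasses import dataclass
from fnmatch import fnmatch

FUNCTION_CALLING_PATTERNS = ["*gpt-4o*", "*claude-sonnet-4*", "*claude-3-5*"]  # example patterns

@dataclass
class ModelFeatures:
    supports_function_calling: bool
    supports_stop_words: bool
    supports_prompt_cache: bool
    supports_reasoning_effort: bool

def get_features(model: str) -> ModelFeatures:
    name = model.lower()
    return ModelFeatures(
        supports_function_calling=any(fnmatch(name, p) for p in FUNCTION_CALLING_PATTERNS),
        supports_stop_words="o1" not in name,                     # example rule
        supports_prompt_cache="claude" in name,                   # Anthropic-style caching
        supports_reasoning_effort="o1" in name or "o3" in name,   # example rule
    )

print(get_features("litellm_proxy/claude-sonnet-4-20250514"))
```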
LLM Registry & Router
File: openhands/llm/llm_registry.py:30-60
The LLMRegistry is a factory that creates and caches LLM instances:
```python
class LLMRegistry:
    def __init__(self, config: OpenHandsConfig, ...):
        self.service_to_llm: dict[str, LLM] = {}
        self.active_agent_llm = self.get_llm('agent', llm_config)

    def get_llm(self, service_id: str, config: LLMConfig) -> LLM:
        """Get or create an LLM instance for a service."""
        ...
```
The registry supports routing — the get_router() method can return an LLM that routes requests to different models based on configuration, enabling multi-model setups (e.g., a fast model for simple tasks, a powerful model for complex reasoning).
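Conceptually the registry is a keyed cache; a self-contained toy version of the get-or-create behaviour (not the real class, which builds LLM instances from LLMConfig objects):
```python
# Toy registry: one cached client per service_id, e.g. a strong model for the
# agent and a cheaper model for summarisation.
class TinyRegistry:
    def __init__(self) -> None:
        self.service_to_llm: dict[str, object] = {}

    def get_llm(self, service_id: str, factory) -> object:
        if service_id not in self.service_to_llm:
            self.service_to_llm[service_id] = factory()
        return self.service_to_llm[service_id]

registry = TinyRegistry()
agent_llm = registry.get_llm("agent", factory=lambda: "strong-model-client")
condenser_llm = registry.get_llm("condenser", factory=lambda: "cheap-model-client")
assert registry.get_llm("agent", factory=lambda: "ignored") is agent_llm  # cached
```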
Prompt Construction
Message Building
The ConversationMemory class (openhands/memory/conversation_memory.py)
converts event history into LLM-ready Message objects:
Events (condensed history)
│
▼
ConversationMemory.get_messages()
│
├── System message (from PromptManager)
├── RecallObservation → workspace context injection
├── Actions → assistant messages (with tool calls)
├── Observations → tool results
└── User messages → user role messages
│
▼
list[Message] → LLM.format_messages_for_llm() → list[dict]
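The end result is a standard OpenAI-style chat message list, roughly (contents invented for illustration):
```python
# Illustrative shape of the final message list handed to LiteLLM.
messages = [
    {"role": "system", "content": "You are OpenHands, an autonomous software engineer..."},
    {"role": "user", "content": "Fix the failing test in tests/test_api.py"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "execute_bash",
                      "arguments": '{"command": "pytest tests/test_api.py"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "1 failed, 3 passed ..."},
]
```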
Vision Support
When vision_is_active() is true (line 556), messages can include image content
(base64-encoded screenshots from browser actions). The Message serialization
handles this automatically.
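Such a message carries a content list mixing text and image parts in the OpenAI multimodal format, for example:
```python
# Illustrative multimodal user message: text plus a base64-encoded screenshot.
screenshot_b64 = "iVBORw0KGgoAAAANSUhEUg..."  # truncated; taken from a browser observation

image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Here is the current state of the page:"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
    ],
}
```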
Prompt Caching
When is_caching_prompt_active() is true (line 591), cache breakpoints are
inserted into messages for Anthropic models. This reduces token costs by caching
the system prompt and early conversation history.
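A breakpoint is expressed as a cache_control marker on a content block, roughly (illustrative values):
```python
# Illustrative cache breakpoint on the system prompt (Anthropic prompt caching,
# in the format LiteLLM passes through).
system_message = {
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": "You are OpenHands...",           # long, stable system prompt
            "cache_control": {"type": "ephemeral"},   # cache everything up to this block
        },
    ],
}
```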
Guardrails
Budget Limits
- OpenHandsConfig.max_budget_per_task — Maximum USD spend per task
- The AgentController tracks cost via Metrics.accumulated_cost and stops when the budget is exceeded (openhands/controller/agent_controller.py:887)
Iteration Caps
- OpenHandsConfig.max_iterations — Maximum number of agent steps
- Tracked via StateTracker control flags; the controller errors when exceeded (see the sketch below)
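Both limits reduce to simple checks in the agent's control loop; schematically (attribute and parameter names are simplified, this is not the actual controller code):
```python
# Schematic of the budget and iteration guardrails.
def should_stop(accumulated_cost: float, iteration: int, config) -> str | None:
    if config.max_budget_per_task and accumulated_cost >= config.max_budget_per_task:
        return "budget exceeded"
    if iteration >= config.max_iterations:
        return "iteration limit reached"
    return None
```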
Security Analysis
- SecurityConfig.security_analyzer — Optional security analyzer (e.g., "invariant")
- SecurityConfig.confirmation_mode — Require user approval for risky actions
- The controller runs security analysis before executing actions (_handle_security_analyzer, line 213)
- Actions are classified as LOW, MEDIUM, HIGH, or UNKNOWN risk
Stuck Detection
- StuckDetector (openhands/controller/stuck.py) analyzes recent history for repetitive patterns
- When detected, triggers AgentStuckInLoopError or offers recovery options (CLI: restart from before the loop, restart with the last message, or stop); a toy version of the check is sketched below
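The core idea is to compare the tail of the recent history against itself; a toy version (the real StuckDetector works on the event history and recognizes several repetition patterns):
```python
# Toy loop detection: flag a history whose most recent actions repeat verbatim.
def looks_stuck(recent_actions: list[str], window: int = 3) -> bool:
    if len(recent_actions) < 2 * window:
        return False
    return recent_actions[-window:] == recent_actions[-2 * window:-window]

print(looks_stuck(["ls", "cat a.py", "ls", "cat a.py", "ls", "cat a.py"], window=2))  # True
```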
Tools
The CodeActAgent exposes the following tools to the LLM, configured via
AgentConfig flags:
| Tool | Config Flag | Description |
|---|---|---|
| execute_bash | enable_cmd | Run shell commands |
| execute_ipython_cell | enable_jupyter | Run Python code in IPython |
| str_replace_editor | enable_editor | View/create/edit files (ACI-based) |
| llm_based_edit | enable_llm_editor | LLM-powered file editing (alternative) |
| browser | enable_browsing | Interact with web browser |
| think | enable_think | Log reasoning without acting |
| finish | enable_finish | Mark task as complete |
| request_condensation | enable_condensation_request | Request history condensation |
| task_tracker | enable_plan_mode | Task management for plan mode |
| MCP tools | enable_mcp | Dynamically loaded via MCP protocol |
Tools are defined as ChatCompletionToolParam dicts (OpenAI tool format) in
openhands/agenthub/codeact_agent/tools/. The agent's _get_tools() method
(line 116) assembles the tool list based on configuration.
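For illustration, a bash tool in that format looks roughly like this (schema simplified relative to the real definition):
```python
# Simplified example of a tool definition in OpenAI ChatCompletionToolParam format.
# The real definitions in openhands/agenthub/codeact_agent/tools/ carry fuller schemas.
execute_bash_tool = {
    "type": "function",
    "function": {
        "name": "execute_bash",
        "description": "Run a shell command in the sandboxed workspace.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "The command to execute."},
            },
            "required": ["command"],
        },
    },
}
```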
MCP (Model Context Protocol) Tools
MCP tools are dynamically added at runtime via add_mcp_tools_to_agent()
(openhands/mcp/). They enable the agent to interact with external services
(databases, APIs, custom tools) through a standardized protocol. MCP servers can
be configured in the TOML config or provided by repository microagents.
OpenHands Provider
Models prefixed with openhands/ are automatically routed to the OpenHands LLM
proxy (openhands/llm/llm.py:136-142):
```python
if self.config.model.startswith('openhands/'):
    model_name = self.config.model.removeprefix('openhands/')
    self.config.model = f'litellm_proxy/{model_name}'
    self.config.base_url = _get_openhands_llm_base_url()
```
This enables the managed SaaS product to provide LLM access without users needing their own API keys.
Special Model Handling
The LLM.__init__ contains model-specific logic (openhands/llm/llm.py:136-228):
| Model | Special Handling |
|---|---|
| gemini-2.5-pro | Thinking budget configuration for reasoning effort |
| claude-opus-4-1 | Extended thinking explicitly disabled |
| claude-opus-4-5, claude-sonnet-4 | top_p dropped when temperature is set |
| azure/* | Uses max_tokens instead of max_completion_tokens |
| huggingface/* | top_p capped at 0.9 |
| openhands/* | Rewritten to litellm_proxy/ with managed base URL |
| mistral/*, gemini/* | Safety settings passed through if configured |