05 — LLM Integration

LiteLLM Provider Abstraction

OpenHands uses LiteLLM as a unified interface to 100+ LLM providers. This means any model accessible via OpenAI, Anthropic, Google, Azure, AWS Bedrock, Ollama, or other compatible APIs can be used without changing agent code.

Agent code
    │
    ▼
LLM class (openhands/llm/llm.py)
    │
    ▼
LiteLLM (litellm.completion)
    │
    ├── OpenAI API
    ├── Anthropic API
    ├── Google Gemini
    ├── Azure OpenAI
    ├── AWS Bedrock
    ├── Ollama (local)
    ├── OpenRouter
    ├── LiteLLM Proxy (openhands/ prefix)
    └── Any OpenAI-compatible endpoint
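
Because LiteLLM normalizes the API surface, switching providers is mostly a matter of changing the model string (plus credentials or a base URL). A minimal illustration; the model names below are examples only:

import litellm

messages = [{'role': 'user', 'content': 'Hello'}]

# Same call shape for every provider; only the model string and credentials change.
litellm.completion(model='gpt-4o', messages=messages)
litellm.completion(model='anthropic/claude-sonnet-4-20250514', messages=messages)
litellm.completion(model='ollama/llama3', messages=messages,
                   api_base='http://localhost:11434')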

LLM Class

File: openhands/llm/llm.py:64-868

The LLM class is the core wrapper around LiteLLM. Key aspects:

Initialization (line 71)

class LLM(RetryMixin, DebugMixin):
    def __init__(self, config: LLMConfig, service_id: str, ...):
        self.config = copy.deepcopy(config)
        self.metrics = metrics or Metrics(model_name=config.model)
 
        self.init_model_info()    # Discover model capabilities
 
        self._completion = partial(
            litellm_completion,
            model=config.model,
            api_key=config.api_key,
            base_url=config.base_url,
            temperature=config.temperature,
            max_completion_tokens=config.max_output_tokens,
            ...
        )

Completion Flow

The completion property (line 432) returns a decorated wrapper that (see the sketch after this list):

  1. Formats messages: Converts Message objects to dicts for LiteLLM
  2. Handles non-native function calling: If the model doesn't support native tool calling, converts to XML-based prompting via fn_call_converter
  3. Calls LiteLLM: litellm_completion() with all configured parameters
  4. Converts response back: If using emulated function calling, parses the XML response back into function call format
  5. Tracks metrics: Cost, tokens, latency
  6. Logs completions: Optionally saves full request/response to disk
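
A much-simplified sketch of the decoration idea, using only public LiteLLM calls. This is not the actual OpenHands wrapper, which also handles function-calling emulation, retries, and completion logging:

import time
from functools import wraps

import litellm

def with_metrics(completion_fn):
    # Stand-in for the decoration applied in openhands/llm/llm.py: time the call
    # and record token usage before returning the response unchanged.
    @wraps(completion_fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        resp = completion_fn(*args, **kwargs)        # step 3: call LiteLLM
        latency = time.time() - start
        usage = getattr(resp, 'usage', None)         # step 5: token accounting
        print(f'latency={latency:.2f}s usage={usage}')
        return resp
    return wrapper

completion = with_metrics(litellm.completion)
response = completion(model='gpt-4o', messages=[{'role': 'user', 'content': 'Hi'}])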

Retry Logic

Via RetryMixin, the completion wrapper retries on transient failures:

LLM_RETRY_EXCEPTIONS = (
    APIConnectionError,
    RateLimitError,
    ServiceUnavailableError,
    BadGatewayError,
    litellm.Timeout,
    litellm.InternalServerError,
    LLMNoResponseError,
)

Configurable via LLMConfig: num_retries (default 8), retry_min_wait (15s), retry_max_wait (120s), retry_multiplier (exponential backoff).
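
As a rough sketch, an equivalent policy could be written directly with tenacity using the defaults quoted above (the multiplier value is an example, not the documented default):

import litellm
from tenacity import (retry, retry_if_exception_type, stop_after_attempt,
                      wait_exponential)

# Illustrative only: retries on the LLM_RETRY_EXCEPTIONS tuple defined above,
# with 8 attempts and a 15s-120s exponential backoff window.
@retry(
    retry=retry_if_exception_type(LLM_RETRY_EXCEPTIONS),
    stop=stop_after_attempt(8),
    wait=wait_exponential(multiplier=2, min=15, max=120),
    reraise=True,
)
def call_llm(messages):
    return litellm.completion(model='gpt-4o', messages=messages)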

Model Info & Capabilities

init_model_info() (line 439) queries LiteLLM for model metadata such as the context window, output token limits, and per-token costs; these values are used for token limits and cost accounting.
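
The underlying LiteLLM lookup looks roughly like this (field names are LiteLLM's; availability varies by model and provider):

import litellm

# Model metadata as exposed by LiteLLM (illustrative; not every field exists
# for every model).
info = litellm.get_model_info('gpt-4o')
print(info['max_input_tokens'], info['max_output_tokens'])
print(info['input_cost_per_token'], info['output_cost_per_token'])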


Function Calling: Native vs Emulated

File: openhands/llm/fn_call_converter.py

Not all LLM providers support OpenAI-style function calling natively. The fn_call_converter module provides transparent conversion:

Native Path (models with tool calling support)

Agent tools → OpenAI tool format → LLM API → tool_calls in response

Emulated Path (models without native support)

Agent tools → XML prompt injection → LLM text response → XML parsing → tool_calls

The emulated format uses XML tags:

<function=execute_bash>
<parameter=command>ls -la</parameter>
</function>

The system prompt suffix (SYSTEM_PROMPT_SUFFIX_TEMPLATE, line 37) documents the format for the LLM. Stop words (STOP_WORDS) tell the model when to stop generating after a function call.
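
A simplified sketch of turning such a response back into an OpenAI-style tool call (the real parser in fn_call_converter.py is considerably more robust):

import json
import re

# Hypothetical, simplified parser for the XML-style format shown above; the
# real converter handles multiple calls, multi-line values, and malformed output.
FN_RE = re.compile(r'<function=(?P<name>[^>]+)>(?P<body>.*?)</function>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter=(?P<key>[^>]+)>(?P<value>.*?)</parameter>', re.DOTALL)

def parse_emulated_tool_call(text: str) -> dict | None:
    match = FN_RE.search(text)
    if match is None:
        return None
    args = {m.group('key'): m.group('value')
            for m in PARAM_RE.finditer(match.group('body'))}
    return {'type': 'function',
            'function': {'name': match.group('name'), 'arguments': json.dumps(args)}}

print(parse_emulated_tool_call(
    '<function=execute_bash>\n<parameter=command>ls -la</parameter>\n</function>'))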

Detection

Whether to use native or emulated function calling is determined by:

  1. LLMConfig.native_tool_calling — explicit override (if set)
  2. model_features.get_features(model) — auto-detection based on model name (openhands/llm/model_features.py)

Model Features Detection

File: openhands/llm/model_features.py

The get_features() function returns a ModelFeatures object for any model:

class ModelFeatures:
    supports_function_calling: bool
    supports_stop_words: bool
    supports_prompt_cache: bool
    supports_reasoning_effort: bool

This is used to automatically configure LLM behavior — e.g., whether to send stop words, whether to use native tool calling, whether prompt caching breakpoints are needed (Anthropic).
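
A small usage sketch (the model name is an example; the per-model detection rules live in model_features.py):

from openhands.llm.model_features import get_features

features = get_features('claude-sonnet-4-20250514')     # example model name

use_native_tools = features.supports_function_calling   # unless LLMConfig overrides it
send_stop_words = features.supports_stop_words
add_cache_breakpoints = features.supports_prompt_cache  # Anthropic prompt caching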


LLM Registry & Router

File: openhands/llm/llm_registry.py:30-60

The LLMRegistry is a factory that creates and caches LLM instances:

class LLMRegistry:
    def __init__(self, config: OpenHandsConfig, ...):
        self.service_to_llm: dict[str, LLM] = {}
        self.active_agent_llm = self.get_llm('agent', llm_config)
 
    def get_llm(self, service_id: str, config: LLMConfig) -> LLM:
        """Get or create an LLM instance for a service."""
        ...

The registry supports routing — the get_router() method can return an LLM that routes requests to different models based on configuration, enabling multi-model setups (e.g., a fast model for simple tasks, a powerful model for complex reasoning).
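
Typical usage, sketched from the interface above (constructor arguments are abbreviated):

registry = LLMRegistry(config)                       # config: OpenHandsConfig
agent_llm = registry.get_llm('agent', llm_config)    # created on first request
same_llm = registry.get_llm('agent', llm_config)     # later calls return the cached instance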


Prompt Construction

Message Building

The ConversationMemory class (openhands/memory/conversation_memory.py) converts event history into LLM-ready Message objects:

Events (condensed history)
    │
    ▼
ConversationMemory.get_messages()
    │
    ├── System message (from PromptManager)
    ├── RecallObservation → workspace context injection
    ├── Actions → assistant messages (with tool calls)
    ├── Observations → tool results
    └── User messages → user role messages
    │
    ▼
list[Message] → LLM.format_messages_for_llm() → list[dict]
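
The resulting list is plain OpenAI chat format; an illustrative (invented) example of what gets sent to LiteLLM:

messages = [
    {'role': 'system', 'content': 'You are OpenHands, an autonomous coding agent...'},
    {'role': 'user', 'content': 'Fix the failing test in test_app.py'},
    {'role': 'assistant', 'content': None, 'tool_calls': [
        {'id': 'call_1', 'type': 'function',
         'function': {'name': 'execute_bash', 'arguments': '{"command": "pytest -x"}'}},
    ]},
    {'role': 'tool', 'tool_call_id': 'call_1', 'content': '1 failed, 12 passed ...'},
]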

Vision Support

When vision_is_active() is true (line 556), messages can include image content (base64-encoded screenshots from browser actions). The Message serialization handles this automatically.
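
An illustrative multimodal message in the OpenAI content-parts format that LiteLLM forwards to vision-capable models (the base64 payload is truncated):

image_message = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'Here is the current page:'},
        {'type': 'image_url',
         'image_url': {'url': 'data:image/png;base64,iVBORw0KGgo...'}},
    ],
}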

Prompt Caching

When is_caching_prompt_active() is true (line 591), cache breakpoints are inserted into messages for Anthropic models. This reduces token costs by caching the system prompt and early conversation history.
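
An illustrative cache breakpoint in the Anthropic style that LiteLLM passes through (the prompt text is invented):

system_message = {
    'role': 'system',
    'content': [
        {'type': 'text',
         'text': 'You are OpenHands, an autonomous coding agent...',
         'cache_control': {'type': 'ephemeral'}},  # cache everything up to this block
    ],
}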


Guardrails

Budget Limits

Iteration Caps

Security Analysis

Stuck Detection


Tools

The CodeActAgent exposes the following tools to the LLM, configured via AgentConfig flags:

| Tool | Config Flag | Description |
| --- | --- | --- |
| execute_bash | enable_cmd | Run shell commands |
| execute_ipython_cell | enable_jupyter | Run Python code in IPython |
| str_replace_editor | enable_editor | View/create/edit files (ACI-based) |
| llm_based_edit | enable_llm_editor | LLM-powered file editing (alternative) |
| browser | enable_browsing | Interact with web browser |
| think | enable_think | Log reasoning without acting |
| finish | enable_finish | Mark task as complete |
| request_condensation | enable_condensation_request | Request history condensation |
| task_tracker | enable_plan_mode | Task management for plan mode |
| MCP tools | enable_mcp | Dynamically loaded via MCP protocol |

Tools are defined as ChatCompletionToolParam dicts (OpenAI tool format) in openhands/agenthub/codeact_agent/tools/. The agent's _get_tools() method (line 116) assembles the tool list based on configuration.
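
For reference, a tool definition in OpenAI format looks like the following minimal sketch (the real execute_bash schema in tools/ has more parameters and a much longer description):

execute_bash_tool = {
    'type': 'function',
    'function': {
        'name': 'execute_bash',
        'description': 'Run a shell command in the sandbox.',
        'parameters': {
            'type': 'object',
            'properties': {
                'command': {'type': 'string', 'description': 'The command to run.'},
            },
            'required': ['command'],
        },
    },
}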

MCP (Model Context Protocol) Tools

MCP tools are dynamically added at runtime via add_mcp_tools_to_agent() (openhands/mcp/). They enable the agent to interact with external services (databases, APIs, custom tools) through a standardized protocol. MCP servers can be configured in the TOML config or provided by repository microagents.


OpenHands Provider

Models prefixed with openhands/ are automatically routed to the OpenHands LLM proxy (openhands/llm/llm.py:136-142):

if self.config.model.startswith('openhands/'):
    model_name = self.config.model.removeprefix('openhands/')
    self.config.model = f'litellm_proxy/{model_name}'
    self.config.base_url = _get_openhands_llm_base_url()

This enables the managed SaaS product to provide LLM access without users needing their own API keys.


Special Model Handling

The LLM.__init__ contains model-specific logic (openhands/llm/llm.py:136-228):

| Model | Special Handling |
| --- | --- |
| gemini-2.5-pro | Thinking budget configuration for reasoning effort |
| claude-opus-4-1 | Extended thinking explicitly disabled |
| claude-opus-4-5, claude-sonnet-4 | top_p dropped when temperature is set |
| azure/* | Uses max_tokens instead of max_completion_tokens |
| huggingface/* | top_p capped at 0.9 |
| openhands/* | Rewritten to litellm_proxy/ with managed base URL |
| mistral/*, gemini/* | Safety settings passed through if configured |