05 — LLM Integration

LiteLLM Provider Abstraction

OpenHands uses LiteLLM as a unified interface to 100+ LLM providers. This means any model accessible via OpenAI, Anthropic, Google, Azure, AWS Bedrock, Ollama, or other compatible APIs can be used without changing agent code.

Agent code
    │
    ▼
LLM class (openhands/llm/llm.py)
    │
    ▼
LiteLLM (litellm.completion)
    │
    ├── OpenAI API
    ├── Anthropic API
    ├── Google Gemini
    ├── Azure OpenAI
    ├── AWS Bedrock
    ├── Ollama (local)
    ├── OpenRouter
    ├── LiteLLM Proxy (openhands/ prefix)
    └── Any OpenAI-compatible endpoint
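
Because LiteLLM normalizes the API surface, switching providers is mostly a matter of changing the model string (plus credentials or a base URL). A minimal illustration; the model names below are examples only:

import litellm

messages = [{'role': 'user', 'content': 'Hello'}]

# Same call shape for every provider; only the model string and credentials change.
litellm.completion(model='gpt-4o', messages=messages)
litellm.completion(model='anthropic/claude-sonnet-4-20250514', messages=messages)
litellm.completion(model='ollama/llama3', messages=messages,
                   api_base='http://localhost:11434')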

LLM Class

File: openhands/llm/llm.py:64-868

The LLM class is the core wrapper around LiteLLM. Key aspects:

Initialization (line 71)

class LLM(RetryMixin, DebugMixin):
    def __init__(self, config: LLMConfig, service_id: str, ...):
        self.config = copy.deepcopy(config)
        self.metrics = metrics or Metrics(model_name=config.model)
 
        self.init_model_info()    # Discover model capabilities
 
        self._completion = partial(
            litellm_completion,
            model=config.model,
            api_key=config.api_key,
            base_url=config.base_url,
            temperature=config.temperature,
            max_completion_tokens=config.max_output_tokens,
            ...
        )

Completion Flow

The completion property (line 432) returns a decorated wrapper that (see the sketch after this list):

  1. Formats messages: Converts Message objects to dicts for LiteLLM
  2. Handles non-native function calling: If the model doesn't support native tool calling, converts to XML-based prompting via fn_call_converter
  3. Calls LiteLLM: litellm_completion() with all configured parameters
  4. Converts response back: If using emulated function calling, parses the XML response back into function call format
  5. Tracks metrics: Cost, tokens, latency
  6. Logs completions: Optionally saves full request/response to disk
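
A much-simplified sketch of the decoration idea, using only public LiteLLM calls. This is not the actual OpenHands wrapper, which also handles function-calling emulation, retries, and completion logging:

import time
from functools import wraps

import litellm

def with_metrics(completion_fn):
    # Stand-in for the decoration applied in openhands/llm/llm.py: time the call
    # and record token usage before returning the response unchanged.
    @wraps(completion_fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        resp = completion_fn(*args, **kwargs)        # step 3: call LiteLLM
        latency = time.time() - start
        usage = getattr(resp, 'usage', None)         # step 5: token accounting
        print(f'latency={latency:.2f}s usage={usage}')
        return resp
    return wrapper

completion = with_metrics(litellm.completion)
response = completion(model='gpt-4o', messages=[{'role': 'user', 'content': 'Hi'}])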

Retry Logic

Via RetryMixin, the completion wrapper retries on transient failures:

LLM_RETRY_EXCEPTIONS = (
    APIConnectionError,
    RateLimitError,
    ServiceUnavailableError,
    BadGatewayError,
    litellm.Timeout,
    litellm.InternalServerError,
    LLMNoResponseError,
)

Configurable via LLMConfig: num_retries (default 8), retry_min_wait (15s), retry_max_wait (120s), retry_multiplier (exponential backoff).
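
As a rough sketch, an equivalent policy could be written directly with tenacity using the defaults quoted above (the multiplier value is an example, not the documented default):

import litellm
from tenacity import (retry, retry_if_exception_type, stop_after_attempt,
                      wait_exponential)

# Illustrative only: retries on the LLM_RETRY_EXCEPTIONS tuple defined above,
# with 8 attempts and a 15s-120s exponential backoff window.
@retry(
    retry=retry_if_exception_type(LLM_RETRY_EXCEPTIONS),
    stop=stop_after_attempt(8),
    wait=wait_exponential(multiplier=2, min=15, max=120),
    reraise=True,
)
def call_llm(messages):
    return litellm.completion(model='gpt-4o', messages=messages)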

Model Info & Capabilities

init_model_info() (line 439) queries LiteLLM for model metadata such as the context window, output token limits, and per-token costs; these values are used for token limits and cost accounting.
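
The underlying LiteLLM lookup looks roughly like this (field names are LiteLLM's; availability varies by model and provider):

import litellm

# Model metadata as exposed by LiteLLM (illustrative; not every field exists
# for every model).
info = litellm.get_model_info('gpt-4o')
print(info['max_input_tokens'], info['max_output_tokens'])
print(info['input_cost_per_token'], info['output_cost_per_token'])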


Function Calling: Native vs Emulated

File: openhands/llm/fn_call_converter.py

Not all LLM providers support OpenAI-style function calling natively. The fn_call_converter module provides transparent conversion:

Native Path (models with tool calling support)

Agent tools → OpenAI tool format → LLM API → tool_calls in response

Emulated Path (models without native support)

Agent tools → XML prompt injection → LLM text response → XML parsing → tool_calls

The emulated format uses XML tags:

<function=execute_bash>
<parameter=command>ls -la</parameter>
</function>

The system prompt suffix (SYSTEM_PROMPT_SUFFIX_TEMPLATE, line 37) documents the format for the LLM. Stop words (STOP_WORDS) tell the model when to stop generating after a function call.
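
A simplified sketch of turning such a response back into an OpenAI-style tool call (the real parser in fn_call_converter.py is considerably more robust):

import json
import re

# Hypothetical, simplified parser for the XML-style format shown above; the
# real converter handles multiple calls, multi-line values, and malformed output.
FN_RE = re.compile(r'<function=(?P<name>[^>]+)>(?P<body>.*?)</function>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter=(?P<key>[^>]+)>(?P<value>.*?)</parameter>', re.DOTALL)

def parse_emulated_tool_call(text: str) -> dict | None:
    match = FN_RE.search(text)
    if match is None:
        return None
    args = {m.group('key'): m.group('value')
            for m in PARAM_RE.finditer(match.group('body'))}
    return {'type': 'function',
            'function': {'name': match.group('name'), 'arguments': json.dumps(args)}}

print(parse_emulated_tool_call(
    '<function=execute_bash>\n<parameter=command>ls -la</parameter>\n</function>'))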

Detection

Whether to use native or emulated function calling is determined by:

  1. LLMConfig.native_tool_calling — explicit override (if set)
  2. model_features.get_features(model) — auto-detection based on model name (openhands/llm/model_features.py)

Model Features Detection

File: openhands/llm/model_features.py

The get_features() function returns a ModelFeatures object for any model:

class ModelFeatures:
    supports_function_calling: bool
    supports_stop_words: bool
    supports_prompt_cache: bool
    supports_reasoning_effort: bool

This is used to automatically configure LLM behavior — e.g., whether to send stop words, whether to use native tool calling, whether prompt caching breakpoints are needed (Anthropic).
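
A small usage sketch (the model name is an example; the per-model detection rules live in model_features.py):

from openhands.llm.model_features import get_features

features = get_features('claude-sonnet-4-20250514')     # example model name

use_native_tools = features.supports_function_calling   # unless LLMConfig overrides it
send_stop_words = features.supports_stop_words
add_cache_breakpoints = features.supports_prompt_cache  # Anthropic prompt caching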


LLM Registry & Router

File: openhands/llm/llm_registry.py:30-60

The LLMRegistry is a factory that creates and caches LLM instances:

class LLMRegistry:
    def __init__(self, config: OpenHandsConfig, ...):
        self.service_to_llm: dict[str, LLM] = {}
        self.active_agent_llm = self.get_llm('agent', llm_config)
 
    def get_llm(self, service_id: str, config: LLMConfig) -> LLM:
        """Get or create an LLM instance for a service."""
        ...

The registry supports routing — the get_router() method can return an LLM that routes requests to different models based on configuration, enabling multi-model setups (e.g., a fast model for simple tasks, a powerful model for complex reasoning).
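
Typical usage, sketched from the interface above (constructor arguments are abbreviated):

registry = LLMRegistry(config)                       # config: OpenHandsConfig
agent_llm = registry.get_llm('agent', llm_config)    # created on first request
same_llm = registry.get_llm('agent', llm_config)     # later calls return the cached instance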


Prompt Construction

Message Building

The ConversationMemory class (openhands/memory/conversation_memory.py) converts event history into LLM-ready Message objects:

Events (condensed history)
    │
    ▼
ConversationMemory.get_messages()
    │
    ├── System message (from PromptManager)
    ├── RecallObservation → workspace context injection
    ├── Actions → assistant messages (with tool calls)
    ├── Observations → tool results
    └── User messages → user role messages
    │
    ▼
list[Message] → LLM.format_messages_for_llm() → list[dict]
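
The resulting list is plain OpenAI chat format; an illustrative (invented) example of what gets sent to LiteLLM:

messages = [
    {'role': 'system', 'content': 'You are OpenHands, an autonomous coding agent...'},
    {'role': 'user', 'content': 'Fix the failing test in test_app.py'},
    {'role': 'assistant', 'content': None, 'tool_calls': [
        {'id': 'call_1', 'type': 'function',
         'function': {'name': 'execute_bash', 'arguments': '{"command": "pytest -x"}'}},
    ]},
    {'role': 'tool', 'tool_call_id': 'call_1', 'content': '1 failed, 12 passed ...'},
]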

Vision Support

When vision_is_active() is true (line 556), messages can include image content (base64-encoded screenshots from browser actions). The Message serialization handles this automatically.
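
An illustrative multimodal message in the OpenAI content-parts format that LiteLLM forwards to vision-capable models (the base64 payload is truncated):

image_message = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'Here is the current page:'},
        {'type': 'image_url',
         'image_url': {'url': 'data:image/png;base64,iVBORw0KGgo...'}},
    ],
}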

Prompt Caching

When is_caching_prompt_active() is true (line 591), cache breakpoints are inserted into messages for Anthropic models. This reduces token costs by caching the system prompt and early conversation history.
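
An illustrative cache breakpoint in the Anthropic style that LiteLLM passes through (the prompt text is invented):

system_message = {
    'role': 'system',
    'content': [
        {'type': 'text',
         'text': 'You are OpenHands, an autonomous coding agent...',
         'cache_control': {'type': 'ephemeral'}},  # cache everything up to this block
    ],
}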


Guardrails

Budget Limits

Iteration Caps

Security Analysis

Stuck Detection


Tools

The CodeActAgent exposes the following tools to the LLM, configured via AgentConfig flags:

| Tool | Config Flag | Description |
| --- | --- | --- |
| execute_bash | enable_cmd | Run shell commands |
| execute_ipython_cell | enable_jupyter | Run Python code in IPython |
| str_replace_editor | enable_editor | View/create/edit files (ACI-based) |
| llm_based_edit | enable_llm_editor | LLM-powered file editing (alternative) |
| browser | enable_browsing | Interact with web browser |
| think | enable_think | Log reasoning without acting |
| finish | enable_finish | Mark task as complete |
| request_condensation | enable_condensation_request | Request history condensation |
| task_tracker | enable_plan_mode | Task management for plan mode |
| MCP tools | enable_mcp | Dynamically loaded via MCP protocol |

Tools are defined as ChatCompletionToolParam dicts (OpenAI tool format) in openhands/agenthub/codeact_agent/tools/. The agent's _get_tools() method (line 116) assembles the tool list based on configuration.
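
For reference, a tool definition in OpenAI format looks like the following minimal sketch (the real execute_bash schema in tools/ has more parameters and a much longer description):

execute_bash_tool = {
    'type': 'function',
    'function': {
        'name': 'execute_bash',
        'description': 'Run a shell command in the sandbox.',
        'parameters': {
            'type': 'object',
            'properties': {
                'command': {'type': 'string', 'description': 'The command to run.'},
            },
            'required': ['command'],
        },
    },
}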

MCP (Model Context Protocol) Tools

MCP tools are dynamically added at runtime via add_mcp_tools_to_agent() (openhands/mcp/). They enable the agent to interact with external services (databases, APIs, custom tools) through a standardized protocol. MCP servers can be configured in the TOML config or provided by repository microagents.


OpenHands Provider

Models prefixed with openhands/ are automatically routed to the OpenHands LLM proxy (openhands/llm/llm.py:136-142):

if self.config.model.startswith('openhands/'):
    model_name = self.config.model.removeprefix('openhands/')
    self.config.model = f'litellm_proxy/{model_name}'
    self.config.base_url = _get_openhands_llm_base_url()

This enables the managed SaaS product to provide LLM access without users needing their own API keys.


Special Model Handling

The LLM.__init__ contains model-specific logic (openhands/llm/llm.py:136-228):

| Model | Special Handling |
| --- | --- |
| gemini-2.5-pro | Thinking budget configuration for reasoning effort |
| claude-opus-4-1 | Extended thinking explicitly disabled |
| claude-opus-4-5, claude-sonnet-4 | top_p dropped when temperature is set |
| azure/* | Uses max_tokens instead of max_completion_tokens |
| huggingface/* | top_p capped at 0.9 |
| openhands/* | Rewritten to litellm_proxy/ with managed base URL |
| mistral/*, gemini/* | Safety settings passed through if configured |