3. Architecture
High-Level Architecture
+------------------+
| Web Browser / |
| CLI Terminal |
+--------+---------+
|
HTTP/WebSocket |
v
+-----------------------------------------------+
| FastAPI Server |
| openhands/server/app.py (V0) |
| openhands/app_server/v1_router.py (V1) |
| |
| +------------------+ +------------------+ |
| | REST API Routes | | Socket.IO Events | |
| | (CRUD, settings) | | (real-time stream)| |
| +------------------+ +------------------+ |
+-------------------+---------------------------+
|
v
+-----------------------------------------------+
| Conversation Manager |
| openhands/server/conversation_manager/ |
| |
| Manages agent sessions, lifecycle, |
| concurrency (max_concurrent_conversations) |
+-------------------+---------------------------+
|
+-------------+-------------+
| |
v v
+-------------------------------+ +------------------+
| Agent Controller | | Event Stream |
| controller/agent_controller | | events/stream |
| | | |
| - Main agent loop | | - Event bus |
| - State management | | - Persistence |
| - Delegation | | - Subscriptions |
| - Stuck detection | | - Replay |
| - Security checks | +------------------+
+------+----------+-------------+ ^
| | |
+--------+ +------+-------+ |
v v | |
+-------------+ +----------+ | +---------+---------+
| Agent | | Memory | | | File Store |
| (CodeAct, | | memory/ | | | storage/ |
| Browsing, | | | | | (local, S3, GCS) |
| ReadOnly) | | Condenser| | +-------------------+
+------+------+ | Micro- | |
| | agents | |
v +----------+ |
+-------------+ |
| LLM | |
| llm/llm.py | |
| | |
| - LiteLLM | |
| - Retry | |
| - Metrics | |
| - FnCall | |
| Converter | |
+-------------+ |
|
+------------------------+
|
v
+-----------------------------------------------+
| Runtime |
| runtime/base.py (abstract) |
| |
| +------------------+ +------------------+ |
| | DockerRuntime | | K8sRuntime | |
| | (primary) | | (enterprise) | |
| +------------------+ +------------------+ |
| +------------------+ +------------------+ |
| | LocalRuntime | | RemoteRuntime | |
| | (development) | | (cloud) | |
| +------------------+ +------------------+ |
| |
| Runs inside sandbox container: |
| +------------------------------------------+ |
| | Action Execution Server (FastAPI) | |
| | runtime/action_execution_server.py | |
| | | |
| | - Bash session (persistent shell) | |
| | - File operations (read/write/edit) | |
| | - IPython kernel | |
| | - Browser (Playwright + BrowserGym) | |
| | - MCP proxy | |
| +------------------------------------------+ |
+------------------------------------------------+
Core Components and Their Responsibilities
1. Agent Controller (openhands/controller/agent_controller.py, 1392 lines)
The central orchestrator. Responsibilities:
- Main loop: Calls `agent.step(state)` repeatedly until task completion
- State management: Tracks `AgentState` transitions (LOADING -> RUNNING -> FINISHED/ERROR)
- Event routing: Subscribes to `EventStream`, routes actions and observations
- Delegation: Creates child `AgentController` instances for multi-agent workflows
- Safety: Stuck detection, iteration/budget limits, security checks
- Error recovery: Exception mapping, rate limit handling, context window management
Key methods:
| Method | Line | Purpose |
|---|---|---|
| `_step()` | 863 | Core execution step -- calls agent, checks guards, publishes action |
| `on_event()` | 454 | Event stream subscriber callback |
| `_handle_action()` | 515 | Routes actions to appropriate handlers |
| `set_agent_state_to()` | 673 | Manages state transitions |
| `start_delegate()` | 735 | Creates child controller for sub-agent |
| `end_delegate()` | 796 | Collects delegate results, resumes parent |
| `_is_stuck()` | 1129 | Delegates to StuckDetector |
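The stuck-detection responsibility listed above can be illustrated with a small sketch. This is not the project's `StuckDetector`; it assumes one simple heuristic (the last few action/observation pairs are identical), and the class name `RepeatStuckDetector` and its `threshold` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class RepeatStuckDetector:
    """Hypothetical detector: reports a loop when the last `threshold`
    (action, observation) pairs are all identical."""

    def __init__(self, threshold: int = 4) -> None:
        if threshold < 2:
            raise ValueError("threshold must be at least 2")
        self.threshold = threshold
        # Only the most recent `threshold` pairs are kept.
        self._recent: Deque[Tuple[str, str]] = deque(maxlen=threshold)

    def record(self, action: str, observation: str) -> None:
        """Remember one completed action and the observation it produced."""
        self._recent.append((action, observation))

    def is_stuck(self) -> bool:
        """True only when the window is full and every pair is the same."""
        if len(self._recent) < self.threshold:
            return False
        return len(set(self._recent)) == 1
```

A controller loop could call `record()` after each observation and stop stepping once `is_stuck()` returns `True`; a real detector would likely combine several such patterns.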
2. Agent (openhands/controller/agent.py, 191 lines)
Abstract base class for all agents. Uses a registry pattern for dynamic agent lookup.
# Registration happens via decorator or class attribute
Agent.register("CodeActAgent", CodeActAgent)
# Lookup by name
agent_cls = Agent.get_cls("CodeActAgent")
Concrete implementations in openhands/agenthub/:
| Agent | File | Purpose |
|---|---|---|
| `CodeActAgent` | `agenthub/codeact_agent/codeact_agent.py` | Primary agent -- bash, file editing, browsing, Python |
| `BrowsingAgent` | `agenthub/browsing_agent/browsing_agent.py` | Web browsing specialist |
| `ReadOnlyAgent` | `agenthub/readonly_agent/readonly_agent.py` | Read-only file operations |
| `VisualBrowsingAgent` | `agenthub/visualbrowsing_agent/` | Browser with visual understanding |
| `DummyAgent` | `agenthub/dummy_agent/agent.py` | Testing/demo agent |
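The registry pattern used for agent lookup can be sketched in a few lines. The `register` and `get_cls` names come from the snippet above; everything else (the `_registry` dictionary, the error behaviour, and the `EchoAgent` class) is an assumption made for illustration and may differ from the real base class.

```python
from typing import Dict, Type


class Agent:
    """Minimal stand-in for the abstract agent base class."""

    # Assumed storage: a class-level mapping from name to agent class.
    _registry: Dict[str, Type["Agent"]] = {}

    @classmethod
    def register(cls, name: str, agent_cls: Type["Agent"]) -> None:
        """Make `agent_cls` available for lookup under `name`."""
        if name in cls._registry:
            raise ValueError(f"agent already registered: {name}")
        cls._registry[name] = agent_cls

    @classmethod
    def get_cls(cls, name: str) -> Type["Agent"]:
        """Return the class registered under `name`."""
        try:
            return cls._registry[name]
        except KeyError:
            raise KeyError(f"no agent registered under: {name}") from None

    def step(self, state: dict) -> str:
        raise NotImplementedError


class EchoAgent(Agent):
    """Hypothetical agent that just echoes the task text."""

    def step(self, state: dict) -> str:
        return f"echo: {state.get('task', '')}"


Agent.register("EchoAgent", EchoAgent)
agent = Agent.get_cls("EchoAgent")()
print(agent.step({"task": "hello"}))  # prints "echo: hello"
```

Looking agents up by name is what lets a configuration value or a delegation request such as "BrowsingAgent" select an implementation at runtime without the caller importing it.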
3. Event System (openhands/events/)
Event sourcing is the backbone of the architecture. Every interaction is recorded as an immutable event.
Event (base dataclass)
├── Action (agent or user intent)
│ ├── MessageAction -- chat message
│ ├── CmdRunAction -- shell command
│ ├── IPythonRunCellAction -- Python code
│ ├── FileReadAction -- read file
│ ├── FileWriteAction -- write file
│ ├── FileEditAction -- edit file (str_replace)
│ ├── BrowseInteractiveAction -- browser interaction
│ ├── AgentDelegateAction -- delegate to sub-agent
│ ├── AgentFinishAction -- task completed
│ ├── AgentThinkAction -- reasoning (logged, not executed)
│ ├── RecallAction -- retrieve microagent knowledge
│ ├── CondensationAction -- compress conversation history
│ ├── MCPAction -- MCP tool call
│ └── ChangeAgentStateAction -- state transition
│
└── Observation (result of action)
├── CmdOutputObservation -- command output
├── IPythonRunCellObservation -- Python output
├── FileReadObservation -- file contents
├── FileEditObservation -- edit result
├── BrowserOutputObservation -- browser state
├── ErrorObservation -- error details
├── AgentDelegateObservation -- sub-agent result
├── RecallObservation -- microagent knowledge
├── AgentCondensationObservation -- condensation result
└── LoopDetectionObservation -- stuck detection alert
EventStream (events/stream.py, 291 lines):
- Thread-safe event store with auto-incrementing IDs
- Subscriber system with per-subscriber thread pools
- Page-based caching for fast reads
- Secret redaction in serialized events
- Async queue-based processing
Subscriber types (EventStreamSubscriber enum):
AGENT_CONTROLLER, RESOLVER, SERVER, RUNTIME, MEMORY, MAIN, TEST
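To make the EventStream properties above concrete, here is a reduced sketch of an append-only store with auto-incrementing IDs, secret redaction, and subscriber callbacks. It is not the real implementation: `MiniEventStream`, its method signatures, and the `<secret_hidden>` placeholder are assumptions, and callbacks run synchronously here rather than on per-subscriber thread pools.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Immutable record of one action or observation."""
    id: int
    timestamp: str
    source: str
    content: str


class MiniEventStream:
    """Hypothetical, simplified event store."""

    def __init__(self, secrets: Optional[Iterable[str]] = None) -> None:
        self._lock = threading.Lock()
        self._events: List[Event] = []
        self._subscribers: Dict[str, Callable[[Event], None]] = {}
        # Empty strings are dropped so redaction never matches everywhere.
        self._secrets = [s for s in (secrets or []) if s]

    def subscribe(self, subscriber_id: str, callback: Callable[[Event], None]) -> None:
        with self._lock:
            self._subscribers[subscriber_id] = callback

    def add_event(self, content: str, source: str) -> Event:
        # Redact known secrets before the event is stored or dispatched.
        for secret in self._secrets:
            content = content.replace(secret, "<secret_hidden>")
        with self._lock:
            event = Event(
                id=len(self._events),  # IDs are assigned in append order
                timestamp=datetime.now(timezone.utc).isoformat(),
                source=source,
                content=content,
            )
            self._events.append(event)
            callbacks = list(self._subscribers.values())
        # Dispatch outside the lock so callbacks may add further events.
        for callback in callbacks:
            callback(event)
        return event

    def get_events(self, start_id: int = 0) -> List[Event]:
        """Replay events from `start_id` onwards."""
        with self._lock:
            return self._events[start_id:]
```

Because every event keeps its ID and the list is never rewritten, replaying a conversation is a matter of calling `get_events()` from the desired position.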
4. Runtime System (openhands/runtime/)
Provides sandboxed execution environments. The key abstraction:
class Runtime(ABC):
    async def connect(self) -> None: ...
    def run(self, action: CmdRunAction) -> Observation: ...
    def run_ipython(self, action: IPythonRunCellAction) -> Observation: ...
    def read(self, action: FileReadAction) -> Observation: ...
    def write(self, action: FileWriteAction) -> Observation: ...
    def edit(self, action: FileEditAction) -> Observation: ...
    def browse(self, action: BrowseURLAction) -> Observation: ...
    def browse_interactive(self, action: BrowseInteractiveAction) -> Observation: ...
    async def call_tool_mcp(self, action: MCPAction) -> Observation: ...
DockerRuntime (primary, runtime/impl/docker/docker_runtime.py):
- Creates a Docker container with the OpenHands sandbox image
- Runs an Action Execution Server (FastAPI) inside the container
- Communicates via HTTP over mapped ports (30000-39999)
- Provides persistent bash session, IPython kernel, Playwright browser
- Port mapping: Execution (30000-39999), VSCode (40000-49999), App (50000-59999)
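The one-method-per-action shape of the Runtime abstraction can be shown with a toy implementation. The sketch below assumes an in-memory file store instead of a sandbox container; `InMemoryRuntime`, its `execute` dispatcher, and the simplified action and observation dataclasses are hypothetical and only mirror the `read`/`write` methods named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FileWriteAction:
    path: str
    content: str


@dataclass
class FileReadAction:
    path: str


@dataclass
class Observation:
    content: str
    error: bool = False


class InMemoryRuntime:
    """Hypothetical runtime that keeps files in a dictionary."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def write(self, action: FileWriteAction) -> Observation:
        self._files[action.path] = action.content
        return Observation(f"wrote {len(action.content)} characters to {action.path}")

    def read(self, action: FileReadAction) -> Observation:
        if action.path not in self._files:
            return Observation(f"file not found: {action.path}", error=True)
        return Observation(self._files[action.path])

    def execute(self, action: object) -> Observation:
        """Route an action to the method that handles its type."""
        handlers: Dict[type, Callable[..., Observation]] = {
            FileWriteAction: self.write,
            FileReadAction: self.read,
        }
        handler = handlers.get(type(action))
        if handler is None:
            # Unknown actions become error observations, not exceptions.
            return Observation(f"unsupported action: {type(action).__name__}", error=True)
        return handler(action)
```

Returning an error observation instead of raising keeps failures inside the event flow, so the agent can read the error and react on its next step.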
5. LLM Layer (openhands/llm/)
Wraps LLM providers behind a unified interface with extensive resilience features.
LLM (llm.py, 874 lines)
├── RetryMixin (retry_mixin.py, 108 lines)
│ └── tenacity-based exponential backoff
├── DebugMixin (debug_mixin.py)
│ └── Prompt/response logging
├── Metrics (metrics.py, 284 lines)
│ └── Cost, tokens, latency tracking
├── FnCallConverter (fn_call_converter.py, 979 lines)
│ └── Native ↔ text-based function call conversion
├── ModelFeatures (model_features.py, 173 lines)
│ └── Pattern-based feature detection per model
└── StreamingLLM / AsyncLLM
└── Async and streaming variants
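The retry behaviour in the tree above is described as tenacity-based exponential backoff. The sketch below shows the same idea with the standard library only, so it is a hand-rolled stand-in rather than the project's `RetryMixin`; `call_with_backoff`, `RateLimitError`, and all parameter names and defaults are assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error type standing in for a provider rate-limit failure."""


def call_with_backoff(
    fn: Callable[[], T],
    *,
    retry_on: Tuple[Type[BaseException], ...] = (RateLimitError,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    multiplier: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(min(delay, max_delay))  # cap the wait at max_delay
            delay *= multiplier
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a testing convenience: a caller can record the requested delays instead of actually waiting.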
6. Memory System (openhands/memory/)
Memory (memory.py, 405 lines)
├── Microagent loader (global + user + repo)
├── RecallAction handler
│ ├── WORKSPACE_CONTEXT recall (first message)
│ └── KNOWLEDGE recall (trigger-based)
└── RecallObservation emitter
ConversationMemory (conversation_memory.py)
├── Event → Message converter
├── Tool call completion tracking
└── Vision content handling
Condenser (condenser/)
├── LLMSummarizingCondenser -- LLM-generated summaries
├── StructuredSummaryCondenser -- Structured field extraction
├── ObservationMaskingCondenser -- Masks old observations
├── ConversationWindowCondenser -- Sliding window
├── AmortizedForgettingCondenser -- Gradual forgetting
└── NoOpCondenser -- No compression
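As an illustration of the sliding-window idea behind a condenser, the sketch below keeps the first events (typically the task statement) plus the most recent ones and drops the middle. It is a guess at the general technique, not the behaviour of `ConversationWindowCondenser`; the function name and both parameters are hypothetical.

```python
from typing import List, Sequence


def window_condense(events: Sequence[str], max_events: int, keep_first: int = 1) -> List[str]:
    """Return at most `max_events` events: the first `keep_first`
    followed by the most recent ones."""
    if keep_first < 0:
        raise ValueError("keep_first must not be negative")
    if max_events <= keep_first:
        raise ValueError("max_events must be larger than keep_first")
    if len(events) <= max_events:
        return list(events)  # nothing to drop
    head = list(events[:keep_first])
    tail_size = max_events - keep_first  # always >= 1 after the checks above
    tail = list(events[-tail_size:])
    return head + tail


history = [f"e{i}" for i in range(10)]
print(window_condense(history, max_events=4))  # prints ['e0', 'e7', 'e8', 'e9']
```

The summarizing condensers listed above trade this simple truncation for an extra LLM call that preserves more of the dropped context.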
7. Server & API (openhands/server/ and openhands/app_server/)
Dual architecture during V0→V1 migration:
V0 (Legacy):
- FastAPI app with Socket.IO for real-time events
- ConversationManager manages agent sessions
- Routes: conversations, files, settings, security, git, trajectory
V1 (New):
- REST API-first with SDK-based agent core
- Service layer with dependency injection
- Routes: events, sandboxes, skills, webhooks, users
Component Interaction Diagram
sequenceDiagram
participant User as User (Browser/CLI)
participant Server as FastAPI Server
participant CM as ConversationManager
participant AC as AgentController
participant Agent as CodeActAgent
participant LLM as LLM (LiteLLM)
participant ES as EventStream
participant Mem as Memory
participant RT as Runtime (Docker)
participant Sandbox as Sandbox Container
User->>Server: POST /api/conversations (create)
Server->>CM: attach_to_conversation()
CM->>RT: create & connect runtime
RT->>Sandbox: docker create + start
Sandbox-->>RT: Action Execution Server ready
User->>Server: WebSocket connect (conversation_id)
Server->>CM: join_conversation()
CM-->>User: replay existing events
User->>Server: send MessageAction
Server->>ES: add_event(MessageAction, SOURCE=USER)
ES->>AC: on_event(MessageAction)
AC->>Mem: RecallAction(WORKSPACE_CONTEXT)
Mem-->>ES: RecallObservation (repo info, microagents)
ES->>AC: on_event(RecallObservation)
loop Agent Loop
AC->>Agent: step(state)
Agent->>LLM: completion(messages, tools)
LLM-->>Agent: tool_call (e.g., execute_bash)
Agent-->>AC: CmdRunAction
AC->>AC: security_check(action)
AC->>ES: add_event(CmdRunAction, SOURCE=AGENT)
ES->>RT: on_event(CmdRunAction)
RT->>Sandbox: HTTP POST /execute
Sandbox-->>RT: command output
RT->>ES: add_event(CmdOutputObservation)
ES->>AC: on_event(CmdOutputObservation)
ES->>Server: emit('oh_event')
Server->>User: WebSocket event
end
Agent-->>AC: AgentFinishAction
AC->>ES: add_event(AgentFinishAction)
ES->>Server: emit('oh_event')
Server->>User: task complete
Data Flow: Action Processing Pipeline
User Message
│
v
EventStream.add_event(MessageAction)
│
├── Serialize to JSON
├── Redact secrets
├── Assign ID + timestamp
├── Persist to FileStore
└── Queue for async dispatch
│
v
AgentController.on_event()
│
├── Add to state.history
├── _handle_message_action()
│ └── Create RecallAction → Memory
│ └── RecallObservation (workspace context)
└── should_step() → True
│
v
_step()
│
├── Check: state == RUNNING?
├── Check: no pending_action?
├── Check: iteration/budget limits?
├── Check: not stuck?
│
├── agent.step(state) → Action
│ │
│ ├── ConversationMemory.process_events()
│ │ └── Convert events → LLM messages
│ │
│ ├── Condenser.condensed_history()
│ │ └── Maybe compress old events
│ │
│ ├── LLM.completion(messages, tools)
│ │ ├── Format messages (cache, vision)
│ │ ├── Mock function calling if needed
│ │ ├── Call litellm.completion()
│ │ ├── Track metrics (cost, tokens, latency)
│ │ └── Convert response → Actions
│ │
│ └── Return Action (or queue multiple)
│
├── Security analysis (risk assessment)
├── Confirmation mode check
└── EventStream.add_event(action)
│
└── Runtime subscriber receives action
│
v
Execute in sandbox
│
v
EventStream.add_event(observation)
│
└── Back to AgentController.on_event()
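The guard checks that precede `agent.step(state)` in the pipeline above can be sketched as a single function. This is a simplified reading of the flow, not the controller's `_step()`: the `State` fields, the choice to move to `ERROR` when a limit is hit, and the use of the string "finish" as a completion signal are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class AgentState(Enum):
    LOADING = "loading"
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"


@dataclass
class State:
    agent_state: AgentState = AgentState.LOADING
    iteration: int = 0
    max_iterations: int = 10
    history: List[str] = field(default_factory=list)
    pending_action: Optional[str] = None


def step(
    state: State,
    agent_step: Callable[[State], str],
    is_stuck: Callable[[State], bool],
) -> Optional[str]:
    """Run one guarded step and return the action, or None if skipped."""
    if state.agent_state is not AgentState.RUNNING:
        return None  # guard: only step while running
    if state.pending_action is not None:
        return None  # guard: wait for the previous action's observation
    if state.iteration >= state.max_iterations or is_stuck(state):
        state.agent_state = AgentState.ERROR  # assumed handling of limits
        return None
    state.iteration += 1
    action = agent_step(state)
    state.history.append(action)
    if action == "finish":
        state.agent_state = AgentState.FINISHED
    else:
        state.pending_action = action  # cleared when the observation arrives
    return action
```

In this sketch the caller clears `pending_action` when the matching observation comes back, which is what allows the next call to proceed.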
Multi-Agent Delegation Flow
AgentController (Parent)
│
├── Agent produces AgentDelegateAction
│ └── {agent: "BrowsingAgent", inputs: {task: "..."}}
│
├── start_delegate()
│ ├── Create new Agent instance (BrowsingAgent)
│ ├── Create child AgentController
│ │ ├── delegate_level = parent + 1
│ │ ├── Shared metrics (cost tracking)
│ │ ├── Shared event_stream
│ │ ├── is_delegate = True (no subscription)
│ │ └── start_id = current stream position
│ └── Send MessageAction to child
│
├── Parent pauses (should_step returns False when delegate active)
│
├── Child AgentController runs:
│ ├── Child agent.step() → actions
│ ├── Actions executed via runtime
│ ├── Observations received
│ └── Eventually: AgentFinishAction
│
└── end_delegate()
├── Close child controller
├── Extract outputs from child state
├── Create AgentDelegateObservation
├── Publish to event stream
└── Parent resumes normal loop
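The delegation flow above can be condensed into a small model of a parent and child controller. The method names `start_delegate`, `end_delegate`, and `should_step` follow the flow; the `Controller` and `Metrics` dataclasses, their fields, and the plain-string event log are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Metrics:
    accumulated_cost: float = 0.0


@dataclass
class Controller:
    name: str
    metrics: Metrics
    event_log: List[str]
    delegate_level: int = 0
    delegate: Optional["Controller"] = None
    outputs: Dict[str, str] = field(default_factory=dict)

    def should_step(self) -> bool:
        # The parent pauses while a delegate is active.
        return self.delegate is None

    def start_delegate(self, agent_name: str, task: str) -> "Controller":
        child = Controller(
            name=agent_name,
            metrics=self.metrics,      # shared object: costs accumulate together
            event_log=self.event_log,  # shared event stream
            delegate_level=self.delegate_level + 1,
        )
        self.delegate = child
        self.event_log.append(f"MessageAction -> {agent_name}: {task}")
        return child

    def end_delegate(self) -> Dict[str, str]:
        if self.delegate is None:
            raise RuntimeError("no active delegate")
        outputs = dict(self.delegate.outputs)
        self.event_log.append(f"AgentDelegateObservation from {self.delegate.name}: {outputs}")
        self.delegate = None  # parent resumes its normal loop
        return outputs
```

Sharing the same `Metrics` instance is what lets cost spent by the child count against the parent's budget without any explicit hand-off.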
State Machine
┌──────────┐
│ LOADING │
└────┬─────┘
│
┌────v─────┐
┌─────┤ RUNNING ├─────┐
│ └────┬─────┘ │
│ │ │
┌─────────v───┐ ┌────v─────┐ ┌──v──────────────────┐
│ PAUSED │ │ AWAITING │ │ AWAITING │
│ (user pause)│ │ USER │ │ USER_CONFIRMATION │
└─────────┬───┘ │ INPUT │ │ (security check) │
│ └────┬─────┘ └──┬──────────┬────────┘
│ │ │ │
└──────────┼──────────┘ ┌──────v──────┐
│ │ USER_ │
│ │ CONFIRMED/ │
┌────v─────┐ │ REJECTED │
│ RUNNING │◄───────┘ │
└──┬───┬───┘ │
│ │ │
┌────────┘ └────────┐ │
v v │
┌──────────┐ ┌──────────┐ │
│ FINISHED │ │ ERROR │ │
└──────────┘ └──────────┘ │
^ │
│ │
┌─────┴──────┐ │
│RATE_LIMITED│ │
└────────────┘ │
│
┌────────────┐ │
│ STOPPED │◄──────────┘
│ (user stop)│ (if rejected)
└────────────┘
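One way to read the diagram above is as a table of allowed transitions. The sketch below encodes only the edges that are unambiguous in the diagram and leaves out RATE_LIMITED, whose edges are not clear from the drawing; the `ALLOWED` table and the `transition` helper are hypothetical and do not reflect how `set_agent_state_to()` validates changes.

```python
from enum import Enum
from typing import Dict, FrozenSet


class AgentState(Enum):
    LOADING = "loading"
    RUNNING = "running"
    PAUSED = "paused"
    AWAITING_USER_INPUT = "awaiting_user_input"
    AWAITING_USER_CONFIRMATION = "awaiting_user_confirmation"
    USER_CONFIRMED = "user_confirmed"
    USER_REJECTED = "user_rejected"
    FINISHED = "finished"
    ERROR = "error"
    STOPPED = "stopped"


S = AgentState

# Assumed transition table, derived from one reading of the diagram.
ALLOWED: Dict[AgentState, FrozenSet[AgentState]] = {
    S.LOADING: frozenset({S.RUNNING}),
    S.RUNNING: frozenset({
        S.PAUSED, S.AWAITING_USER_INPUT, S.AWAITING_USER_CONFIRMATION,
        S.FINISHED, S.ERROR,
    }),
    S.PAUSED: frozenset({S.RUNNING}),
    S.AWAITING_USER_INPUT: frozenset({S.RUNNING}),
    S.AWAITING_USER_CONFIRMATION: frozenset({S.USER_CONFIRMED, S.USER_REJECTED}),
    S.USER_CONFIRMED: frozenset({S.RUNNING}),
    S.USER_REJECTED: frozenset({S.STOPPED}),
}


def transition(current: AgentState, target: AgentState) -> AgentState:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED.get(current, frozenset()):
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

States without an entry in the table (FINISHED, ERROR, STOPPED) are treated as terminal in this sketch.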