5. Design Patterns & Decisions
Patterns Used
1. Event Sourcing
Where: openhands/events/stream.py, openhands/events/event.py
Every agent action and runtime observation is recorded as an immutable event with:
- Auto-incrementing ID
- ISO timestamp
- Source (AGENT, USER, ENVIRONMENT)
- Causal link (`cause` field linking an observation to its triggering action)
Benefits realized:
- Complete audit trail for debugging agent behavior
- Session replay (reconnecting clients get full history)
- Natural fit for WebSocket streaming to frontend
- Enables condensation (summarize old events while preserving the record)
Implementation detail: Events are serialized to JSON, secrets are redacted via _replace_secrets(), and stored in a configurable FileStore (local filesystem, S3, or Google Cloud Storage).
2. Observer/Subscriber Pattern
Where: openhands/events/stream.py:130-161
EventStream manages multiple subscribers via enum-based registration:
```python
class EventStreamSubscriber(str, Enum):
    AGENT_CONTROLLER = 'agent_controller'
    RUNTIME = 'runtime'
    SERVER = 'server'
    MEMORY = 'memory'
```

Each subscriber gets its own ThreadPoolExecutor, isolating failure domains. Events are dispatched asynchronously through a queue.
Notable: The subscription model uses dict[EventStreamSubscriber, dict[str, Callable]] -- each subscriber type can have multiple callback IDs, allowing fine-grained registration.
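A minimal sketch of that registration model, assuming per-subscriber executors as described (the real `EventStream` also persists events and drains a queue):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable

class EventStreamSubscriber(str, Enum):
    AGENT_CONTROLLER = 'agent_controller'
    RUNTIME = 'runtime'

class EventStream:
    def __init__(self):
        # subscriber type -> callback_id -> callback
        self._subscribers: dict[EventStreamSubscriber, dict[str, Callable]] = (
            defaultdict(dict)
        )
        # one executor per subscriber type isolates failure domains
        self._executors: dict[EventStreamSubscriber, ThreadPoolExecutor] = {}

    def subscribe(self, kind, callback_id, callback):
        self._subscribers[kind][callback_id] = callback
        self._executors.setdefault(kind, ThreadPoolExecutor(max_workers=1))

    def add_event(self, event):
        # dispatch asynchronously; a slow or crashing subscriber
        # does not block the others
        for kind, callbacks in self._subscribers.items():
            for cb in callbacks.values():
                self._executors[kind].submit(cb, event)
```

The nested dict is what allows the same subscriber type to hold several independently removable callbacks.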
3. Strategy Pattern
Where: Multiple interchangeable implementations behind abstract interfaces
| Interface | Implementations | File |
|---|---|---|
| Runtime | Docker, K8s, Local, Remote, CLI | runtime/base.py |
| FileStore | Local, S3, GoogleCloud, InMemory | storage/files.py |
| SecurityAnalyzer | LLM-based, Invariant, GraySwan | security/analyzer.py |
| Condenser | LLMSummarizing, Structured, ObservationMasking, Window, NoOp | memory/condenser/ |
| ConversationManager | Standalone, DockerNested | server/conversation_manager/ |
| SandboxService | Docker, Process, Remote | app_server/sandbox/ |
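The pattern can be sketched with the FileStore family; this is illustrative, not the actual `storage/files.py` API:

```python
from abc import ABC, abstractmethod

class FileStore(ABC):
    @abstractmethod
    def write(self, path: str, contents: str) -> None: ...

    @abstractmethod
    def read(self, path: str) -> str: ...

class InMemoryFileStore(FileStore):
    def __init__(self):
        self._files: dict[str, str] = {}

    def write(self, path: str, contents: str) -> None:
        self._files[path] = contents

    def read(self, path: str) -> str:
        return self._files[path]

def get_file_store(kind: str) -> FileStore:
    # config-driven strategy selection: callers only ever see FileStore
    stores = {'memory': InMemoryFileStore}
    return stores[kind]()
```

Because callers depend only on the abstract interface, swapping local disk for S3 is a one-line config change rather than a code change.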
4. Registry Pattern
Where: openhands/controller/agent.py:128-169
Agents register themselves by name, enabling dynamic lookup:
```python
Agent.register("CodeActAgent", CodeActAgent)
agent_cls = Agent.get_cls("CodeActAgent")
```

This decouples agent creation from the controller -- the controller only needs the agent name (from config) to instantiate the right class.
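A minimal sketch of what such a registry looks like (the real class also validates duplicates and lists available agents; details here are assumptions):

```python
class Agent:
    _registry: dict[str, type['Agent']] = {}

    @classmethod
    def register(cls, name: str, agent_cls: type['Agent']) -> None:
        if name in cls._registry:
            raise ValueError(f'Agent {name!r} is already registered')
        cls._registry[name] = agent_cls

    @classmethod
    def get_cls(cls, name: str) -> type['Agent']:
        if name not in cls._registry:
            raise KeyError(f'No agent registered under {name!r}')
        return cls._registry[name]

class CodeActAgent(Agent):
    pass

Agent.register('CodeActAgent', CodeActAgent)
```

A lookup failure surfaces as a clear `KeyError` at startup rather than an import error deep inside the controller.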
5. Template Method Pattern
Where: openhands/controller/agent.py (abstract) → openhands/agenthub/codeact_agent/codeact_agent.py (concrete)
The base Agent class defines the skeleton:
- `get_system_message()` -- can be overridden but has default behavior
- `step(state) -> Action` -- abstract, must be implemented
- `reset()` -- can be overridden for custom cleanup
Subclasses fill in the specific behavior (how to call LLM, which tools to use, how to convert responses).
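The skeleton can be sketched as follows; the base class and hook names mirror the ones above, while `EchoAgent` is a hypothetical subclass for illustration:

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    def get_system_message(self) -> str:
        # default behavior, overridable by subclasses
        return 'You are a helpful agent.'

    @abstractmethod
    def step(self, state: dict) -> str:
        """Must be implemented: decide the next action from state."""

    def reset(self) -> None:
        pass  # hook for custom cleanup

class EchoAgent(Agent):
    def step(self, state: dict) -> str:
        return f"echo: {state['last_message']}"
```

The controller drives every agent through the same three methods; only `step()` must differ per agent.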
6. Decorator Pattern (Mixins)
Where: openhands/llm/llm.py:70
```python
class LLM(RetryMixin, DebugMixin):
```

The LLM class composes behavior from mixins:
- `RetryMixin` adds tenacity-based retry with exponential backoff
- `DebugMixin` adds prompt/response logging
This avoids deep inheritance hierarchies while keeping concerns separated.
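A toy version of that composition; the mixin bodies here are hypothetical stand-ins (the real `RetryMixin` wraps tenacity):

```python
import time

class RetryMixin:
    def retry(self, fn, attempts=3, base_delay=0.01):
        # exponential backoff between attempts
        for i in range(attempts):
            try:
                return fn()
            except Exception:
                if i == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** i)

class DebugMixin:
    def log(self, msg):
        self.logs = getattr(self, 'logs', [])
        self.logs.append(msg)

class LLM(RetryMixin, DebugMixin):
    def completion(self, prompt: str) -> str:
        self.log(f'prompt: {prompt}')
        return self.retry(lambda: f'response to {prompt}')
```

Each mixin owns one orthogonal concern, so `LLM` itself stays focused on the provider call.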
7. Chain of Responsibility
Where: Agent action processing pipeline
```
Agent.step() → Action
  → AgentController._step()           -- iteration/budget check
  → SecurityAnalyzer.security_risk()  -- risk assessment
  → Confirmation mode check           -- user approval
  → EventStream.add_event()           -- dispatch
  → Runtime.on_event()                -- execution
```
Each handler can short-circuit the chain (e.g., security check can block execution, budget check can stop the loop).
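The short-circuiting can be sketched generically; the handler functions below are illustrative, not the actual controller methods:

```python
def budget_check(action: dict, ctx: dict):
    if ctx['iterations'] >= ctx['max_iterations']:
        return 'budget exceeded'
    return None

def security_check(action: dict, ctx: dict):
    if action.get('risk') == 'HIGH':
        return 'blocked: high risk'
    return None

def run_pipeline(action: dict, ctx: dict, handlers):
    # each handler may short-circuit by returning a rejection reason;
    # None means "pass to the next handler"
    for handler in handlers:
        reason = handler(action, ctx)
        if reason is not None:
            return ('rejected', reason)
    return ('dispatched', None)
```

Adding a new gate (e.g. a rate limiter) means appending one handler, with no changes to the others.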
8. Adapter Pattern (Function Call Conversion)
Where: openhands/llm/fn_call_converter.py (979 lines)
This is one of the most sophisticated patterns in the codebase. It adapts models without native function calling to work with OpenHands' tool-based architecture:
Forward conversion (native → text):

```
{"tool_calls": [{"function": {"name": "execute_bash", "arguments": ...}}]}
→ "<function=execute_bash><parameter=command>ls -la</parameter></function>"
```

Reverse conversion (text → native):

```
"<function=execute_bash><parameter=command>ls -la</parameter></function>"
→ {"tool_calls": [{"function": {"name": "execute_bash", "arguments": ...}}]}
```
This adapter includes:
- XML-like format definition injected into system prompt
- In-context learning examples generated dynamically based on available tools
- Stop words (`</function`) to prevent incomplete function calls
- Regex-based parsing with error recovery (`_fix_stopword()`, `_normalize_parameter_tags()`)
- Parameter type validation and enum checking
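A simplified sketch of the reverse (text → native) direction; the actual converter adds error recovery, type validation, and stop-word repair beyond this bare regex:

```python
import re

FN_PATTERN = re.compile(
    r'<function=(?P<name>\w+)>(?P<body>.*?)</function>', re.DOTALL)
PARAM_PATTERN = re.compile(
    r'<parameter=(?P<key>\w+)>(?P<value>.*?)</parameter>', re.DOTALL)

def text_to_tool_calls(text: str) -> list[dict]:
    """Parse XML-like function-call text into native tool_call dicts."""
    calls = []
    for fn in FN_PATTERN.finditer(text):
        args = {m['key']: m['value']
                for m in PARAM_PATTERN.finditer(fn['body'])}
        calls.append({'function': {'name': fn['name'], 'arguments': args}})
    return calls
```

The non-greedy `.*?` with `re.DOTALL` is what lets multi-line parameter values (e.g. file contents) survive the round trip.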
9. Delegation Pattern (Multi-Agent)
Where: openhands/controller/agent_controller.py:735-861
The parent controller creates a child controller when the agent produces AgentDelegateAction:
```python
def start_delegate(self, action: AgentDelegateAction):
    # Create child agent from registry
    delegate_agent = Agent.get_cls(action.agent)(self.llm, self.config)
    # Create child controller with:
    # - Shared event stream (not subscribed independently)
    # - Shared metrics (cost tracking aggregates)
    # - Increased delegate_level
    # - is_delegate=True (parent routes events to child)
    self.delegate = AgentController(
        agent=delegate_agent,
        event_stream=self.event_stream,
        delegate_level=self.state.delegate_level + 1,
        is_delegate=True,
        ...
    )
```

The parent stops stepping while the delegate is active (`should_step()` returns False). When the delegate finishes, `end_delegate()` collects outputs and creates `AgentDelegateObservation`.
Notable Design Tradeoffs
1. Event Stream as Single Bus vs. Separate Channels
Decision: Single EventStream for all events (agent actions, runtime observations, state changes, memory operations).
Tradeoff:
- (+) Simplicity: One place to subscribe, one persistence mechanism
- (+) Natural ordering: Events have global sequence numbers
- (+) Easy replay: Reconnecting clients get everything in order
- (-) Coupling: All subscribers see all events, need to filter
- (-) Potential bottleneck: High-frequency events compete for the single queue
- (-) Secret management complexity: Must redact secrets in all events since all subscribers see everything
2. Synchronous Agent Step in Async Server
Decision: agent.step() is synchronous, but the server is async (FastAPI + Socket.IO).
Tradeoff:
- (+) Simpler agent implementations (no async/await complexity)
- (+) LLM calls are naturally blocking (wait for full response)
- (-) Requires thread pool executors for non-blocking server
- (-) Bridging async/sync adds complexity (`asyncio.get_event_loop().run_in_executor()`)
3. Docker-in-Docker for Sandboxing
Decision: The main OpenHands server runs in Docker, and creates Docker containers for agent sandboxes by mounting the Docker socket.
Tradeoff:
- (+) Strong isolation: Agent code runs in separate container
- (+) Reproducible environments: Consistent sandbox images
- (+) Port isolation: Separate port ranges for each service
- (-) Requires Docker socket access (security concern)
- (-) Resource overhead of nested containers
- (-) Docker dependency for development (though local runtime exists)
4. LiteLLM as Universal Provider Layer
Decision: Use LiteLLM for all LLM calls rather than direct provider SDKs.
Tradeoff:
- (+) Single interface for 100+ providers
- (+) Provider switching without code changes
- (+) Built-in fallback and load balancing
- (-) Additional abstraction layer (harder to debug provider-specific issues)
- (-) Version pinning challenges (litellm moves fast)
- (-) Must work around LiteLLM bugs (explicit version pin `>=1.74.3` "fixes known bugs")
- (-) Custom function call conversion needed anyway for non-native models
5. Text-Based Function Calling as Fallback
Decision: For models without native function calling, inject an XML-like format (<function=name><parameter=key>value</parameter></function>) into the system prompt.
Tradeoff:
- (+) Enables tool use on any text-generation model
- (+) In-context learning examples teach the format reliably
- (-) Consumes prompt tokens for format description + examples
- (-) Regex parsing is fragile (requires `_fix_stopword()`, `_normalize_parameter_tags()`)
- (-) One function call per message limitation (reduces efficiency)
6. Prompt Caching via Content Flags
Decision: Mark specific messages with cache_enabled = True and rely on provider-side caching (Anthropic's ephemeral cache).
Tradeoff:
- (+) Significant cost reduction for repeated prefixes (system prompt, early conversation)
- (+) No local cache management needed
- (-) Provider-specific (only works with Anthropic and select models)
- (-) Cache hit/miss tracking adds complexity to metrics
- (-) Summarization condenser disables caching (`caching_prompt=False`) because summaries are write-once
What's Unusual or Clever
1. Self-Assessed Security Risk
Every tool that can modify state has a security_risk parameter that the LLM must fill in (LOW/MEDIUM/HIGH). The system prompt defines what each level means. This is clever because:
- The LLM self-assesses risk before each action
- The controller can override or block based on the assessment
- It creates an audit trail of risk decisions
- It shifts the burden from heuristic rules to contextual judgment
File: openhands/agenthub/codeact_agent/prompts/security_risk_assessment.j2
The risk definitions change based on context (CLI mode vs. sandbox mode), recognizing that the same action has different risk profiles depending on the execution environment.
2. Temperature Perturbation on Empty Responses
File: openhands/llm/retry_mixin.py:46-60
When the LLM returns no response and temperature is 0, the retry logic temporarily sets temperature to 1.0. This breaks out of deterministic empty-response loops without permanently changing the model's behavior.
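A hypothetical sketch of that retry logic (the real version lives in tenacity retry hooks; names here are assumptions):

```python
def completion_with_retry(call_llm, kwargs: dict, max_attempts: int = 3):
    """Retry empty responses; perturb temperature only for the retry."""
    for _ in range(max_attempts):
        response = call_llm(**kwargs)
        if response:
            return response
        if kwargs.get('temperature', 0) == 0:
            # deterministic sampling keeps producing the same empty
            # output, so temporarily add randomness for the retry
            kwargs = {**kwargs, 'temperature': 1.0}
    raise RuntimeError('empty response after retries')
```

Because the override is applied to a copy of the kwargs, the caller's configured temperature is untouched for future calls.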
3. Dynamic In-Context Learning Examples
File: openhands/llm/fn_call_converter.py:326-392
For models without native function calling, the system generates tool-usage examples dynamically based on which tools are actually enabled. If the agent doesn't have browser access, the browser example is omitted. This prevents the model from trying to use unavailable tools.
4. Condensation with Task Tracking Preservation
File: openhands/memory/condenser/impl/llm_summarizing_condenser.py
When the condenser summarizes old events, it explicitly instructs the summarization LLM to preserve task tracker IDs and statuses. This ensures that task state survives memory compression -- a subtle but important detail for long-running workflows.
The summarization prompt uses a structured template:
USER_CONTEXT: ...
TASK_TRACKING: {task IDs, statuses} ← MUST be preserved
COMPLETED: ...
PENDING: ...
CODE_STATE: {files, functions, structures}
TESTS: {failing cases, error messages}
VERSION_CONTROL_STATUS: {branch, PR, commits}
5. Pending Action Queue in CodeActAgent
File: openhands/agenthub/codeact_agent/codeact_agent.py:170-175
When the LLM returns multiple tool calls in a single response, the agent queues them in pending_actions (a deque). On subsequent step() calls, it returns queued actions without calling the LLM again. This amortizes LLM latency across multiple actions.
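The queueing behavior can be sketched as follows (a simplified stand-in for the real agent, which queues `Action` objects rather than strings):

```python
from collections import deque

class CodeActAgentSketch:
    def __init__(self, llm):
        self.llm = llm
        self.pending_actions: deque = deque()

    def step(self, state) -> str:
        # drain queued tool calls before paying for another LLM round trip
        if self.pending_actions:
            return self.pending_actions.popleft()
        actions = self.llm(state)  # may return several tool calls at once
        self.pending_actions.extend(actions)
        return self.pending_actions.popleft()
```

Three tool calls in one response thus cost one LLM invocation instead of three.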
6. Event-Driven Memory Recall
Rather than loading all microagent knowledge upfront (which would consume tokens), the memory system uses a pull-based approach:
- First user message triggers `RecallAction(WORKSPACE_CONTEXT)` -- loads repo info and matching microagents
- Subsequent messages trigger `RecallAction(KNOWLEDGE)` -- loads only microagents matching keywords in the new message
This lazy loading prevents token waste on irrelevant knowledge.
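The keyword-matching step can be sketched like this; the data shape is an assumption, not the actual microagent schema:

```python
def recall_knowledge(message: str, microagents: dict[str, dict]) -> list[str]:
    """Return only the microagents whose trigger keywords appear
    in the new user message (pull-based, lazy recall)."""
    lowered = message.lower()
    loaded = []
    for name, spec in microagents.items():
        if any(keyword in lowered for keyword in spec['triggers']):
            loaded.append(name)
    return loaded
```

Everything not triggered stays out of the prompt, so idle knowledge costs zero tokens.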
7. Dual Serialization Strategy for Messages
File: openhands/core/message.py
Messages serialize differently depending on model capabilities:
- List serializer (for models supporting structured content): returns content as `[{type: "text", text: "..."}]` with cache_control and image support
- String serializer (for constrained models): concatenates text content with newlines
The serialization strategy is controlled by per-message flags (cache_enabled, vision_enabled, function_calling_enabled), allowing mixed strategies within a single conversation.
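A stripped-down sketch of the two strategies (the real `Message` model is Pydantic-based and also handles images and cache-control markers):

```python
def serialize_message(role: str, texts: list[str], *, list_format: bool) -> dict:
    if list_format:
        # structured-content models: a list of typed content parts
        content = [{'type': 'text', 'text': t} for t in texts]
    else:
        # constrained models: one newline-joined string
        content = '\n'.join(texts)
    return {'role': role, 'content': content}
```

Keeping the flag per message is what allows mixed strategies within a single conversation.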
8. Linus Torvalds-Inspired Prompt Variant
File: openhands/agenthub/codeact_agent/prompts/system_prompt_tech_philosophy.j2
One of the system prompt variants embeds Linus Torvalds' engineering philosophy:
- "Good taste" = eliminating special cases rather than handling them
- "Never break userspace" = backward compatibility is sacred
- Pragmatism over theoretical purity
- Obsession with simplicity (max 3 levels of indentation)
This is used via system_prompt_filename in agent config, allowing teams to select the coding philosophy their agent follows. It includes a structured 5-layer problem decomposition framework and decision output formats.