5. Design Patterns & Decisions
Patterns Used
1. Event Sourcing
Where: openhands/events/stream.py, openhands/events/event.py
Every agent action and runtime observation is recorded as an immutable event with:
- Auto-incrementing ID
- ISO timestamp
- Source (AGENT, USER, ENVIRONMENT)
- Causal link (`cause` field linking an observation to its triggering action)
Benefits realized:
- Complete audit trail for debugging agent behavior
- Session replay (reconnecting clients get full history)
- Natural fit for WebSocket streaming to frontend
- Enables condensation (summarize old events while preserving the record)
Implementation detail: Events are serialized to JSON, secrets are redacted via _replace_secrets(), and stored in a configurable FileStore (local filesystem, S3, or Google Cloud Storage).
2. Observer/Subscriber Pattern
Where: openhands/events/stream.py:130-161
EventStream manages multiple subscribers via enum-based registration:
```python
class EventStreamSubscriber(str, Enum):
    AGENT_CONTROLLER = 'agent_controller'
    RUNTIME = 'runtime'
    SERVER = 'server'
    MEMORY = 'memory'
```

Each subscriber gets its own ThreadPoolExecutor, isolating failure domains. Events are dispatched asynchronously through a queue.
Notable: The subscription model uses dict[EventStreamSubscriber, dict[str, Callable]] -- each subscriber type can have multiple callback IDs, allowing fine-grained registration.
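A minimal sketch of that registration model, assuming per-subscriber executors as described (the real `EventStream` also persists events and drains a queue):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable

class EventStreamSubscriber(str, Enum):
    AGENT_CONTROLLER = 'agent_controller'
    RUNTIME = 'runtime'

class EventStream:
    def __init__(self):
        # subscriber type -> callback_id -> callback
        self._subscribers: dict[EventStreamSubscriber, dict[str, Callable]] = (
            defaultdict(dict)
        )
        # one executor per subscriber type isolates failure domains
        self._executors: dict[EventStreamSubscriber, ThreadPoolExecutor] = {}

    def subscribe(self, kind, callback_id, callback):
        self._subscribers[kind][callback_id] = callback
        self._executors.setdefault(kind, ThreadPoolExecutor(max_workers=1))

    def add_event(self, event):
        # dispatch asynchronously; a slow or crashing subscriber
        # does not block the others
        for kind, callbacks in self._subscribers.items():
            for cb in callbacks.values():
                self._executors[kind].submit(cb, event)
```

The nested dict is what allows the same subscriber type to hold several independently removable callbacks.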
3. Strategy Pattern
Where: Multiple interchangeable implementations behind abstract interfaces
| Interface | Implementations | File |
|---|---|---|
| Runtime | Docker, K8s, Local, Remote, CLI | runtime/base.py |
| FileStore | Local, S3, GoogleCloud, InMemory | storage/files.py |
| SecurityAnalyzer | LLM-based, Invariant, GraySwan | security/analyzer.py |
| Condenser | LLMSummarizing, Structured, ObservationMasking, Window, NoOp | memory/condenser/ |
| ConversationManager | Standalone, DockerNested | server/conversation_manager/ |
| SandboxService | Docker, Process, Remote | app_server/sandbox/ |
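The pattern can be sketched with the FileStore family; this is illustrative, not the actual `storage/files.py` API:

```python
from abc import ABC, abstractmethod

class FileStore(ABC):
    @abstractmethod
    def write(self, path: str, contents: str) -> None: ...

    @abstractmethod
    def read(self, path: str) -> str: ...

class InMemoryFileStore(FileStore):
    def __init__(self):
        self._files: dict[str, str] = {}

    def write(self, path: str, contents: str) -> None:
        self._files[path] = contents

    def read(self, path: str) -> str:
        return self._files[path]

def get_file_store(kind: str) -> FileStore:
    # config-driven strategy selection: callers only ever see FileStore
    stores = {'memory': InMemoryFileStore}
    return stores[kind]()
```

Because callers depend only on the abstract interface, swapping local disk for S3 is a one-line config change rather than a code change.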
4. Registry Pattern
Where: openhands/controller/agent.py:128-169
Agents register themselves by name, enabling dynamic lookup:
```python
Agent.register("CodeActAgent", CodeActAgent)
agent_cls = Agent.get_cls("CodeActAgent")
```

This decouples agent creation from the controller -- the controller only needs the agent name (from config) to instantiate the right class.
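A minimal sketch of what such a registry looks like (the real class also validates duplicates and lists available agents; details here are assumptions):

```python
class Agent:
    _registry: dict[str, type['Agent']] = {}

    @classmethod
    def register(cls, name: str, agent_cls: type['Agent']) -> None:
        if name in cls._registry:
            raise ValueError(f'Agent {name!r} is already registered')
        cls._registry[name] = agent_cls

    @classmethod
    def get_cls(cls, name: str) -> type['Agent']:
        if name not in cls._registry:
            raise KeyError(f'No agent registered under {name!r}')
        return cls._registry[name]

class CodeActAgent(Agent):
    pass

Agent.register('CodeActAgent', CodeActAgent)
```

A lookup failure surfaces as a clear `KeyError` at startup rather than an import error deep inside the controller.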
5. Template Method Pattern
Where: openhands/controller/agent.py (abstract) → openhands/agenthub/codeact_agent/codeact_agent.py (concrete)
The base Agent class defines the skeleton:
- `get_system_message()` -- can be overridden but has default behavior
- `step(state) -> Action` -- abstract, must be implemented
- `reset()` -- can be overridden for custom cleanup
Subclasses fill in the specific behavior (how to call LLM, which tools to use, how to convert responses).
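The skeleton can be sketched as follows; the base class and hook names mirror the ones above, while `EchoAgent` is a hypothetical subclass for illustration:

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    def get_system_message(self) -> str:
        # default behavior, overridable by subclasses
        return 'You are a helpful agent.'

    @abstractmethod
    def step(self, state: dict) -> str:
        """Must be implemented: decide the next action from state."""

    def reset(self) -> None:
        pass  # hook for custom cleanup

class EchoAgent(Agent):
    def step(self, state: dict) -> str:
        return f"echo: {state['last_message']}"
```

The controller drives every agent through the same three methods; only `step()` must differ per agent.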
6. Decorator Pattern (Mixins)
Where: openhands/llm/llm.py:70
```python
class LLM(RetryMixin, DebugMixin):
```

The LLM class composes behavior from mixins:
- `RetryMixin` adds tenacity-based retry with exponential backoff
- `DebugMixin` adds prompt/response logging
This avoids deep inheritance hierarchies while keeping concerns separated.
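A toy version of that composition; the mixin bodies here are hypothetical stand-ins (the real `RetryMixin` wraps tenacity):

```python
import time

class RetryMixin:
    def retry(self, fn, attempts=3, base_delay=0.01):
        # exponential backoff between attempts
        for i in range(attempts):
            try:
                return fn()
            except Exception:
                if i == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** i)

class DebugMixin:
    def log(self, msg):
        self.logs = getattr(self, 'logs', [])
        self.logs.append(msg)

class LLM(RetryMixin, DebugMixin):
    def completion(self, prompt: str) -> str:
        self.log(f'prompt: {prompt}')
        return self.retry(lambda: f'response to {prompt}')
```

Each mixin owns one orthogonal concern, so `LLM` itself stays focused on the provider call.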
7. Chain of Responsibility
Where: Agent action processing pipeline
```
Agent.step() → Action
  → AgentController._step()           -- iteration/budget check
  → SecurityAnalyzer.security_risk()  -- risk assessment
  → Confirmation mode check           -- user approval
  → EventStream.add_event()           -- dispatch
  → Runtime.on_event()                -- execution
```
Each handler can short-circuit the chain (e.g., security check can block execution, budget check can stop the loop).
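The short-circuiting can be sketched generically; the handler functions below are illustrative, not the actual controller methods:

```python
def budget_check(action: dict, ctx: dict):
    if ctx['iterations'] >= ctx['max_iterations']:
        return 'budget exceeded'
    return None

def security_check(action: dict, ctx: dict):
    if action.get('risk') == 'HIGH':
        return 'blocked: high risk'
    return None

def run_pipeline(action: dict, ctx: dict, handlers):
    # each handler may short-circuit by returning a rejection reason;
    # None means "pass to the next handler"
    for handler in handlers:
        reason = handler(action, ctx)
        if reason is not None:
            return ('rejected', reason)
    return ('dispatched', None)
```

Adding a new gate (e.g. a rate limiter) means appending one handler, with no changes to the others.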
8. Adapter Pattern (Function Call Conversion)
Where: openhands/llm/fn_call_converter.py (979 lines)
This is one of the most sophisticated patterns in the codebase. It adapts models without native function calling to work with OpenHands' tool-based architecture:
Forward conversion (native → text):

```
{"tool_calls": [{"function": {"name": "execute_bash", "arguments": ...}}]}
→ "<function=execute_bash><parameter=command>ls -la</parameter></function>"
```

Reverse conversion (text → native):

```
"<function=execute_bash><parameter=command>ls -la</parameter></function>"
→ {"tool_calls": [{"function": {"name": "execute_bash", "arguments": ...}}]}
```
This adapter includes:
- XML-like format definition injected into system prompt
- In-context learning examples generated dynamically based on available tools
- Stop words (`</function`) to prevent incomplete function calls
- Regex-based parsing with error recovery (`_fix_stopword()`, `_normalize_parameter_tags()`)
- Parameter type validation and enum checking
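A simplified sketch of the reverse (text → native) direction; the actual converter adds error recovery, type validation, and stop-word repair beyond this bare regex:

```python
import re

FN_PATTERN = re.compile(
    r'<function=(?P<name>\w+)>(?P<body>.*?)</function>', re.DOTALL)
PARAM_PATTERN = re.compile(
    r'<parameter=(?P<key>\w+)>(?P<value>.*?)</parameter>', re.DOTALL)

def text_to_tool_calls(text: str) -> list[dict]:
    """Parse XML-like function-call text into native tool_call dicts."""
    calls = []
    for fn in FN_PATTERN.finditer(text):
        args = {m['key']: m['value']
                for m in PARAM_PATTERN.finditer(fn['body'])}
        calls.append({'function': {'name': fn['name'], 'arguments': args}})
    return calls
```

The non-greedy `.*?` with `re.DOTALL` is what lets multi-line parameter values (e.g. file contents) survive the round trip.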
9. Delegation Pattern (Multi-Agent)
Where: openhands/controller/agent_controller.py:735-861
The parent controller creates a child controller when the agent produces AgentDelegateAction:
```python
def start_delegate(self, action: AgentDelegateAction):
    # Create child agent from registry
    delegate_agent = Agent.get_cls(action.agent)(self.llm, self.config)
    # Create child controller with:
    # - Shared event stream (not subscribed independently)
    # - Shared metrics (cost tracking aggregates)
    # - Increased delegate_level
    # - is_delegate=True (parent routes events to child)
    self.delegate = AgentController(
        agent=delegate_agent,
        event_stream=self.event_stream,
        delegate_level=self.state.delegate_level + 1,
        is_delegate=True,
        ...
    )
```

The parent stops stepping while the delegate is active (`should_step()` returns False). When the delegate finishes, `end_delegate()` collects outputs and creates `AgentDelegateObservation`.
Notable Design Tradeoffs
1. Event Stream as Single Bus vs. Separate Channels
Decision: Single EventStream for all events (agent actions, runtime observations, state changes, memory operations).
Tradeoff:
- (+) Simplicity: One place to subscribe, one persistence mechanism
- (+) Natural ordering: Events have global sequence numbers
- (+) Easy replay: Reconnecting clients get everything in order
- (-) Coupling: All subscribers see all events, need to filter
- (-) Potential bottleneck: High-frequency events compete for the single queue
- (-) Secret management complexity: Must redact secrets in all events since all subscribers see everything
2. Synchronous Agent Step in Async Server
Decision: agent.step() is synchronous, but the server is async (FastAPI + Socket.IO).
Tradeoff:
- (+) Simpler agent implementations (no async/await complexity)
- (+) LLM calls are naturally blocking (wait for full response)
- (-) Requires thread pool executors for non-blocking server
- (-) Bridging async/sync adds complexity (`asyncio.get_event_loop().run_in_executor()`)
3. Docker-in-Docker for Sandboxing
Decision: The main OpenHands server runs in Docker, and creates Docker containers for agent sandboxes by mounting the Docker socket.
Tradeoff:
- (+) Strong isolation: Agent code runs in separate container
- (+) Reproducible environments: Consistent sandbox images
- (+) Port isolation: Separate port ranges for each service
- (-) Requires Docker socket access (security concern)
- (-) Resource overhead of nested containers
- (-) Docker dependency for development (though local runtime exists)
4. LiteLLM as Universal Provider Layer
Decision: Use LiteLLM for all LLM calls rather than direct provider SDKs.
Tradeoff:
- (+) Single interface for 100+ providers
- (+) Provider switching without code changes
- (+) Built-in fallback and load balancing
- (-) Additional abstraction layer (harder to debug provider-specific issues)
- (-) Version pinning challenges (litellm moves fast)
- (-) Must work around LiteLLM bugs (explicit version pin `>=1.74.3` "fixes known bugs")
- (-) Custom function call conversion needed anyway for non-native models
5. Text-Based Function Calling as Fallback
Decision: For models without native function calling, inject an XML-like format (<function=name><parameter=key>value</parameter></function>) into the system prompt.
Tradeoff:
- (+) Enables tool use on any text-generation model
- (+) In-context learning examples teach the format reliably
- (-) Consumes prompt tokens for format description + examples
- (-) Regex parsing is fragile (requires `_fix_stopword()`, `_normalize_parameter_tags()`)
- (-) One function call per message limitation (reduces efficiency)
6. Prompt Caching via Content Flags
Decision: Mark specific messages with cache_enabled = True and rely on provider-side caching (Anthropic's ephemeral cache).
Tradeoff:
- (+) Significant cost reduction for repeated prefixes (system prompt, early conversation)
- (+) No local cache management needed
- (-) Provider-specific (only works with Anthropic and select models)
- (-) Cache hit/miss tracking adds complexity to metrics
- (-) Summarization condenser disables caching (`caching_prompt=False`) because summaries are write-once
What's Unusual or Clever
1. Self-Assessed Security Risk
Every tool that can modify state has a security_risk parameter that the LLM must fill in (LOW/MEDIUM/HIGH). The system prompt defines what each level means. This is clever because:
- The LLM self-assesses risk before each action
- The controller can override or block based on the assessment
- It creates an audit trail of risk decisions
- It shifts the burden from heuristic rules to contextual judgment
File: openhands/agenthub/codeact_agent/prompts/security_risk_assessment.j2
The risk definitions change based on context (CLI mode vs. sandbox mode), recognizing that the same action has different risk profiles depending on the execution environment.
2. Temperature Perturbation on Empty Responses
File: openhands/llm/retry_mixin.py:46-60
When the LLM returns no response and temperature is 0, the retry logic temporarily sets temperature to 1.0. This breaks out of deterministic empty-response loops without permanently changing the model's behavior.
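A hypothetical sketch of that retry logic (the real version lives in tenacity retry hooks; names here are assumptions):

```python
def completion_with_retry(call_llm, kwargs: dict, max_attempts: int = 3):
    """Retry empty responses; perturb temperature only for the retry."""
    for _ in range(max_attempts):
        response = call_llm(**kwargs)
        if response:
            return response
        if kwargs.get('temperature', 0) == 0:
            # deterministic sampling keeps producing the same empty
            # output, so temporarily add randomness for the retry
            kwargs = {**kwargs, 'temperature': 1.0}
    raise RuntimeError('empty response after retries')
```

Because the override is applied to a copy of the kwargs, the caller's configured temperature is untouched for future calls.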
3. Dynamic In-Context Learning Examples
File: openhands/llm/fn_call_converter.py:326-392
For models without native function calling, the system generates tool-usage examples dynamically based on which tools are actually enabled. If the agent doesn't have browser access, the browser example is omitted. This prevents the model from trying to use unavailable tools.
4. Condensation with Task Tracking Preservation
File: openhands/memory/condenser/impl/llm_summarizing_condenser.py
When the condenser summarizes old events, it explicitly instructs the summarization LLM to preserve task tracker IDs and statuses. This ensures that task state survives memory compression -- a subtle but important detail for long-running workflows.
The summarization prompt uses a structured template:
USER_CONTEXT: ...
TASK_TRACKING: {task IDs, statuses} ← MUST be preserved
COMPLETED: ...
PENDING: ...
CODE_STATE: {files, functions, structures}
TESTS: {failing cases, error messages}
VERSION_CONTROL_STATUS: {branch, PR, commits}
5. Pending Action Queue in CodeActAgent
File: openhands/agenthub/codeact_agent/codeact_agent.py:170-175
When the LLM returns multiple tool calls in a single response, the agent queues them in pending_actions (a deque). On subsequent step() calls, it returns queued actions without calling the LLM again. This amortizes LLM latency across multiple actions.
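The queueing behavior can be sketched as follows (a simplified stand-in for the real agent, which queues `Action` objects rather than strings):

```python
from collections import deque

class CodeActAgentSketch:
    def __init__(self, llm):
        self.llm = llm
        self.pending_actions: deque = deque()

    def step(self, state) -> str:
        # drain queued tool calls before paying for another LLM round trip
        if self.pending_actions:
            return self.pending_actions.popleft()
        actions = self.llm(state)  # may return several tool calls at once
        self.pending_actions.extend(actions)
        return self.pending_actions.popleft()
```

Three tool calls in one response thus cost one LLM invocation instead of three.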
6. Event-Driven Memory Recall
Rather than loading all microagent knowledge upfront (which would consume tokens), the memory system uses a pull-based approach:
- First user message triggers `RecallAction(WORKSPACE_CONTEXT)` -- loads repo info and matching microagents
- Subsequent messages trigger `RecallAction(KNOWLEDGE)` -- loads only microagents matching keywords in the new message
This lazy loading prevents token waste on irrelevant knowledge.
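The keyword-matching step can be sketched like this; the data shape is an assumption, not the actual microagent schema:

```python
def recall_knowledge(message: str, microagents: dict[str, dict]) -> list[str]:
    """Return only the microagents whose trigger keywords appear
    in the new user message (pull-based, lazy recall)."""
    lowered = message.lower()
    loaded = []
    for name, spec in microagents.items():
        if any(keyword in lowered for keyword in spec['triggers']):
            loaded.append(name)
    return loaded
```

Everything not triggered stays out of the prompt, so idle knowledge costs zero tokens.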
7. Dual Serialization Strategy for Messages
File: openhands/core/message.py
Messages serialize differently depending on model capabilities:
- List serializer (for models supporting structured content): returns content as `[{type: "text", text: "..."}]` with cache_control and image support
- String serializer (for constrained models): concatenates text content with newlines
The serialization strategy is controlled by per-message flags (cache_enabled, vision_enabled, function_calling_enabled), allowing mixed strategies within a single conversation.
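A stripped-down sketch of the two strategies (the real `Message` model is Pydantic-based and also handles images and cache-control markers):

```python
def serialize_message(role: str, texts: list[str], *, list_format: bool) -> dict:
    if list_format:
        # structured-content models: a list of typed content parts
        content = [{'type': 'text', 'text': t} for t in texts]
    else:
        # constrained models: one newline-joined string
        content = '\n'.join(texts)
    return {'role': role, 'content': content}
```

Keeping the flag per message is what allows mixed strategies within a single conversation.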
8. Linus Torvalds-Inspired Prompt Variant
File: openhands/agenthub/codeact_agent/prompts/system_prompt_tech_philosophy.j2
One of the system prompt variants embeds Linus Torvalds' engineering philosophy:
- "Good taste" = eliminating special cases rather than handling them
- "Never break userspace" = backward compatibility is sacred
- Pragmatism over theoretical purity
- Obsession with simplicity (max 3 levels of indentation)
This is used via system_prompt_filename in agent config, allowing teams to select the coding philosophy their agent follows. It includes a structured 5-layer problem decomposition framework and decision output formats.