02 - Architecture
ml-intern
High-Level Architecture
+-----------------------+
| User Interfaces |
+-----------+-----------+
|
+----------------+----------------+
| |
+-------v-------+ +---------v--------+
| CLI (Rich) | | Web UI (React) |
| agent/main.py| | frontend/src/ |
+-------+-------+ +---------+--------+
| |
| submission_queue | POST /api/chat/{id}
| event_queue | SSE events
| |
+-------v---------------------------------v--------+
| Agent Core |
| |
| +-------------+ +---------------+ +----------+ |
| | agent_loop | | ContextManager| | Session | |
| | (think/act) | | (history, | | (state, | |
| | | | compaction, | | events, | |
| | | | system prompt)| | cancel) | |
| +------+------+ +---------------+ +----------+ |
| | |
| +------v------+ +---------------+ |
| | ToolRouter | | DoomLoop | |
| | (dispatch) | | (detection) | |
| +------+------+ +---------------+ |
+---------|----------------------------------------+
|
+------------+------------+
| |
+-----v------+ +------v------+
| Built-in | | MCP Tools |
| Tools (17) | | (HF MCP) |
+-----+------+ +------+------+
| |
+-----v-------------------------v-----+
| External Services |
| |
| HuggingFace Hub GitHub API |
| HF Inference Semantic Scholar|
| HF Datasets Server arXiv/ar5iv |
| HF Spaces (sandbox) |
+-------------------------------------+
Queue-Based Async Architecture
The core architectural decision is the decoupling of UI from agent logic via two async queues. This is explicitly modeled after OpenAI's Codex architecture (referenced in code comments).
User Input Display
| ^
v |
+-------------+ Operations +----------------+ Events +-----------+
| submission | ───────────────> | submission_loop| ────────> | event |
| _queue | | (agent_loop.py)| | _queue |
+-------------+ +----------------+ +-----------+
Operations: USER_INPUT, EXEC_APPROVAL, INTERRUPT, UNDO, COMPACT, SHUTDOWN
Events: ready, assistant_chunk, tool_call, tool_output, approval_required,
turn_complete, error, interrupted, compacted, tool_log, plan_update,
tool_state_change, processing, shutdown
Why queues? This design enables:
- Both CLI and Web frontends to share the same core loop
- Non-blocking UI -- the display never waits for the agent
- Clean cancellation -- INTERRUPT is just another queue message
- Multiple SSE subscribers for the same session (web broadcast pattern)
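The queue wiring can be sketched with plain `asyncio.Queue`s. The operation and event names mirror the lists above; the `Submission` shape and handler logic are illustrative, not the project's actual types:

```python
import asyncio
from dataclasses import dataclass
from enum import Enum, auto

class Op(Enum):
    USER_INPUT = auto()
    INTERRUPT = auto()
    SHUTDOWN = auto()

@dataclass
class Submission:
    op: Op
    payload: str = ""

async def submission_loop(submission_queue: asyncio.Queue, event_queue: asyncio.Queue) -> None:
    """Core loop: consume operations, emit events. The UI never blocks on this."""
    while True:
        sub: Submission = await submission_queue.get()
        if sub.op is Op.SHUTDOWN:
            await event_queue.put({"event_type": "shutdown"})
            return
        if sub.op is Op.INTERRUPT:
            # Cancellation is just another queued message, not a signal
            await event_queue.put({"event_type": "interrupted"})
            continue
        # Stand-in for the think-act cycle: stream chunks, then signal completion
        for chunk in ("Working on: ", sub.payload):
            await event_queue.put({"event_type": "assistant_chunk", "text": chunk})
        await event_queue.put({"event_type": "turn_complete"})

async def demo() -> list:
    sq, eq = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(submission_loop(sq, eq))
    await sq.put(Submission(Op.USER_INPUT, "train a model"))
    await sq.put(Submission(Op.SHUTDOWN))
    await task
    events = []
    while not eq.empty():
        events.append(eq.get_nowait())
    return events
```

Because the display only ever reads from `event_queue`, either frontend (or several at once) can consume the same stream.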
Component Responsibilities
Agent Core (agent/core/)
| Component | File | Responsibility |
|---|---|---|
| Agent Loop | agent_loop.py | The think-act cycle: call the LLM, parse tool calls, check approvals, execute tools, repeat |
| Session | session.py | Per-session state: context manager, tool router, cancellation, event logging, trajectory persistence |
| ToolRouter | tools.py | Tool registration, dispatch (built-in + MCP), schema generation for the LLM |
| ContextManager | context_manager/manager.py | Message history, system prompt loading, context window tracking, compaction (summarization), dangling tool call repair |
| DoomLoop | doom_loop.py | Detects repetitive tool call patterns, injects corrective prompts |
| LLM Params | llm_params.py | Builds litellm kwargs per model type (Anthropic direct, OpenAI direct, HF Router) |
| Session Uploader | session_uploader.py | Fire-and-forget trajectory upload to an HF dataset repo |
| HF Router Catalog | hf_router_catalog.py | Model catalog fetch/cache for validation and context window discovery |
| Config | config.py | Pydantic config with env var substitution |
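The Agent Loop row is the heart of the table. A minimal sketch of the think-act cycle, with the LLM call and ToolRouter stubbed as callables (the real loop is async, streams chunks, and dispatches tools in parallel):

```python
from typing import Callable

def agent_loop(
    complete: Callable[[list], dict],       # stand-in for the litellm completion call
    call_tool: Callable[[str, dict], str],  # stand-in for ToolRouter.call_tool
    messages: list,
    max_iterations: int = 300,
) -> list:
    """Think-act cycle: call the LLM, execute any tool calls, feed results
    back into the history, and stop on a text-only response."""
    for _ in range(max_iterations):
        response = complete(messages)
        tool_calls = response.get("tool_calls", [])
        messages.append({"role": "assistant",
                         "content": response.get("content", ""),
                         "tool_calls": tool_calls})
        if not tool_calls:  # a text-only response ends the turn
            break
        for tc in tool_calls:  # the real loop dispatches these in parallel
            result = call_tool(tc["name"], tc["args"])
            messages.append({"role": "tool", "name": tc["name"], "content": result})
    return messages
```

The `max_iterations` cap matches the sequence diagram below; everything else (approvals, doom-loop checks, event emission) hangs off this skeleton.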
Backend (backend/)
| Component | File | Responsibility |
|---|---|---|
| App | main.py | FastAPI app, CORS, routers, static file serving |
| Agent Routes | routes/agent.py | REST + SSE endpoints for sessions, chat, approval, model config |
| Auth Routes | routes/auth.py | HuggingFace OAuth 2.0 flow |
| Session Manager | session_manager.py | Multi-session orchestration, EventBroadcaster fan-out, capacity management |
| Dependencies | dependencies.py | Auth middleware, token validation, dev mode bypass |
| Models | models.py | Pydantic request/response schemas |
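The EventBroadcaster fan-out mentioned in the Session Manager row amounts to one queue per SSE subscriber. A minimal sketch (the interface is assumed, not the project's actual class):

```python
import asyncio

class EventBroadcaster:
    """Fans one session's event stream out to every SSE subscriber
    (one queue per browser tab)."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.discard(q)

    def publish(self, event: dict) -> None:
        # Copy the event into every subscriber queue; slow consumers
        # buffer independently and never block the agent loop.
        for q in self._subscribers:
            q.put_nowait(event)
```

This is what lets multiple tabs watch the same session: the agent publishes once, and each SSE generator drains its own queue.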
Frontend (frontend/src/)
| Component | File(s) | Responsibility |
|---|---|---|
| SSE Transport | lib/sse-chat-transport.ts | Custom ChatTransport bridging backend SSE to the Vercel AI SDK |
| Agent Chat Hook | hooks/useAgentChat.ts | Per-session chat state, side-channel event processing, reconnection |
| Session Store | store/sessionStore.ts | Session list CRUD, active session tracking (persisted) |
| Agent Store | store/agentStore.ts | Per-session processing state, research agents, panels, plans |
| Layout Store | store/layoutStore.ts | Sidebar, panel, theme state |
| Message Persistence | lib/chat-message-store.ts | localStorage for UIMessages (max 50 sessions) |
| Research Persistence | lib/research-store.ts | localStorage for research sub-agent state |
| Message Converter | lib/convert-llm-messages.ts | Backend litellm format -> Vercel AI SDK UIMessage format |
Data Flow: User Message to Agent Response
```mermaid
sequenceDiagram
    participant U as User
    participant UI as Frontend/CLI
    participant Q as Submission Queue
    participant AL as Agent Loop
    participant CM as Context Manager
    participant LLM as LLM Provider
    participant TR as Tool Router
    participant T as Tools
    participant EQ as Event Queue
    U->>UI: "Fine-tune Llama on this dataset"
    UI->>Q: Submission(USER_INPUT, text)
    Q->>AL: dequeue
    AL->>CM: add_message(user, text)
    loop Agent Loop (max 300 iterations)
        AL->>CM: get_messages()
        CM-->>AL: [system, ...history]
        AL->>AL: check doom_loop(messages)
        AL->>LLM: acompletion(messages, tools)
        LLM-->>AL: response (streamed chunks)
        AL->>EQ: assistant_chunk events
        AL->>CM: add_message(assistant, content + tool_calls)
        alt Has tool calls
            AL->>AL: _needs_approval(tool_calls)?
            alt Needs approval
                AL->>EQ: approval_required event
                Note over AL: Pauses, waits for EXEC_APPROVAL
            end
            AL->>TR: call_tool(name, args) [parallel]
            TR->>T: dispatch to handler
            T-->>TR: (output, success)
            TR-->>AL: results
            AL->>EQ: tool_output events
            AL->>CM: add_message(tool, result) for each
        else No tool calls (text only)
            AL->>EQ: turn_complete
            Note over AL: Done, wait for next submission
        end
    end
    EQ->>UI: stream events
    UI->>U: render response
```
Data Flow: SSE Streaming (Web UI)
Frontend Backend Agent Core
| | |
| POST /api/chat/{id} | |
| { text: "..." } | |
|------------------------------->| |
| | subscribe(EventBroadcaster) |
| | submit(USER_INPUT) |
| |------------------------------>|
| | |
| SSE: data: {"event_type": | Event Queue |
| "assistant_chunk",...} |<------------------------------|
|<-------------------------------| |
| | |
| (pipeline: | |
| response.body | |
| -> TextDecoderStream | |
| -> SSEParserStream | |
| -> EventToChunkStream | |
| -> Vercel AI SDK) | |
| | |
| SSE: data: {"event_type": | |
| "turn_complete",...} |<------------------------------|
|<-------------------------------| |
| | |
| Stream closes | unsubscribe |
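On the backend side, the SSE leg of this diagram reduces to an async generator that drains an event queue and formats each event as an SSE frame. A sketch under that assumption (in the real backend the generator is wrapped in a streaming HTTP response; closing on `turn_complete` is illustrative):

```python
import asyncio
import json
from typing import AsyncIterator

async def sse_event_stream(event_queue: asyncio.Queue) -> AsyncIterator[str]:
    """Format queued agent events as SSE frames. The frame shape
    (`data: <json>` followed by a blank line) is what the frontend's
    SSEParserStream stage parses back into events."""
    while True:
        event = await event_queue.get()
        yield f"data: {json.dumps(event)}\n\n"
        if event.get("event_type") == "turn_complete":
            break  # sketch: close the stream at end of turn
```

Each subscriber tab runs one of these generators against its own broadcaster queue, which is why a closed tab can unsubscribe without affecting the session.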
Session Lifecycle
POST /api/session
|
v
+------+------+
| SessionMgr |
| create() |
| capacity ck |
+------+------+
|
+-----------+-----------+
| |
+-----v-----+ +------v------+
| Session | | ToolRouter |
| (UUID, | | (built-in + |
| context, | | MCP tools) |
| config) | +------+------+
+-----+------+ |
| |
+-------+-------+-------+
|
+-----v-----+
| _run_ |
| session() | <-- asyncio task
| loop |
+-----+------+
|
+-------------+-------------+
| |
+-----v------+ +------v------+
| submission | | Event |
| _queue | | Broadcaster |
| (reads ops) | | (fans out) |
+-------------+ +------+------+
|
+-----+------+
| SSE subs |
| (per-tab) |
+------------+
Termination triggers:
- SHUTDOWN operation
- DELETE /api/session/{id}
- Unhandled exception (emergency save)
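The lifecycle above (create with a capacity check, run as an asyncio task, tear down on DELETE) can be sketched as follows; the class shape, limit, and method names are illustrative, not the project's actual `session_manager.py` API:

```python
import asyncio
import uuid

class SessionManager:
    """Creates sessions up to a capacity limit and runs each one as an
    asyncio task."""

    def __init__(self, max_sessions: int = 10) -> None:
        self.max_sessions = max_sessions
        self.sessions: dict[str, asyncio.Task] = {}

    def create(self, run_session) -> str:
        if len(self.sessions) >= self.max_sessions:
            raise RuntimeError("session capacity reached")
        session_id = str(uuid.uuid4())
        # _run_session() equivalent: one long-lived task per session
        self.sessions[session_id] = asyncio.ensure_future(run_session(session_id))
        return session_id

    async def delete(self, session_id: str) -> None:
        # DELETE /api/session/{id}: cancel the task; cleanup handlers
        # inside the loop get a chance to do an emergency save.
        task = self.sessions.pop(session_id)
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
```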
Context Window Management
+----------------------------------------------------------+
| Context Window |
| |
| +--------+ +----------+ +------ ... ------+ +------+ |
| | System | | 1st User | | Middle msgs | |Recent| |
| | Prompt | | Message | | (compactable) | | 5 | |
| +--------+ +----------+ +-----------------+ +------+ |
| |
| Max: model_limit - 10,000 (safety margin) |
| Compact at: max_context exceeded |
| Summary budget: 10% of max_context |
+----------------------------------------------------------+
Compaction strategy:
1. System prompt: ALWAYS preserved
2. First user message: ALWAYS preserved (original task)
3. Middle messages: Summarized by the LLM itself
4. Last 5 messages: ALWAYS preserved (recent context)
5. Result: [system] + [first_user] + [summary] + [recent_5]
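The five-step strategy maps directly onto list slicing. A minimal sketch, with the LLM summarization call stubbed as a callable:

```python
from typing import Callable

def compact(messages: list, summarize: Callable[[list], str],
            keep_recent: int = 5) -> list:
    """Compaction per the strategy above: keep the system prompt and the
    first user message, summarize the middle, keep the last `keep_recent`
    messages. `summarize` stands in for the LLM summarization call."""
    system, first_user = messages[0], messages[1]
    recent = messages[-keep_recent:]
    middle = messages[2:-keep_recent]
    if not middle:
        return messages  # nothing worth compacting
    summary = {"role": "user",
               "content": f"[Conversation summary]\n{summarize(middle)}"}
    return [system, first_user, summary, *recent]
```

The summary itself is capped by the 10%-of-max-context budget shown in the diagram, so a compacted history always fits well under the limit.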
Tool Approval Flow
Agent Loop: tool_call detected
|
v
_needs_approval(tool)?
|
+---+---+
| |
No Yes
| |
v v
Execute Emit approval_required event
| |
| +------v---------+
| | CLI: prompt |
| | Web: inline |
| | approval UI |
| +------+---------+
| |
| +------v---------+
| | User decision: |
| | approve/reject |
| | /yolo/feedback |
| +------+---------+
| |
| Submit EXEC_APPROVAL
| |
v v
Results -> add to context -> continue loop
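Because the decision arrives as an ordinary EXEC_APPROVAL submission, the gate is just an await on a queue. A sketch of the flow above (the tool names, queue shapes, and decision strings are illustrative):

```python
import asyncio

APPROVAL_REQUIRED = {"run_code", "push_to_hub"}  # illustrative tool names

def needs_approval(tool_name: str, yolo_mode: bool = False) -> bool:
    # "yolo" approves everything for the rest of the session
    return not yolo_mode and tool_name in APPROVAL_REQUIRED

async def gate_tool_call(tool_name, args, event_queue, approval_queue, execute):
    """Pause on approval_required and resume on the user's decision,
    which arrives as just another queued submission."""
    if needs_approval(tool_name):
        await event_queue.put({"event_type": "approval_required",
                               "tool": tool_name, "args": args})
        decision = await approval_queue.get()  # EXEC_APPROVAL submission
        if decision != "approve":
            return ("rejected by user", False)
    return await execute(tool_name, args)
```

A rejection is fed back into the context like any other tool result, so the agent can adjust course instead of halting.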
Multi-Session Web Architecture
Frontend renders ALL session components simultaneously:
+-----------------------------------------------------+
| AppLayout |
| +----------+ +-----------------------------------+ |
| | Session | | SessionChat (session_1) [ACTIVE] | |
| | Sidebar | | useAgentChat(session_1) running | |
| | | | renders: MessageList + ChatInput | |
| | - sess 1 | +-----------------------------------+ |
| | - sess 2 | | SessionChat (session_2) [HIDDEN] | |
| | - sess 3 | | useAgentChat(session_2) running | |
| | | | renders: null | |
| | | +-----------------------------------+ |
| | | | SessionChat (session_3) [HIDDEN] | |
| | | | useAgentChat(session_3) running | |
| | | | renders: null | |
| +----------+ +-----------------------------------+ |
+-----------------------------------------------------+
Each session's useAgentChat hook runs continuously,
processing events even when not visible. Only the
active session renders UI components.