# 06 - Backend API & Frontend UI

## Backend (FastAPI)

### Application Setup (`backend/main.py`)

- FastAPI app with CORS for `localhost:5173` and `localhost:3000` (Vite dev servers)
- Two routers: `/api` (agent) and `/auth` (OAuth)
- In production, serves the built frontend from `../static/` as an SPA
- Port 7860 (HF Spaces default)
### Complete API Surface (`backend/routes/agent.py`)

| Endpoint | Method | Auth | Purpose |
|---|---|---|---|
| `/api/health` | GET | No | Health check (active sessions, max sessions) |
| `/api/health/llm` | GET | No | LLM provider reachability (1-token probe) |
| `/api/config/model` | GET | No | Current model + available models list |
| `/api/config/model` | POST | Yes | Set global default LLM model |
| `/api/title` | POST | Yes | Generate 6-word session title from first message |
| `/api/session` | POST | Yes | Create new agent session |
| `/api/session/{id}` | GET | Yes | Get session info (processing state, message count) |
| `/api/session/{id}/model` | POST | Yes | Set per-session model (tab-scoped) |
| `/api/sessions` | GET | Yes | List user's sessions |
| `/api/session/{id}` | DELETE | Yes | Delete session |
| `/api/chat/{id}` | POST | Yes | Primary SSE endpoint: submit + stream |
| `/api/events/{id}` | GET | Yes | Subscribe to events (reconnection) |
| `/api/interrupt/{id}` | POST | Yes | Interrupt agent loop |
| `/api/session/{id}/messages` | GET | Yes | Full message history |
| `/api/undo/{id}` | POST | Yes | Undo last turn |
| `/api/truncate/{id}` | POST | Yes | Truncate to before Nth user message |
| `/api/compact/{id}` | POST | Yes | Trigger context compaction |
| `/api/shutdown/{id}` | POST | Yes | Shutdown session |
Available models (hardcoded in `agent.py:39-62`):

- Claude Opus 4.6 (Anthropic) -- recommended
- MiniMax M2.7 (HuggingFace) -- recommended
- Kimi K2.6 (HuggingFace)
- GLM 5.1 (HuggingFace)
### Primary SSE Endpoint: `POST /api/chat/{id}` (line 319)

This is the main interaction endpoint. It:

- Subscribes to the session's EventBroadcaster before submitting (ensures no events are missed)
- Accepts either `{ text: "..." }` (user input) or `{ approvals: [...] }` (tool approvals)
- Returns an SSE stream (`text/event-stream`)
- Sends a keepalive every 15 seconds (the SSE comment `: keepalive\n\n`)
- Terminates the stream on: `turn_complete`, `approval_required`, `error`, `interrupted`, `shutdown`
- Sets the `X-Accel-Buffering: no` header for nginx proxy compatibility
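The streaming behavior described above can be sketched as an async generator. This is an illustrative sketch, not the project's actual code; the names `sse_stream` and `TERMINAL_EVENTS` are assumptions:

```python
import asyncio
import json

# Terminal event types after which the server closes the stream (per the list above)
TERMINAL_EVENTS = {"turn_complete", "approval_required", "error", "interrupted", "shutdown"}


async def sse_stream(queue: asyncio.Queue, keepalive: float = 15.0):
    """Yield SSE frames from an event queue, interleaving keepalive comments.

    When no event arrives within the keepalive window, emit an SSE comment
    (which clients ignore) so intermediaries don't time out the connection.
    """
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=keepalive)
        except asyncio.TimeoutError:
            yield ": keepalive\n\n"  # SSE comment line, not a data event
            continue
        yield f"data: {json.dumps(event)}\n\n"
        if event.get("type") in TERMINAL_EVENTS:
            return  # stream ends on terminal events
```

In FastAPI such a generator would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.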
### Session Manager (`backend/session_manager.py`)

Capacity limits: 200 total sessions, 10 per user.
`EventBroadcaster` (line 41): fan-out pattern. One source queue (from the agent core) feeds N subscriber queues (one per SSE connection). Each subscriber gets its own `asyncio.Queue`.
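A minimal sketch of this fan-out pattern (illustrative only; the real class lives in `backend/session_manager.py` and will differ in detail):

```python
import asyncio


class EventBroadcaster:
    """Fan-out: one publisher feeds N per-subscriber asyncio.Queues."""

    def __init__(self) -> None:
        self._subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        # Each SSE connection gets its own queue, so a slow consumer
        # buffers independently without blocking the others.
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.remove(q)

    def publish(self, event: dict) -> None:
        # Deliver the event to every current subscriber.
        for q in self._subscribers:
            q.put_nowait(event)
```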
`_run_session()` (line 221): per-session asyncio task that:

- Creates the EventBroadcaster
- Loops reading from `submission_queue` with a 1s timeout
- Calls `process_submission()` from the agent core
- Sets the `is_processing` flag around each submission
- Cleans up the sandbox on exit
Thread safety: session creation uses an `asyncio.Lock` for capacity checking. `Session`/`ToolRouter` constructors run via `asyncio.to_thread()` since they may block.
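The lock-guarded capacity check plus off-loop construction can be sketched as follows (a simplified sketch under assumed names; the real manager tracks richer session objects):

```python
import asyncio

MAX_SESSIONS = 200   # total capacity, per the limits above
MAX_PER_USER = 10


class SessionManager:
    """Sketch: capacity checks under a lock, blocking constructors off-loop."""

    def __init__(self) -> None:
        self._sessions: dict[str, str] = {}  # session_id -> user_id
        self._lock = asyncio.Lock()

    async def create_session(self, session_id: str, user_id: str, build_session):
        # The lock makes check-then-insert atomic across concurrent requests.
        async with self._lock:
            if len(self._sessions) >= MAX_SESSIONS:
                raise RuntimeError("server at capacity")
            owned = sum(1 for u in self._sessions.values() if u == user_id)
            if owned >= MAX_PER_USER:
                raise RuntimeError("per-user session limit reached")
            self._sessions[session_id] = user_id
        # The constructor may block (network, filesystem), so keep it
        # off the event loop with asyncio.to_thread().
        return await asyncio.to_thread(build_session)
```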
### Authentication (`backend/routes/auth.py`, `backend/dependencies.py`)

OAuth 2.0 Authorization Code Flow:
```
Browser                  Backend                   HuggingFace
   |                        |                          |
   | GET /auth/login        |                          |
   |----------------------->|                          |
   |                        | Generate CSRF state      |
   | 302 Redirect           |                          |
   |<-----------------------|                          |
   |                        |                          |
   | GET /authorize         |                          |
   |------------------------------------------------->|
   |                        |                          |
   | 302 Callback           |                          |
   |<-------------------------------------------------|
   |                        |                          |
   | GET /auth/callback?code=...                      |
   |----------------------->|                          |
   |                        | POST /token (exchange)   |
   |                        |------------------------->|
   |                        | { access_token }         |
   |                        |<-------------------------|
   |                        | GET /userinfo            |
   |                        |------------------------->|
   |                        | { user data }            |
   |                        |<-------------------------|
   | Set-Cookie: hf_access_token (HttpOnly, 7d)       |
   |<-----------------------|                          |
```
Scopes: `openid profile read-repos write-repos contribute-repos manage-repos inference-api jobs write-discussions`

Dev mode: when `OAUTH_CLIENT_ID` is not set, auth is bypassed entirely; `get_current_user()` returns `DEV_USER` with `user_id: "dev"`.

Token caching: validated tokens are cached in memory for 5 minutes (`dependencies.py:21`).
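A minimal sketch of such a 5-minute in-memory token cache (illustrative; the names `TokenCache`, `get`, and `put` are assumptions, not the actual `dependencies.py` API):

```python
import time

TOKEN_TTL_SECONDS = 300  # 5 minutes, per the note above


class TokenCache:
    """Tiny in-memory cache mapping token -> (user info, expiry)."""

    def __init__(self, ttl: float = TOKEN_TTL_SECONDS) -> None:
        self._ttl = ttl
        self._entries: dict[str, tuple[dict, float]] = {}

    def get(self, token: str):
        entry = self._entries.get(token)
        if entry is None:
            return None
        user, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: drop the entry so the caller re-validates upstream.
            del self._entries[token]
            return None
        return user

    def put(self, token: str, user: dict) -> None:
        self._entries[token] = (user, time.monotonic() + self._ttl)
```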
## Frontend (React + TypeScript)

### Application Structure

```
frontend/src/
  App.tsx                    # Root: auth check + layout
  main.tsx                   # Entry: theme + providers
  theme.ts                   # MUI dark/light themes
  hooks/
    useAgentChat.ts          # Per-session chat orchestration (743 lines)
    useAuth.ts               # Authentication state
    useOrgMembership.ts      # Org membership polling
  lib/
    sse-chat-transport.ts    # Custom SSE -> AI SDK transport (397 lines)
    chat-message-store.ts    # localStorage message persistence
    research-store.ts        # localStorage research state persistence
    convert-llm-messages.ts  # Backend format -> AI SDK format
  store/
    sessionStore.ts          # Session list (Zustand, persisted)
    agentStore.ts            # Per-session state (Zustand)
    layoutStore.ts           # Layout preferences (Zustand, partial persist)
  components/
    Layout/AppLayout.tsx     # Main layout with sidebar + panel
    SessionChat.tsx          # Per-session chat (active vs hidden)
    WelcomeScreen/           # Onboarding checklist
    Chat/
      ChatInput.tsx          # Input + model selector
      MessageList.tsx        # Scrollable message list
      AssistantMessage.tsx   # Message with grouped tool calls
      ToolCallGroup.tsx      # Tool approval + research display (1117 lines)
      ActivityStatusBar.tsx  # Animated status indicator
    CodePanel/
      CodePanel.tsx          # Right panel for scripts/output/plans
    SessionSidebar/
      SessionSidebar.tsx     # Session list with create/delete
```
### SSE Streaming Pipeline (`lib/sse-chat-transport.ts`)

The custom `SSEChatTransport` bridges backend SSE events to the Vercel AI SDK's `UIMessageChunk` streaming interface:

```
POST /api/chat/{id}
        |
        v
response.body (ReadableStream<Uint8Array>)
        |
        v
TextDecoderStream
        |
        v
createSSEParserStream()     -- parses "data: {...}\n\n" into AgentEvent objects
        |
        v
createEventToChunkStream()  -- maps AgentEvent -> UIMessageChunk
        |
        v
Vercel AI SDK (useChat)     -- renders into React state
```
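The parsing stage above must handle events split across network chunks. Here is a language-agnostic sketch of that buffering logic in Python (the real implementation is the TypeScript `createSSEParserStream()`; this is illustration only):

```python
import json


def parse_sse_chunks(chunks):
    """Incrementally parse decoded SSE text into event dicts.

    Buffers partial text, splits complete frames on blank lines, extracts
    `data:` payloads, and skips comment lines (the keepalives).
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            for line in frame.splitlines():
                if line.startswith("data:"):
                    yield json.loads(line[len("data:"):].strip())
                # Lines starting with ":" are keepalive comments; ignored.
```

Note that an event arriving split across two chunks is only emitted once its terminating blank line arrives.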
Event mapping (`sse-chat-transport.ts:78-269`):

| Backend Event | AI SDK Chunk(s) | Side Effect |
|---|---|---|
| `ready` | (none) | `onReady()` callback |
| `processing` | `start` + `start-step` | `onProcessing()` |
| `assistant_chunk` | `text-start` (first) + `text-delta` | Updates streaming state |
| `assistant_stream_end` | `text-end` | Marks text complete |
| `tool_call` | `tool-input-start` + `tool-input-available` | `onToolCallPanel()` |
| `tool_output` | `tool-output-available` or `tool-output-error` | `onToolOutputPanel()` |
| `approval_required` | `tool-input-start` + `tool-approval-request` | `onApprovalRequired()` |
| `tool_state_change` | (stores job URL/status) | State update |
| `turn_complete` | `finish-step` + `finish(stop)` | Clears processing |
| `error` | `finish-step` + `finish(error)` | Shows error |
| `interrupted` | `finish-step` + `finish(stop)` | Marks cancelled |
### Per-Session Chat Hook (`hooks/useAgentChat.ts`)

This 743-line hook is the core frontend orchestrator. Key responsibilities:

Side-channel callbacks (lines 44-299): a `useMemo` block creating callbacks that update per-session state via `agentStore.updateSession()`. Handles:

- Research sub-agent state (`onToolLog`): parses `tool_log` events to track per-agent progress, tool counts, token counts, elapsed time
- Approval panel data (`onApprovalRequired`): builds a script preview or JSON display for the CodePanel
- Tool call panel (`onToolCallPanel`): shows the running tool's script/args in the panel
- Tool output panel (`onToolOutputPanel`): updates the panel with results

Backend hydration (lines 360-425): on mount, fetches full message history from `/api/session/{id}/messages`, converts it to UIMessages via `llmMessagesToUIMessages()`, and restores pending approval state.

Wake-from-sleep reconnection (lines 435-608): on `visibilitychange`, re-hydrates messages, subscribes to `GET /api/events/{id}` for live SSE, and polls messages every 3 seconds for sync.

Key actions:

- `handleSendMessage`: submit text -> set processing state -> auto-title from first message
- `undoLastTurn`: REST call to `/api/undo/{id}` + remove last turn from UI
- `approveTools`: store edited scripts -> send approval responses via AI SDK
- `stop`: POST `/api/interrupt/{id}` (keeps SSE open for remaining events)
- `editAndRegenerate`: truncate backend + frontend, re-send edited text
### State Management (3 Zustand Stores)

**sessionStore** (persisted to localStorage)

- State: `sessions: SessionMeta[]`, `activeSessionId: string | null`
- Actions: `createSession`, `deleteSession`, `switchSession`, `updateSessionTitle`, `setNeedsAttention`
- Key: `hf-agent-sessions`

**agentStore** (not persisted, except specific fields)

- State: per-session state map + mirrored flat fields for the active session
- Per-session: `isProcessing`, `activityStatus`, `panelData`, `panelView`, `plan`, `researchAgents`, `researchSteps`, `researchStats`
- Pattern: `updateSession(sessionId, updates)` patches the session entry AND mirrors to flat fields if active
- Persisted fields: `editedScripts`, `jobUrls`, `jobStatuses`, `toolErrors`, `rejectedTools` (to localStorage)

**layoutStore** (partially persisted)

- State: `isLeftSidebarOpen`, `isRightPanelOpen`, `rightPanelWidth`, `themeMode`
- Persisted: only `themeMode`
### Research Sub-Agent Visualization

The frontend tracks parallel research agents in real-time:

```
+-------------------------------------------+
| research "Finding fine-tuning approach"   |
| [running . 5 tools . 12.4k tokens . 18s]  |
|                                           |
| > Exploring HF docs: trl                  |
| > Reading paper: 2401.12345               |
| > Finding examples: sft training          |
| > Inspecting dataset: mlabonne/... [*]    |
+-------------------------------------------+
```

Data flow:

- The backend research tool sends `tool_log` events with `agent_id` and `label`
- `useAgentChat.onToolLog()` parses these into `ResearchAgentState` entries in agentStore
- `ToolCallGroup.tsx` renders per-agent stats chips and rolling step displays
- A `useSecondTick()` hook forces a re-render every second for live elapsed time
- State is persisted to localStorage via `research-store.ts` so it survives page refresh
### Tool Approval UI (`components/Chat/ToolCallGroup.tsx`)

The most complex UI component (1,117 lines). Handles:

- Batch approval: when multiple tools are pending, an "Approve all" / "Reject all" header
- Individual approval: per-tool Approve/Reject with optional feedback
- Script preview: click to view and edit scripts in the CodePanel
- Hardware pricing: shows GPU costs for `sandbox_create` and `hf_jobs`
- Edited script tracking: "(edited)" badge when the user modifies a script before approval
- Auto-follow panel: automatically shows the currently running tool in the CodePanel; the user can "lock" a specific tool
### Code Panel (`components/CodePanel/CodePanel.tsx`)

Right-side panel with:

- Script/Output toggle: switch between input and output views
- Inline editing: overlay textarea on syntax-highlighted code
- Syntax highlighting: Python via `react-syntax-highlighter`, Markdown via `react-markdown`
- Log processing: cleans up progress bars and download lines (`utils/logProcessor.ts`)
- Plan display: bottom section with status icons (completed, in_progress, pending)
- Drag-to-resize: desktop gets an inline panel (min 300px, max 60% viewport); mobile gets a bottom drawer at 75vh
### Message Rendering Pipeline

```
Backend LLM messages (litellm format)
        |
        v
llmMessagesToUIMessages()    -- convert-llm-messages.ts
(consecutive assistant msgs merged, tool results paired)
        |
        v
UIMessage[] (Vercel AI SDK format)
        |
        v
MessageList.tsx
(auto-scroll, MutationObserver for streaming)
        |
        v
UserMessage / AssistantMessage
        |
        v
groupParts()                 -- groups consecutive tools
        |
        v
MarkdownContent / ToolCallGroup
```
### Wake-from-Sleep Reconnection

When the browser tab becomes visible after sleeping:

1. Fetch `/api/session/{id}/messages` -- get full backend state
2. Convert to UIMessages and reconcile with local state
3. If `is_processing`: subscribe to `GET /api/events/{id}` for live SSE
4. Start polling messages every 3 seconds for sync
5. On `turn_complete` or a similar terminal event: stop polling, close SSE
This handles the case where the agent was working while the tab was asleep.
## CLI Interface (`agent/main.py`)

### Startup Sequence
1. Particle logo animation (braille characters converge to form text)
2. Screen clear
3. CRT boot sequence (typewriter + glitch + scanlines):
- "User: {hf_username}"
- "Model: {model_name}"
- "Tools: loading..."
4. Tool count overwrite (ANSI cursor-up, types actual count)
5. Ready for input
### Terminal Display System (`agent/utils/terminal_display.py`)

Three rendering layers:

- Rich layer: themed console (`_THEME` with warm gold accents), markdown rendering, panels
- ANSI escape layer: direct cursor manipulation for `SubAgentDisplayManager` live regions and the init-done animation
- Typewriter layer: async character-by-character rendering with variable timing (2ms newlines, 4ms chars, occasional 15ms pauses)
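The typewriter layer's core idea can be sketched in a few lines (a simplified sketch; the real layer adds occasional 15ms pauses and richer timing):

```python
import asyncio
import sys


async def typewriter(text: str, out=sys.stdout) -> None:
    """Render text character by character with variable delays.

    Newlines use a shorter delay (2ms) than ordinary characters (4ms),
    matching the timing described above.
    """
    for ch in text:
        out.write(ch)
        out.flush()  # force each character to appear immediately
        await asyncio.sleep(0.002 if ch == "\n" else 0.004)
```

Any writable object with `write()`/`flush()` (e.g. an `io.StringIO`) can stand in for stdout, which makes the effect easy to test.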
`SubAgentDisplayManager` (lines 170-314): manages multiple concurrent sub-agent displays using terminal escape codes. Shows at most 4 tool-call lines per agent, with a compact mode when multiple agents are active. Redraws every second to update elapsed timers.
### Slash Commands

| Command | Action |
|---|---|
| `/help` | Show available commands |
| `/undo` | Remove last turn from conversation |
| `/compact` | Trigger context compaction |
| `/model <id>` | Switch LLM model (with preflight validation) |
| `/yolo` | Toggle auto-approval mode |
| `/effort <level>` | Set reasoning effort (low/medium/high) |
| `/status` | Show session info (model, messages, context) |
| `/quit` | Exit |