
ML Intern - Codebase Analysis

An analysis of the ML Intern repository (~17,000 lines of code across a Python backend, a React frontend, and agent tooling).

Documents

# | Document | What You'll Learn
--- | --- | ---
01 | Overview | What the project does, who it's for, complete tech stack and why each choice was made
02 | Architecture | Component diagram, queue-based async design, data flow diagrams (message lifecycle, SSE streaming, session lifecycle, context management, tool approval)
03 | Agent Core | The think-act loop, context management and compaction, session state, doom loop detection, LLM parameter routing, model catalog
04 | Tools | Complete catalog of 17+ tools, sandbox/code execution, research sub-agent, HuggingFace and GitHub integrations, tool registration system
05 | LLM Usage | How LLMs are called (3 contexts), model routing, system prompt evolution (V1->V2->V3), prompting techniques, 12 guardrail mechanisms
06 | Backend & Frontend | Full API surface (18 endpoints), SSE streaming pipeline, OAuth flow, React component tree, Zustand state management, research visualization
07 | Key Files | Tiered file importance map (essential/important/supporting), prompt locations, design patterns summary, notable tradeoffs
08 | Comparison vs General Agents | What distinguishes ML Intern from Claude Code and similar agents, where each wins, when to use which

Reading Order

Quick orientation: Start with 01-overview.md, then 02-architecture.md.

Understanding the agent: Read 03-agent-core.md, then 05-llm-usage.md.

Understanding the tools: Read 04-tools.md.

Understanding the UI: Read 06-backend-frontend.md.

Quick reference: Use 07-key-files.md as an index when navigating the codebase.