ML Intern - Codebase Analysis

An analysis of the ML Intern repository (~17,000 lines of code across a Python backend, a React frontend, and agent tooling).

Documents

#	Document	What You'll Learn
01	Overview	What the project does, who it's for, the complete tech stack, and why each choice was made
02	Architecture	Component diagram, queue-based async design, data flow diagrams (message lifecycle, SSE streaming, session lifecycle, context management, tool approval)
03	Agent Core	The think-act loop, context management and compaction, session state, doom loop detection, LLM parameter routing, model catalog
04	Tools	Complete catalog of 17+ tools, sandbox/code execution, research sub-agent, HuggingFace and GitHub integrations, tool registration system
05	LLM Usage	How LLMs are called (3 contexts), model routing, system prompt evolution (V1->V2->V3), prompting techniques, 12 guardrail mechanisms
06	Backend & Frontend	Full API surface (18 endpoints), SSE streaming pipeline, OAuth flow, React component tree, Zustand state management, research visualization
07	Key Files	Tiered file importance map (essential/important/supporting), prompt locations, design patterns summary, notable tradeoffs
08	Comparison vs General Agents	What distinguishes ML Intern from Claude Code and similar agents, where each wins, when to use which

Quick orientation: Start with 01-overview.md, then read 02-architecture.md.

Understanding the agent: Read 03-agent-core.md, then 05-llm-usage.md.

Understanding the tools: Read 04-tools.md.

Understanding the UI: Read 06-backend-frontend.md.

Quick reference: Use 07-key-files.md as an index when navigating the codebase.