ML Intern - Codebase Analysis
An analysis of the ML Intern repository (~17,000 lines of code spanning a Python backend, a React frontend, and agent tooling).
Documents
| # | Document | What You'll Learn |
|---|---|---|
| 01 | Overview | What the project does, who it's for, complete tech stack and why each choice was made |
| 02 | Architecture | Component diagram, queue-based async design, data flow diagrams (message lifecycle, SSE streaming, session lifecycle, context management, tool approval) |
| 03 | Agent Core | The think-act loop, context management and compaction, session state, doom loop detection, LLM parameter routing, model catalog |
| 04 | Tools | Complete catalog of 17+ tools, sandbox/code execution, research sub-agent, HuggingFace and GitHub integrations, tool registration system |
| 05 | LLM Usage | How LLMs are called (3 contexts), model routing, system prompt evolution (V1->V2->V3), prompting techniques, 12 guardrail mechanisms |
| 06 | Backend & Frontend | Full API surface (18 endpoints), SSE streaming pipeline, OAuth flow, React component tree, Zustand state management, research visualization |
| 07 | Key Files | Tiered file importance map (essential/important/supporting), prompt locations, design patterns summary, notable tradeoffs |
| 08 | Comparison vs General Agents | What distinguishes ML Intern from Claude Code and similar agents, where each wins, when to use which |
Reading Order
- Quick orientation: Start with 01-overview.md, then read 02-architecture.md.
- Understanding the agent: Read 03-agent-core.md, then 05-llm-usage.md.
- Understanding the tools: Read 04-tools.md.
- Understanding the UI: Read 06-backend-frontend.md.
- Quick reference: Use 07-key-files.md as an index when navigating the codebase.