Tour — Designing a multi-agent review pipeline
The meta-harness vision: a writer agent produces a diff; multiple reviewer agents critique it in parallel; a fixer agent synthesizes the verdicts into a coherent set of changes. This tour walks the corpus for the patterns that make this fly.
Pacing
| Block | Time |
|---|---|
| Concept · multi-agent-coordination | 15 min |
| Strix architecture drill-down | 15 min |
| Insight · LLM dedupe | 5 min |
| Insight · module-level mailbox | 5 min |
| Insight · markdown-as-skills | 5 min |
| Concept · agent-loop (skim) | 10 min |
| OpenHands v1 event-sourcing drill-down | 15 min |
| Insight · streaming early-stop | 5 min |
The architecture you should converge on
After this tour you should be able to articulate:
- Topology. Writer → parallel reviewers → fixer, with an optional critic-of-critics step when reviewers conflict. Each reviewer runs in parallel; the fixer waits.
- Context flow. Each reviewer gets the diff, its role definition, and an optional shared findings ledger. No parent transcript. Fresh context per reviewer.
- Verdict shape. Structured, not prose: { status, line_refs, reasons }. The fixer reads the structure; the human reviewer reads the reasons.
- Dedupe. LLM-based, asking “are these the same root cause?”, not hashing.
- Persistence. Event-sourced for replay. A customer disputes verdict X → you reproduce exactly what reviewer X saw and reasoned.
- Reviewer plugin model. Each reviewer is a markdown file with role, methodology, output schema. Customers add reviewers via PR.
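The topology and verdict shape above can be sketched together. This is a minimal sketch, not a definitive implementation: `run_reviewer` is a hypothetical stand-in for a real LLM call, and the reviewer names are placeholders.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Verdict:
    # Structured verdict: the fixer reads the fields, the human reads the reasons.
    status: str                                  # e.g. "approve" | "request_changes"
    line_refs: list[int] = field(default_factory=list)
    reasons: list[str] = field(default_factory=list)

async def run_reviewer(name: str, diff: str) -> Verdict:
    # Placeholder for a real LLM call. Each reviewer gets only the diff and
    # its role definition — fresh context, no parent transcript.
    await asyncio.sleep(0)
    return Verdict(status="request_changes", line_refs=[42],
                   reasons=[f"{name}: demo finding"])

async def pipeline(diff: str, reviewers: list[str]) -> list[Verdict]:
    # Writer → parallel reviewers → fixer: fan out, then the fixer waits on all.
    return await asyncio.gather(*(run_reviewer(r, diff) for r in reviewers))

verdicts = asyncio.run(pipeline("...diff...", ["security", "style", "perf"]))
```

The fan-out/fan-in is the whole topology: `gather` is the "fixer waits" step, and an optional critic-of-critics would run after it on the collected verdicts.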
Decisions to make
Before you start coding, resolve:
- Dedupe layer. Run after reviewers complete (cheap), or as reviewers append to a shared ledger (more sophisticated, more code)?
- Verdict synthesis. Voting (cheap, auditable, arbitrary thresholds) or critic-of-critics (smarter, more expensive)?
- Failure handling. A reviewer crashes — does the fixer wait, retry, or proceed without it?
- Cost cap. Total budget across the pipeline. Don’t let a 6-reviewer pipeline at 90 turns each become 540 turns of billing.
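The cost-cap decision is the easiest to enforce mechanically. A minimal sketch, assuming a single shared counter passed to every reviewer (the `TurnBudget` name and API are illustrative, not from any library):

```python
class TurnBudget:
    """One cap across the whole pipeline, not one per reviewer."""

    def __init__(self, max_turns: int):
        self.max_turns = max_turns
        self.used = 0

    def spend(self, n: int = 1) -> bool:
        # Returns False once the pipeline-wide cap is hit; the caller stops.
        if self.used + n > self.max_turns:
            return False
        self.used += n
        return True

budget = TurnBudget(max_turns=180)   # e.g. 6 reviewers × 30 turns, not 6 × 90
```

Each reviewer calls `spend()` before every turn; when it returns False the reviewer emits its best-effort verdict and exits, which also answers part of the failure-handling question.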
Output
A whiteboard architecture diagram you can defend to a co-founder in 5 minutes, covering the six architecture points and resolving the four decisions above.
Itinerary
- Concept · multi-agent coordination. Three questions decide your topology: who blocks on whom, what the child sees, what flows back. Read first.
- Project · Strix. The closest existing architecture to a multi-agent reviewer pipeline: module-level mailbox, dedupe, parallel sub-agents.
- Insight · LLM-based deduplication that reasons about root cause. When five reviewers flag the same broken line, you do not want five line items. LLM-based dedupe is worth the tokens.
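The shape of LLM-based dedupe is a pairwise root-cause comparison. A sketch under stated assumptions: `same_root_cause` would wrap a real LLM call using a prompt like `dedupe_prompt`; here it is injected as a plain function so the logic is testable.

```python
def dedupe_prompt(a: str, b: str) -> str:
    # Ask the model to reason about root cause, not surface-text similarity.
    return (
        "Two code-review findings follow. Answer YES if they describe the "
        "same root cause, NO otherwise.\n"
        f"Finding A: {a}\nFinding B: {b}\nAnswer:"
    )

def dedupe(findings: list[str], same_root_cause) -> list[str]:
    # Keep a finding only if no already-kept finding shares its root cause.
    kept: list[str] = []
    for f in findings:
        if not any(same_root_cause(f, k) for k in kept):
            kept.append(f)
    return kept
```

Quadratic in the worst case, but reviewer findings per PR are small, and one cheap comparison call per pair is far less expensive than a human triaging five duplicates.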
- Insight · inter-agent messaging via module-level dicts. An in-process findings ledger needs no Redis: plain Python dicts, the cheapest thing that works.
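The module-level mailbox is just this. A minimal sketch (module and function names are hypothetical); every module that imports it shares the same dict, because Python caches modules in `sys.modules`:

```python
# findings_ledger.py — importing this module anywhere in the process yields
# the same FINDINGS object, so reviewers and the fixer share one ledger.
from collections import defaultdict
from threading import Lock

FINDINGS: dict[str, list[dict]] = defaultdict(list)
_LOCK = Lock()

def post(reviewer: str, finding: dict) -> None:
    with _LOCK:                      # cheap safety if reviewers run in threads
        FINDINGS[reviewer].append(finding)

def all_findings() -> list[dict]:
    with _LOCK:
        return [f for fs in FINDINGS.values() for f in fs]
```

The trade-off is scope: this only works while everything runs in one process. The moment reviewers become separate processes or machines, you graduate to a real queue or store.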
- Insight · markdown-as-prompt-library architecture. Each reviewer is a markdown file. Customers extend the harness by writing markdown, not code. This is the moat.
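The plugin loader for markdown reviewers is nearly trivial, which is the point. A sketch assuming one `.md` file per reviewer (role, methodology, output schema as prose); the function name and directory layout are illustrative:

```python
from pathlib import Path

def load_reviewers(directory: str) -> dict[str, str]:
    # One markdown file per reviewer; the filename is the reviewer name.
    # A customer adds a reviewer by adding a file in a PR — no code changes.
    return {p.stem: p.read_text() for p in Path(directory).glob("*.md")}
```

Each loaded string becomes that reviewer's system prompt, so the "plugin API" is markdown conventions rather than a Python interface.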
- Concept · agent loop. Inside each reviewer is one of the four agent-loop containers. Pick deliberately.
- Project · OpenHands (v1). Event-sourced state for replay and audit, useful when a customer disputes a verdict.
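The event-sourcing idea reduces to an append-only log plus a filter. A minimal sketch of the pattern (not OpenHands' actual API): every prompt, tool call, and verdict is appended as an event, and a dispute is answered by replaying exactly what one reviewer saw.

```python
import json

def append_event(log: list[str], kind: str, payload: dict) -> None:
    # Append-only: events are never mutated, so replay is deterministic.
    log.append(json.dumps({"kind": kind, "payload": payload}))

def replay(log: list[str], kind: str) -> list[dict]:
    # Reconstruct what a reviewer saw/decided by filtering its events.
    return [json.loads(e)["payload"] for e in log
            if json.loads(e)["kind"] == kind]
```

In production the list would be a file or table, but the contract is the same: state is derived from the log, never stored beside it, so "what did reviewer X see?" always has one answer.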
- Insight · streaming early stop on </function>. A free 10–20% cost win that compounds across many reviewer steps per PR.
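The early-stop trick is a few lines over any token stream. A sketch assuming an iterator of text chunks (the function name is hypothetical): once the stop marker appears, everything after it is tail tokens you would pay for but never use.

```python
def stream_until_stop(token_stream, stop: str = "</function>") -> str:
    # Cut the stream the moment the stop marker completes, even if the
    # marker arrives split across chunk boundaries.
    buf = ""
    for tok in token_stream:
        buf += tok
        if stop in buf:
            return buf[: buf.index(stop) + len(stop)]
    return buf
```

With many reviewer steps per PR, truncating each tool call at the marker rather than letting generation run to its natural end is where the claimed 10–20% comes from.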