
Designing a multi-agent review pipeline (Swisscheese)

Engineers building a meta-harness on top of coding agents — multiple reviewer agents on a single PR, with cross-agent verdict synthesis.

75 minutes · 8 stops


The meta-harness vision: a writer agent produces a diff; multiple reviewer agents critique it in parallel; a fixer agent synthesizes the verdicts into a coherent set of changes. This tour walks the corpus for the patterns that make this fly.

Pacing

Block                                     Time
Concept · multi-agent-coordination        15 min
Strix architecture drill-down             15 min
Insight · LLM dedupe                      5 min
Insight · module-level mailbox            5 min
Insight · markdown-as-skills              5 min
Concept · agent-loop (skim)               10 min
OpenHands v1 event-sourcing drill-down    15 min
Insight · streaming early-stop            5 min

The architecture you should converge on

After this tour you should be able to articulate:

  1. Topology. Writer → parallel reviewers → fixer, with an optional critic-of-critics step when reviewers conflict. Each reviewer runs in parallel; the fixer waits.
  2. Context flow. Each reviewer gets the diff, its role definition, and an optional shared findings ledger. No parent transcript. Fresh context per reviewer.
  3. Verdict shape. Structured, not prose. { status, line_refs, reasons }. The fixer reads the structure; the human reviewer reads the reasons.
  4. Dedupe. LLM-based, asking “are these the same root cause?” — not hashing.
  5. Persistence. Event-sourced for replay. A customer disputes verdict X → you replay exactly what reviewer X saw and how it reasoned.
  6. Reviewer plugin model. Each reviewer is a markdown file with role, methodology, output schema. Customers add reviewers via PR.
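The topology and verdict shape above can be sketched in a few lines. Everything here is illustrative — `Verdict`, `run_pipeline`, and the toy reviewers are hypothetical stand-ins for LLM-backed agents, not any harness's real API:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class Verdict:
    # Structured, not prose: the fixer reads the fields, the human reads `reasons`.
    reviewer: str
    status: str                      # "approve" | "request_changes"
    line_refs: list[int] = field(default_factory=list)
    reasons: list[str] = field(default_factory=list)

def run_pipeline(diff: str, reviewers: dict) -> list[Verdict]:
    """Writer output (`diff`) fans out to reviewers in parallel; the fixer waits."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in reviewers.items()}
        return [f.result() for f in futures.values()]   # fixer blocks here

# Toy reviewers standing in for LLM-backed agents.
reviewers = {
    "security": lambda d: Verdict("security", "request_changes", [3], ["unsanitized input"]),
    "style":    lambda d: Verdict("style", "approve"),
}
verdicts = run_pipeline("--- a/app.py\n+++ b/app.py", reviewers)
```

The fixer never starts until every future resolves, which is exactly the "each reviewer runs in parallel; the fixer waits" contract.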

Decisions to make

Before you start coding, resolve:

  • Dedupe layer. Run after reviewers complete (cheap), or as reviewers append to a shared ledger (more sophisticated, more code)?
  • Verdict synthesis. Voting (cheap, auditable, arbitrary thresholds) or critic-of-critics (smarter, more expensive)?
  • Failure handling. A reviewer crashes — does the fixer wait, retry, or proceed without it?
  • Cost cap. Total budget across the pipeline. Don’t let a 6-reviewer pipeline at 90 turns each become 540 turns of billing.
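The cost cap in particular is cheap to enforce with a shared counter that every reviewer loop checks before spending a turn. This `TurnBudget` helper is a hypothetical sketch, not any harness's API:

```python
import threading

class TurnBudget:
    """Pipeline-wide turn cap shared by all reviewer agents (hypothetical helper)."""
    def __init__(self, max_turns: int):
        self.max_turns = max_turns
        self._used = 0
        self._lock = threading.Lock()

    def spend(self, turns: int = 1) -> bool:
        # Returns False once the budget is exhausted; callers should end their loop.
        with self._lock:
            if self._used + turns > self.max_turns:
                return False
            self._used += turns
            return True

budget = TurnBudget(max_turns=120)   # one cap for the whole PR, not per reviewer
spent = sum(1 for _ in range(200) if budget.spend())
```

The point of the lock is that the cap is global: six reviewers drawing from one budget can never collectively exceed it, no matter how their turns interleave.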

Output

A whiteboard architecture diagram you can defend to a co-founder in 5 minutes, with explicit decisions on the six points above.

Itinerary

  1. Multi-agent coordination · concept

    Three questions decide your topology — who blocks on whom, what the child sees, what flows back. Read first.

  2. Strix · project

    The closest existing architecture to a multi-agent reviewer pipeline. Module-level mailbox, dedupe, parallel sub-agents.

  3. LLM-based deduplication that reasons about root cause · insight

    When five reviewers flag the same broken line, you do not want five line items. LLM-based dedupe is worth the tokens.
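A minimal shape for the dedupe pass, with a toy string heuristic standing in where a real system would make the "same root cause?" LLM call (all names here are illustrative):

```python
def same_root_cause(a: str, b: str) -> bool:
    """Stand-in for an LLM call asking 'do these findings share a root cause?'.
    A real implementation would send both findings to a model and parse yes/no."""
    return a.split(":")[0] == b.split(":")[0]   # toy heuristic for the sketch

def dedupe(findings: list[str], judge=same_root_cause) -> list[str]:
    kept: list[str] = []
    for f in findings:
        # Keep a finding only if no retained finding shares its root cause.
        if not any(judge(f, k) for k in kept):
            kept.append(f)
    return kept

findings = [
    "sql-injection: raw f-string in query (reviewer A)",
    "sql-injection: unparameterized query on line 40 (reviewer B)",
    "missing-test: no coverage for error path (reviewer C)",
]
deduped = dedupe(findings)
```

Swapping the judge from a heuristic to an LLM is what buys you "same root cause, differently worded" merging that hashing can never do.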

  4. Inter-agent messaging via module-level dicts · insight

    An in-process findings ledger needs no Redis. Plain Python dicts. Cheapest thing that works.
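The whole mailbox fits in one module: importing it anywhere in the process yields the same dict, so agents share state with zero infrastructure. A sketch with illustrative names, not Strix's actual API:

```python
# mailbox.py -- module-level findings ledger; every importer sees the same dict.
from collections import defaultdict
import threading

FINDINGS: dict[str, list[dict]] = defaultdict(list)
_LOCK = threading.Lock()   # reviewers may append from worker threads

def post(reviewer: str, finding: dict) -> None:
    with _LOCK:
        FINDINGS[reviewer].append(finding)

def ledger() -> dict[str, list[dict]]:
    # Return a shallow copy so readers can't mutate the shared state.
    with _LOCK:
        return {k: list(v) for k, v in FINDINGS.items()}

post("security", {"line": 40, "note": "unparameterized query"})
post("style", {"line": 12, "note": "dead import"})
snapshot = ledger()
```

The only caveat is the one implied by the design: it works because everything runs in one process. The moment reviewers become separate processes, you need a real broker.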

  5. Markdown-as-prompt-library architecture · insight

    Each reviewer is a markdown file. Customers extend the harness by writing markdown, not code. The moat.
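One possible loader, assuming a simple `# key: value` header convention followed by the prompt body. Both the convention and the names are hypothetical, invented for this sketch:

```python
import re

REVIEWER_MD = """\
# role: security-reviewer
# output_schema: {status, line_refs, reasons}

You review diffs for injection, authn/authz, and secret-handling bugs.
Report only findings you can tie to a specific line.
"""

def load_reviewer(md: str) -> dict:
    """Turn a reviewer markdown file into a config dict: header lines become
    keys, the remaining text becomes the reviewer's prompt."""
    headers = dict(re.findall(r"^# (\w+): (.+)$", md, flags=re.M))
    body = re.sub(r"^#.*$", "", md, flags=re.M).strip()
    return {**headers, "prompt": body}

reviewer = load_reviewer(REVIEWER_MD)
```

A customer adding a reviewer is then a one-file PR: drop a markdown file in the reviewers directory, no Python required.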

  6. Agent loop · concept

    Inside each reviewer is one of the four agent-loop containers. Pick deliberately.

  7. OpenHands (v1) · project

    Event-sourced state for replay/audit. Useful when a customer disputes a verdict.

  8. Streaming early stop on </function> · insight

    A free 10–20% cost win that compounds across many reviewer steps per PR.
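The trick in miniature: scan the accumulating buffer as chunks stream in and cut off at the stop marker. A sketch only — a real client would also close the underlying HTTP stream at that point so the provider stops generating:

```python
def stream_until_stop(chunks, stop="</function>"):
    """Consume a token stream and stop as soon as the stop marker appears;
    everything after it would be billed but discarded."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        idx = buf.find(stop)
        if idx != -1:
            return buf[: idx + len(stop)]   # close the stream here in a real client
    return buf

# The marker can straddle chunk boundaries, which is why we search the buffer.
chunks = ["<function>run_te", "sts</func", "tion> trailing tokens we never pay for"]
call = stream_until_stop(chunks)
```

Per call the saving is small; across six reviewers making many tool calls per PR, it compounds into the 10–20% the insight describes.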