
Designing a multi-agent review pipeline (Swisscheese)

Engineers building a meta-harness on top of coding agents — multiple reviewer agents on a single PR, with cross-agent verdict synthesis.

75 minutes · 8 stops


The meta-harness vision: a writer agent produces a diff; multiple reviewer agents critique it in parallel; a fixer agent synthesizes the verdicts into a coherent set of changes. This tour walks the corpus for the patterns that make this fly.

Pacing

Block                                     Time
Concept · multi-agent-coordination        15 min
Strix architecture drill-down             15 min
Insight · LLM dedupe                      5 min
Insight · module-level mailbox            5 min
Insight · markdown-as-skills              5 min
Concept · agent-loop (skim)               10 min
OpenHands v1 event-sourcing drill-down    15 min
Insight · streaming early-stop            5 min

The architecture you should converge on

After this tour you should be able to articulate:

  1. Topology. Writer → parallel reviewers → fixer, with an optional critic-of-critics step when reviewers conflict. Each reviewer runs in parallel; the fixer waits.
  2. Context flow. Each reviewer gets the diff, its role definition, and an optional shared findings ledger. No parent transcript. Fresh context per reviewer.
  3. Verdict shape. Structured, not prose. { status, line_refs, reasons }. The fixer reads the structure; the human reviewer reads the reasons.
  4. Dedupe. LLM-based, asking “are these the same root cause?” — not hashing.
  5. Persistence. Event-sourced for replay. A customer disputes verdict X → you replay exactly what reviewer X saw and how it reasoned.
  6. Reviewer plugin model. Each reviewer is a markdown file with role, methodology, output schema. Customers add reviewers via PR.
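The topology and verdict shape above can be sketched in a few lines. Everything here is illustrative — `Verdict`, `run_pipeline`, and the toy reviewers are hypothetical stand-ins for LLM-backed agents, not any harness's real API:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class Verdict:
    # Structured, not prose: the fixer reads the fields, the human reads `reasons`.
    reviewer: str
    status: str                      # "approve" | "request_changes"
    line_refs: list[int] = field(default_factory=list)
    reasons: list[str] = field(default_factory=list)

def run_pipeline(diff: str, reviewers: dict) -> list[Verdict]:
    """Writer output (`diff`) fans out to reviewers in parallel; the fixer waits."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in reviewers.items()}
        return [f.result() for f in futures.values()]   # fixer blocks here

# Toy reviewers standing in for LLM-backed agents.
reviewers = {
    "security": lambda d: Verdict("security", "request_changes", [3], ["unsanitized input"]),
    "style":    lambda d: Verdict("style", "approve"),
}
verdicts = run_pipeline("--- a/app.py\n+++ b/app.py", reviewers)
```

The fixer never starts until every future resolves, which is exactly the "each reviewer runs in parallel; the fixer waits" contract.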

Decisions to make

Before you start coding, resolve:

  • Dedupe layer. Run after reviewers complete (cheap), or as reviewers append to a shared ledger (more sophisticated, more code)?
  • Verdict synthesis. Voting (cheap, auditable, arbitrary thresholds) or critic-of-critics (smarter, more expensive)?
  • Failure handling. A reviewer crashes — does the fixer wait, retry, or proceed without it?
  • Cost cap. Total budget across the pipeline. Don’t let a 6-reviewer pipeline at 90 turns each become 540 turns of billing.
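The cost cap in particular is cheap to enforce with a shared counter that every reviewer loop checks before spending a turn. This `TurnBudget` helper is a hypothetical sketch, not any harness's API:

```python
import threading

class TurnBudget:
    """Pipeline-wide turn cap shared by all reviewer agents (hypothetical helper)."""
    def __init__(self, max_turns: int):
        self.max_turns = max_turns
        self._used = 0
        self._lock = threading.Lock()

    def spend(self, turns: int = 1) -> bool:
        # Returns False once the budget is exhausted; callers should end their loop.
        with self._lock:
            if self._used + turns > self.max_turns:
                return False
            self._used += turns
            return True

budget = TurnBudget(max_turns=120)   # one cap for the whole PR, not per reviewer
spent = sum(1 for _ in range(200) if budget.spend())
```

The point of the lock is that the cap is global: six reviewers drawing from one budget can never collectively exceed it, no matter how their turns interleave.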

Output

A whiteboard architecture diagram you can defend to a co-founder in 5 minutes, with explicit decisions on the six points above.

Itinerary

  1. Multi-agent coordination · concept

    Three questions decide your topology — who blocks on whom, what the child sees, what flows back. Read first.

  2. Strix · project

    The closest existing architecture to a multi-agent reviewer pipeline. Module-level mailbox, dedupe, parallel sub-agents.

  3. LLM-based deduplication that reasons about root cause · insight

    When five reviewers flag the same broken line, you do not want five line items. LLM-based dedupe is worth the tokens.
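A minimal shape for the dedupe pass, with a toy string heuristic standing in where a real system would make the "same root cause?" LLM call (all names here are illustrative):

```python
def same_root_cause(a: str, b: str) -> bool:
    """Stand-in for an LLM call asking 'do these findings share a root cause?'.
    A real implementation would send both findings to a model and parse yes/no."""
    return a.split(":")[0] == b.split(":")[0]   # toy heuristic for the sketch

def dedupe(findings: list[str], judge=same_root_cause) -> list[str]:
    kept: list[str] = []
    for f in findings:
        # Keep a finding only if no retained finding shares its root cause.
        if not any(judge(f, k) for k in kept):
            kept.append(f)
    return kept

findings = [
    "sql-injection: raw f-string in query (reviewer A)",
    "sql-injection: unparameterized query on line 40 (reviewer B)",
    "missing-test: no coverage for error path (reviewer C)",
]
deduped = dedupe(findings)
```

Swapping the judge from a heuristic to an LLM is what buys you "same root cause, differently worded" merging that hashing can never do.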

  4. Inter-agent messaging via module-level dicts · insight

    An in-process findings ledger needs no Redis. Plain Python dicts. Cheapest thing that works.
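The whole mailbox fits in one module: importing it anywhere in the process yields the same dict, so agents share state with zero infrastructure. A sketch with illustrative names, not Strix's actual API:

```python
# mailbox.py -- module-level findings ledger; every importer sees the same dict.
from collections import defaultdict
import threading

FINDINGS: dict[str, list[dict]] = defaultdict(list)
_LOCK = threading.Lock()   # reviewers may append from worker threads

def post(reviewer: str, finding: dict) -> None:
    with _LOCK:
        FINDINGS[reviewer].append(finding)

def ledger() -> dict[str, list[dict]]:
    # Return a shallow copy so readers can't mutate the shared state.
    with _LOCK:
        return {k: list(v) for k, v in FINDINGS.items()}

post("security", {"line": 40, "note": "unparameterized query"})
post("style", {"line": 12, "note": "dead import"})
snapshot = ledger()
```

The only caveat is the one implied by the design: it works because everything runs in one process. The moment reviewers become separate processes, you need a real broker.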

  5. Markdown-as-prompt-library architecture · insight

    Each reviewer is a markdown file. Customers extend the harness by writing markdown, not code. The moat.
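One possible loader, assuming a simple `# key: value` header convention followed by the prompt body. Both the convention and the names are hypothetical, invented for this sketch:

```python
import re

REVIEWER_MD = """\
# role: security-reviewer
# output_schema: {status, line_refs, reasons}

You review diffs for injection, authn/authz, and secret-handling bugs.
Report only findings you can tie to a specific line.
"""

def load_reviewer(md: str) -> dict:
    """Turn a reviewer markdown file into a config dict: header lines become
    keys, the remaining text becomes the reviewer's prompt."""
    headers = dict(re.findall(r"^# (\w+): (.+)$", md, flags=re.M))
    body = re.sub(r"^#.*$", "", md, flags=re.M).strip()
    return {**headers, "prompt": body}

reviewer = load_reviewer(REVIEWER_MD)
```

A customer adding a reviewer is then a one-file PR: drop a markdown file in the reviewers directory, no Python required.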

  6. Agent loop · concept

    Inside each reviewer is one of the four agent-loop containers. Pick deliberately.

  7. OpenHands (v1) · project

    Event-sourced state for replay/audit. Useful when a customer disputes a verdict.

  8. Streaming early stop on </function> · insight

    A free 10–20% cost win that compounds across many reviewer steps per PR.
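The trick in miniature: scan the accumulating buffer as chunks stream in and cut off at the stop marker. A sketch only — a real client would also close the underlying HTTP stream at that point so the provider stops generating:

```python
def stream_until_stop(chunks, stop="</function>"):
    """Consume a token stream and stop as soon as the stop marker appears;
    everything after it would be billed but discarded."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        idx = buf.find(stop)
        if idx != -1:
            return buf[: idx + len(stop)]   # close the stream here in a real client
    return buf

# The marker can straddle chunk boundaries, which is why we search the buffer.
chunks = ["<function>run_te", "sts</func", "tion> trailing tokens we never pay for"]
call = stream_until_stop(chunks)
```

Per call the saving is small; across six reviewers making many tool calls per PR, it compounds into the 10–20% the insight describes.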