Lessons For Swisscheese
What a meta-harness for AI code review should learn from Superset.
Swisscheese is a meta-harness that tackles the code-review problem for AI-generated code by scaling up agents (ensemble review). Superset solves a different but adjacent problem — orchestrating ensemble generation — so most of its lessons transfer with adaptation. This doc separates the lifts, the adapts, and the skips.
Confidence note: this is reasoning by analogy. Where Swisscheese's actual constraints differ from what I'm assuming, the priorities will shift.
1. Lift directly
1.1 Per-agent prompt templates with dialects
Same LaunchSource[] (the diff, the PR description, the failing tests, the previous review threads) renders differently per reviewer model — XML for Claude, markdown for Codex/Cursor. Mustache templates per agent, not branching code.
- Source: `packages/shared/src/agent-prompt-template.ts`
- Plan: `plans/done/v2-workspace-context-composition.md`
Adding a new reviewer model = adding a template + a preset row. No code branching on if (model === "claude").
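A minimal sketch of what "template + preset row, no code branching" could look like. The names (`LaunchSource`, `renderPrompt`, the preset shapes) are my own illustrations, not Superset's actual API, and the per-model renderers stand in for real Mustache templates:

```typescript
// One LaunchSource[] rendered per reviewer dialect. Adding a model means
// adding a row to `presets`, never an `if (model === "claude")` branch.
type LaunchSource = { kind: "diff" | "description" | "tests"; content: string };

const presets: Record<string, (sources: LaunchSource[]) => string> = {
  // XML-ish dialect for Claude
  claude: (sources) =>
    sources.map((s) => `<${s.kind}>\n${s.content}\n</${s.kind}>`).join("\n"),
  // Markdown dialect for Codex/Cursor
  codex: (sources) =>
    sources.map((s) => `## ${s.kind}\n\n${s.content}`).join("\n\n"),
};

function renderPrompt(model: string, sources: LaunchSource[]): string {
  const render = presets[model];
  // Unknown model = missing preset row, not a missing code path.
  if (!render) throw new Error(`no prompt preset for model "${model}"`);
  return render(sources);
}
```

The point of the table-of-renderers shape is that the orchestrator never inspects the model name beyond the lookup.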
1.2 Structured context with scope: "system" | "user" + cache hints
Critical for Swisscheese specifically: the system prompt and style guide are constant across reviewers; the diff changes per PR; the agent role (security, perf, API stability) varies per fan-out. Tag each section with cache_control so Anthropic's prompt cache hits across the swarm. This is the single biggest cost lever you have.
- Pattern: `ContextSection[]` with `scope`, `kind`, `label`, `content[]`, `cacheControl`, `meta`
- Result: a 50KB style guide that ships with every PR review goes from "expensive" to "negligible" with proper tagging
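A sketch of the shape, assuming the field names above; how it maps onto Anthropic's `cache_control` blocks is my illustration, not Superset's code. The key property is ordering: the stable prefix (system prompt, style guide) comes first, with the cache breakpoint after it, so every reviewer in the swarm hits the same cache entry:

```typescript
// Constant, cacheable sections first; per-PR sections last.
type ContextSection = {
  scope: "system" | "user";
  label: string;
  content: string;
  cacheControl?: { type: "ephemeral" };
};

// Turn system-scoped sections into message blocks, carrying the cache hint
// through so the provider can reuse the stable prefix across the swarm.
function toSystemBlocks(sections: ContextSection[]) {
  return sections
    .filter((s) => s.scope === "system")
    .map((s) => ({
      type: "text" as const,
      text: `[${s.label}]\n${s.content}`,
      ...(s.cacheControl ? { cache_control: s.cacheControl } : {}),
    }));
}

const blocks = toSystemBlocks([
  { scope: "system", label: "style-guide", content: "…", cacheControl: { type: "ephemeral" } },
  { scope: "user", label: "diff", content: "+ new line" }, // varies per PR; never cached
]);
```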
1.3 Event-sourced review log + pure reducer
plans/v2-chat-greenfield-architecture.md is the playbook. One log per PR review, monotonic seq, applyEvent(state, event) reducer on the client, gap detection on receive, replay endpoints.
Each reviewer agent is just an event producer; conflicting findings, deduplication, and human-override become trivial.
Build this from day one. Superset's V1 polling architecture (4 fps from two independent sources, racing on the client) is a cautionary tale about deferring it. Multi-device convergence and reproducible replay come for free.
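The pattern above reduces to a small core. This is an illustrative sketch (event shapes and state fields are my own, not Superset's schema): a pure `applyEvent` reducer over a monotonic `seq`, with gap detection that tells the client to replay rather than apply out of order:

```typescript
// Minimal event-sourced review log: one log per PR, monotonic seq,
// pure reducer, gap detection on receive.
type ReviewEvent =
  | { seq: number; type: "finding.created"; id: string; title: string }
  | { seq: number; type: "finding.dismissed"; id: string };

type ReviewState = {
  lastSeq: number;
  findings: Map<string, { title: string; dismissed: boolean }>;
};

function applyEvent(state: ReviewState, event: ReviewEvent): ReviewState {
  // Gap detection: a hole in seq means we missed events; replay, don't guess.
  if (event.seq !== state.lastSeq + 1) {
    throw new Error(`seq gap: expected ${state.lastSeq + 1}, got ${event.seq}`);
  }
  const findings = new Map(state.findings); // pure: never mutate input state
  switch (event.type) {
    case "finding.created":
      findings.set(event.id, { title: event.title, dismissed: false });
      break;
    case "finding.dismissed": {
      const f = findings.get(event.id);
      if (f) findings.set(event.id, { ...f, dismissed: true });
      break;
    }
  }
  return { lastSeq: event.seq, findings };
}
```

Because the reducer is pure, replay endpoints and multi-device convergence are just "fold the log again."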
1.4 Heredoc with random delimiter for prompt-as-shell-arg
Tiny utility, big payoff. (packages/shared/src/agent-prompt-launch.ts:26-68.) When you fan out to a claude/codex/cursor CLI you don't control, this is how you safely pass an arbitrary diff blob without quote-injection.
```sh
claude review "$(cat <<'D7M0KQ'
…arbitrary text with backticks, $vars, quotes…
D7M0KQ
)"
```
The delimiter is derived from the prompt content plus a random id, so it can't appear inside the body.
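A sketch of the delimiter trick in isolation (function names are mine; the real utility lives in `agent-prompt-launch.ts`). Quoting the delimiter (`<<'D'`) is what disables `$var` and backtick expansion inside the body; the collision loop guarantees the delimiter never appears in the payload:

```typescript
import { randomBytes } from "node:crypto";

// Derive a heredoc delimiter that provably does not occur in the body.
function heredocDelimiter(body: string): string {
  let d = "EOF_" + randomBytes(4).toString("hex").toUpperCase();
  while (body.includes(d)) {
    d = "EOF_" + randomBytes(4).toString("hex").toUpperCase();
  }
  return d;
}

// Wrap an arbitrary blob as a single safe shell argument to `cmd`.
function wrapAsShellArg(cmd: string, body: string): string {
  const d = heredocDelimiter(body);
  // Quoted delimiter (<<'D') => no $var, backtick, or quote expansion inside.
  return `${cmd} "$(cat <<'${d}'\n${body}\n${d}\n)"`;
}
```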
1.5 Idempotency keys on every fan-out
buildIdempotencyKey(request) in the orchestrator. Fan-out + retries = duplicate reviews unless you key the dispatch. A retry of "review PR 1234 with reviewer-security-v2" must not produce two reviews.
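A sketch of the idea under an assumed request shape (the real `buildIdempotencyKey` signature may differ): the key must cover everything that makes a dispatch distinct, including the revision, so a re-push legitimately produces a new review while a retry of the same one does not:

```typescript
import { createHash } from "node:crypto";

// Everything that distinguishes one dispatch from another goes into the key.
type ReviewRequest = { repo: string; pr: number; headSha: string; reviewer: string };

function buildIdempotencyKey(r: ReviewRequest): string {
  return createHash("sha256")
    .update(`${r.repo}#${r.pr}@${r.headSha}:${r.reviewer}`)
    .digest("hex");
}

// In-memory stand-in for the orchestrator's dispatch ledger.
const dispatched = new Set<string>();

function dispatch(r: ReviewRequest): boolean {
  const key = buildIdempotencyKey(r);
  if (dispatched.has(key)) return false; // retry: already in flight or done
  dispatched.add(key);
  return true;
}
```

In production the `Set` would be a unique-constrained DB column, so the check survives process restarts.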
1.6 Hook-based guardrails, not output filters
The UserPromptSubmit veto hook (packages/chat/src/server/trpc/utils/runtime/runtime.ts:135-144) is where you put:
- Secret-redaction in the diff
- Max-diff-size gate
- "This PR touches X path — escalate to senior reviewer"
- Tenant-specific compliance gates
Pluggable, replaceable per deployment, never baked into core. Hooks > filters because they compose.
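A sketch of why hooks compose where output filters don't (hook names and the secret pattern are illustrative, not Superset's): each hook is a small function that can veto with a reason, and the runner short-circuits on the first veto, so deployments swap guardrails without touching core:

```typescript
// A hook either allows the review to proceed or vetoes it with a reason.
type HookResult = { allow: true } | { allow: false; reason: string };
type Hook = (input: { diff: string; paths: string[] }) => HookResult;

const maxDiffSize: Hook = ({ diff }) =>
  diff.length > 500_000
    ? { allow: false, reason: "diff too large for automated review" }
    : { allow: true };

// Example secret gate; a real one would use a proper scanner.
const secretGate: Hook = ({ diff }) =>
  /AKIA[0-9A-Z]{16}/.test(diff)
    ? { allow: false, reason: "possible AWS key in diff" }
    : { allow: true };

function runHooks(hooks: Hook[], input: { diff: string; paths: string[] }): HookResult {
  for (const h of hooks) {
    const r = h(input);
    if (!r.allow) return r; // first veto wins; hooks compose left to right
  }
  return { allow: true };
}
```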
1.7 Authority-by-ownership discipline
Write down — explicitly, in a doc like HOST_SERVICE_BOUNDARIES.md — which component owns each review-state question. Examples:
| Question | Authority |
|---|---|
| "Is review complete?" | Aggregator + human, never any single agent |
| "Can a finding be dismissed?" | Finding author OR human, never another agent |
| "What revision are we on?" | The repo (commit SHA), not the orchestrator's cache |
| "Which reviewer is canonical?" | Configured per-repo, not voted on per-run |
Superset's biggest historical bugs were authority confusions ("ghost workspace" auto-delete; latest-vs-latest diffs).
2. Adapt
2.1 Host-service boundaries, scaled down
Superset's createApp({ config, providers }) pattern is overkill for v0 Swisscheese, but copy the discipline: no GitHub-y / org-y / CI-y assumptions baked in.
Providers for:
- `LLMCredentialProvider` (Anthropic / OpenAI / vendor OAuth)
- `RepoFetchProvider` (GitHub / GitLab / local)
- `FindingSink` (PR comment / Slack / ticket / dashboard)
- `PolicyProvider` (which reviewers run; severity thresholds)
This is what lets you run the same review engine in CI, in a webhook handler, and on a laptop with one config-file change.
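A sketch of the scaled-down factory (interface methods and the `createReviewEngine` name are my assumptions): the engine depends only on the provider interfaces, so CI, webhook, and laptop differ only in which implementations get injected:

```typescript
// Provider interfaces the engine depends on; concrete impls are injected.
interface RepoFetchProvider {
  fetchDiff(pr: number): Promise<string>;
}
interface FindingSink {
  emit(finding: { title: string; severity: string }): Promise<void>;
}

function createReviewEngine(providers: { repo: RepoFetchProvider; sink: FindingSink }) {
  return {
    async review(pr: number) {
      const diff = await providers.repo.fetchDiff(pr);
      // …fan out to reviewer agents here…
      await providers.sink.emit({ title: `reviewed ${diff.length} bytes`, severity: "info" });
    },
  };
}
```

Swapping `FindingSink` from PR comments to Slack is a config change, not an engine change.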
2.2 PATH rewriting + hook injection
If you let users plug in their own reviewer CLIs (claude review, codex review, etc.):
- Prepend `~/.swisscheese/bin` to PATH
- Drop shim scripts there that wrap the real binary
- Idempotent, marker-fenced rewrite of the agent's own hook config
This is how Superset instruments unmodified third-party CLIs. (apps/desktop/src/main/lib/agent-setup/.) For Swisscheese the bigger ask is probably "use the AI SDK directly" rather than wrapping CLIs — but the pattern is right when you need it.
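The marker-fenced rewrite is the subtle part, so here is a sketch (marker strings are hypothetical): replace our block if it already exists, append it otherwise, so re-running setup never duplicates the injection:

```typescript
const BEGIN = "# >>> swisscheese hooks >>>";
const END = "# <<< swisscheese hooks <<<";

// Idempotent upsert of a fenced block into someone else's config file:
// running it twice yields the same file as running it once.
function upsertFencedBlock(file: string, body: string): string {
  const block = `${BEGIN}\n${body}\n${END}`;
  const re = new RegExp(`${BEGIN}[\\s\\S]*?${END}`);
  return re.test(file)
    ? file.replace(re, block) // our block exists: replace it in place
    : `${file.trimEnd()}\n\n${block}\n`; // otherwise: append it
}
```

Everything outside the markers stays untouched, which is what makes wrapping an unmodified third-party CLI's config safe.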
2.3 Worktree isolation for reviewers that execute code
Most review tools just read a diff, but the high-value reviewers want to run the code (build, test, dynamic analysis, fuzzing). Each such reviewer in its own worktree means parallel runs without stomping.
Don't reach for Docker first — git worktrees + a .swisscheese/setup.sh per repo gets you 80% of the isolation at 5% of the operational cost. Reach for containers only when you need security isolation between reviewers (e.g., untrusted PR content).
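A sketch of the worktree-per-reviewer setup, assuming a local clone at `repoDir` (paths and naming are illustrative):

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// One detached worktree per executing reviewer: parallel builds/tests
// without stomping each other or the main checkout.
function addReviewerWorktree(repoDir: string, sha: string, reviewer: string): string {
  const dir = mkdtempSync(join(tmpdir(), `swisscheese-${reviewer}-`));
  // --detach: no branch is created, so reviewers can't fight over refs.
  execFileSync("git", ["-C", repoDir, "worktree", "add", "--detach", dir, sha]);
  return dir;
}
```

Cleanup is `git worktree remove` (or `prune` after deleting the directory); a repo-level `.swisscheese/setup.sh` would then run inside each worktree.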
2.4 tRPC ↔ MCP shared handlers
Define your review primitives as tRPC procedures, then expose them as MCP tools via thin shims:
```ts
defineTool(server, {
  name: "finding_create",
  inputSchema: { … },
  handler: async (input, ctx) => caller.finding.create(input),
});
```
(Template: `packages/mcp-v2/src/tools/automations/create.ts`.)
Primitives to consider:
- `finding.create` / `finding.dismiss` / `finding.update`
- `review.complete` / `review.escalate`
- `revision.request`
- `comment.thread`
Reviewer agents calling these should be indistinguishable from human comments except in author.kind. Same vocabulary for humans + agents.
2.5 Disconnect signals over silent failure
Steal disconnectedAt / disconnectReason (integration_connections schema). When a reviewer agent's API key 401s, surface "Reviewer X stopped — token expired" in the UI, not "review hangs forever."
Apply to: LLM credentials, GitHub app installs, third-party MCP servers, external rule engines.
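A sketch of the state shape, following the `disconnectedAt` / `disconnectReason` naming above (the rest is my illustration): a connection is either healthy or carries an explicit reason, and the UI renders from that rather than from silence:

```typescript
// A reviewer's external dependency: either connected, or disconnected
// with a timestamp and a human-readable reason.
type Connection = {
  name: string;
  disconnectedAt?: Date;
  disconnectReason?: string;
};

function markDisconnected(c: Connection, reason: string): Connection {
  return { ...c, disconnectedAt: new Date(), disconnectReason: reason };
}

// The UI renders an explicit status line, never an indefinite spinner.
function statusLine(c: Connection): string {
  return c.disconnectedAt
    ? `Reviewer ${c.name} stopped (${c.disconnectReason})`
    : `Reviewer ${c.name} running`;
}
```

On a 401 from the LLM credential, the orchestrator calls `markDisconnected(conn, "token expired")` instead of retrying forever.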
3. Skip / keep simple
- Five-process Electron architecture, manifest-based adoption. Only worth it if you have a desktop app with sessions that should outlive crashes. For a review service, probably one process or a worker pool.
- PTY daemon, node-pty, Bun-incompatibility complexity. Skip unless reviewers shell out to interactive tools.
- Per-organization host-service forking. Overkill until you have actual multi-tenancy beyond logical scopes.
- Mastra code harness (`@mastra/core`). It's an in-app coding agent runtime — you probably want a much thinner thing. Vercel AI SDK + your own state machine is enough for ensemble review. Mastra's hooks model is worth studying as a design, not adopting whole.
- Electric SQL row-sync. Overkill unless you have a desktop client. Postgres + LISTEN/NOTIFY or Redis streams will outperform it for a server-side review pipeline.
4. Pitfalls Superset hit — bake in avoidance from day one
| Pitfall | Avoidance |
|---|---|
| Stale cache deletes real data | Findings are append-only; only humans + finding-author can dismiss |
| Hardcoded `origin/main` | PRs from forks have a different upstream. Use the PR's actual `base.repo.full_name` + `base.ref` |
| Latest-vs-latest comparison | Use the merge-base for diffs (Flavor 2 in V2_WORKSPACE_DIFF_VIEWS.md) |
| Polling races | Event-source from day one |
| OAuth refresh discovered late | Bake refresh-token + advisory-lock-single-flight + 401-retry into the integration layer up front |
| Forking upstream agent CLIs | Whatever you wrap, wrap with hooks/PATH/config — never with patches |
| Two reviewers writing to the same finding | Single-writer discipline (event-sourced log + per-finding ownership) |
| Reviewer marks something complete that the human disagrees with | Authority table — humans can always override; agents can never override humans |
5. One opinionated recommendation
The single most leveraged idea to copy is structured launch context with system-scoped cacheable sections (§1.2).
For an ensemble reviewer, the same ~50KB system prompt + style guide + repo conventions goes to every agent on every PR. Anthropic's prompt cache turns that from "expensive" to "negligible" — but only if you tag it correctly.
Building the rest of Swisscheese around ContextSection[] from day one gives you the right vocabulary for everything else: per-reviewer dialects (§1.1), event payload shapes (§1.3), MCP tool inputs (§2.4), and human-vs-agent provenance (§2.4).
If you skip this and ship flat strings first, retrofitting it is the most painful refactor I can think of for this kind of system.
6. Suggested v0 architecture (2-week sprint)
If I were building Swisscheese v0:
```
swisscheese/
├── packages/
│   ├── core/                 ← review primitives (tRPC + MCP shims)
│   │   ├── src/event-log/    ← append-only review log, pure reducer
│   │   ├── src/findings/     ← finding.create / dismiss / update
│   │   └── src/review/       ← review.complete / escalate
│   ├── orchestrator/         ← fan-out, idempotency, retries
│   │   ├── src/dispatch.ts   ← buildIdempotencyKey, per-reviewer routing
│   │   └── src/aggregator/   ← combines findings, dedup, severity
│   ├── prompts/              ← LaunchSource → ContextSection → LaunchSpec
│   │   ├── templates/        ← per-agent Mustache (claude.xml, codex.md)
│   │   └── cache-hints.ts    ← cacheControl tagging
│   ├── reviewers/            ← built-in reviewer presets
│   │   ├── security/
│   │   ├── perf/
│   │   ├── api-stability/
│   │   └── style/
│   └── adapters/             ← provider injection
│       ├── llm/              ← anthropic, openai, ollama
│       ├── repo/             ← github, gitlab, local
│       └── sink/             ← pr-comment, slack, dashboard
├── apps/
│   ├── ci-runner/            ← swisscheese review CLI
│   ├── webhook/              ← GitHub App webhook receiver
│   └── dashboard/            ← optional UI (event-log replay)
└── plans/
    └── 20260503-event-log.md ← write the boundaries doc first
```
Boundaries doc first, code second. (Superset's principal lesson.)
7. Reading order from this analysis
If you're building Swisscheese, these are the Superset files most worth your time:
- `apps/desktop/HOST_SERVICE_BOUNDARIES.md` — how to write a contract that resists feature creep
- `plans/v2-chat-greenfield-architecture.md` — the event-sourced log playbook in detail
- `plans/done/v2-workspace-context-composition.md` — the `LaunchSource[]` → `LaunchSpec` design
- `packages/shared/src/agent-prompt-template.ts` — concrete prompt templating
- `packages/shared/src/agent-prompt-launch.ts` — the heredoc utility
- `packages/mcp-v2/src/tools/automations/create.ts` — the tRPC ↔ MCP shim pattern
- `packages/chat/src/server/trpc/utils/runtime/runtime.ts` — hook lifecycle (UserPromptSubmit, Stop, …)
- `apps/desktop/V2_WORKSPACE_DIFF_VIEWS.md` — fork-point diff correctness (the bug class to avoid)
- `apps/api/MCP_TOOLS.md` — public MCP tool spec format