Lessons For Swisscheese
What a meta-harness for AI code review should learn from Superset.
Swisscheese is a meta-harness that tackles the code-review problem for AI-generated code by scaling up agents (ensemble review). Superset solves a different but adjacent problem — orchestrating ensemble generation — so most of its lessons transfer with adaptation. This doc separates the lifts, the adapts, and the skips.
Confidence note: this is reasoning by analogy. Where Swisscheese's actual constraints differ from what I'm assuming, the priorities will shift.
1. Lift directly
1.1 Per-agent prompt templates with dialects
Same LaunchSource[] (the diff, the PR description, the failing tests, the previous review threads) renders differently per reviewer model — XML for Claude, markdown for Codex/Cursor. Mustache templates per agent, not branching code.
- Source: `packages/shared/src/agent-prompt-template.ts`
- Plan: `plans/done/v2-workspace-context-composition.md`
Adding a new reviewer model = adding a template + a preset row. No code branching on if (model === "claude").
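A minimal sketch of what "template + preset row, no code branching" could look like. The names (`LaunchSource`, `renderPrompt`, the preset shapes) are my own illustrations, not Superset's actual API, and the per-model renderers stand in for real Mustache templates:

```typescript
// One LaunchSource[] rendered per reviewer dialect. Adding a model means
// adding a row to `presets`, never an `if (model === "claude")` branch.
type LaunchSource = { kind: "diff" | "description" | "tests"; content: string };

const presets: Record<string, (sources: LaunchSource[]) => string> = {
  // XML-ish dialect for Claude
  claude: (sources) =>
    sources.map((s) => `<${s.kind}>\n${s.content}\n</${s.kind}>`).join("\n"),
  // Markdown dialect for Codex/Cursor
  codex: (sources) =>
    sources.map((s) => `## ${s.kind}\n\n${s.content}`).join("\n\n"),
};

function renderPrompt(model: string, sources: LaunchSource[]): string {
  const render = presets[model];
  // Unknown model = missing preset row, not a missing code path.
  if (!render) throw new Error(`no prompt preset for model "${model}"`);
  return render(sources);
}
```

The point of the table-of-renderers shape is that the orchestrator never inspects the model name beyond the lookup.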
1.2 Structured context with scope: "system" | "user" + cache hints
Critical for Swisscheese specifically: the system prompt and style guide are constant across reviewers; the diff changes per PR; the agent role (security, perf, API stability) varies per fan-out. Tag each section with cache_control so Anthropic's prompt cache hits across the swarm. This is the single biggest cost lever you have.
- Pattern: `ContextSection[]` with `scope`, `kind`, `label`, `content[]`, `cacheControl`, `meta`
- Result: a 50KB style guide that ships with every PR review goes from "expensive" to "negligible" with proper tagging
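A sketch of the shape, assuming the field names above; how it maps onto Anthropic's `cache_control` blocks is my illustration, not Superset's code. The key property is ordering: the stable prefix (system prompt, style guide) comes first, with the cache breakpoint after it, so every reviewer in the swarm hits the same cache entry:

```typescript
// Constant, cacheable sections first; per-PR sections last.
type ContextSection = {
  scope: "system" | "user";
  label: string;
  content: string;
  cacheControl?: { type: "ephemeral" };
};

// Turn system-scoped sections into message blocks, carrying the cache hint
// through so the provider can reuse the stable prefix across the swarm.
function toSystemBlocks(sections: ContextSection[]) {
  return sections
    .filter((s) => s.scope === "system")
    .map((s) => ({
      type: "text" as const,
      text: `[${s.label}]\n${s.content}`,
      ...(s.cacheControl ? { cache_control: s.cacheControl } : {}),
    }));
}

const blocks = toSystemBlocks([
  { scope: "system", label: "style-guide", content: "…", cacheControl: { type: "ephemeral" } },
  { scope: "user", label: "diff", content: "+ new line" }, // varies per PR; never cached
]);
```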
1.3 Event-sourced review log + pure reducer
plans/v2-chat-greenfield-architecture.md is the playbook. One log per PR review, monotonic seq, applyEvent(state, event) reducer on the client, gap detection on receive, replay endpoints.
Each reviewer agent is just an event producer; conflicting findings, deduplication, and human-override become trivial.
Build this from day one. Superset's V1 polling architecture (4 fps from two independent sources, racing on the client) is a cautionary tale about deferring it. Multi-device convergence and reproducible replay come for free.
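The pattern above reduces to a small core. This is an illustrative sketch (event shapes and state fields are my own, not Superset's schema): a pure `applyEvent` reducer over a monotonic `seq`, with gap detection that tells the client to replay rather than apply out of order:

```typescript
// Minimal event-sourced review log: one log per PR, monotonic seq,
// pure reducer, gap detection on receive.
type ReviewEvent =
  | { seq: number; type: "finding.created"; id: string; title: string }
  | { seq: number; type: "finding.dismissed"; id: string };

type ReviewState = {
  lastSeq: number;
  findings: Map<string, { title: string; dismissed: boolean }>;
};

function applyEvent(state: ReviewState, event: ReviewEvent): ReviewState {
  // Gap detection: a hole in seq means we missed events; replay, don't guess.
  if (event.seq !== state.lastSeq + 1) {
    throw new Error(`seq gap: expected ${state.lastSeq + 1}, got ${event.seq}`);
  }
  const findings = new Map(state.findings); // pure: never mutate input state
  switch (event.type) {
    case "finding.created":
      findings.set(event.id, { title: event.title, dismissed: false });
      break;
    case "finding.dismissed": {
      const f = findings.get(event.id);
      if (f) findings.set(event.id, { ...f, dismissed: true });
      break;
    }
  }
  return { lastSeq: event.seq, findings };
}
```

Because the reducer is pure, replay endpoints and multi-device convergence are just "fold the log again."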
1.4 Heredoc with random delimiter for prompt-as-shell-arg
Tiny utility, big payoff. (packages/shared/src/agent-prompt-launch.ts:26-68.) When you fan out to a claude/codex/cursor CLI you don't control, this is how you safely pass an arbitrary diff blob without quote-injection.
```sh
claude review "$(cat <<'D7M0KQ'
…arbitrary text with backticks, $vars, quotes…
D7M0KQ
)"
```
The delimiter is derived from the prompt content plus a random id, so it can't appear inside the body.
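A sketch of the delimiter trick in isolation (function names are mine; the real utility lives in `agent-prompt-launch.ts`). Quoting the delimiter (`<<'D'`) is what disables `$var` and backtick expansion inside the body; the collision loop guarantees the delimiter never appears in the payload:

```typescript
import { randomBytes } from "node:crypto";

// Derive a heredoc delimiter that provably does not occur in the body.
function heredocDelimiter(body: string): string {
  let d = "EOF_" + randomBytes(4).toString("hex").toUpperCase();
  while (body.includes(d)) {
    d = "EOF_" + randomBytes(4).toString("hex").toUpperCase();
  }
  return d;
}

// Wrap an arbitrary blob as a single safe shell argument to `cmd`.
function wrapAsShellArg(cmd: string, body: string): string {
  const d = heredocDelimiter(body);
  // Quoted delimiter (<<'D') => no $var, backtick, or quote expansion inside.
  return `${cmd} "$(cat <<'${d}'\n${body}\n${d}\n)"`;
}
```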
1.5 Idempotency keys on every fan-out
buildIdempotencyKey(request) in the orchestrator. Fan-out + retries = duplicate reviews unless you key the dispatch. A retry of "review PR 1234 with reviewer-security-v2" must not produce two reviews.
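A sketch of the idea under an assumed request shape (the real `buildIdempotencyKey` signature may differ): the key must cover everything that makes a dispatch distinct, including the revision, so a re-push legitimately produces a new review while a retry of the same one does not:

```typescript
import { createHash } from "node:crypto";

// Everything that distinguishes one dispatch from another goes into the key.
type ReviewRequest = { repo: string; pr: number; headSha: string; reviewer: string };

function buildIdempotencyKey(r: ReviewRequest): string {
  return createHash("sha256")
    .update(`${r.repo}#${r.pr}@${r.headSha}:${r.reviewer}`)
    .digest("hex");
}

// In-memory stand-in for the orchestrator's dispatch ledger.
const dispatched = new Set<string>();

function dispatch(r: ReviewRequest): boolean {
  const key = buildIdempotencyKey(r);
  if (dispatched.has(key)) return false; // retry: already in flight or done
  dispatched.add(key);
  return true;
}
```

In production the `Set` would be a unique-constrained DB column, so the check survives process restarts.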
1.6 Hook-based guardrails, not output filters
The UserPromptSubmit veto hook (packages/chat/src/server/trpc/utils/runtime/runtime.ts:135-144) is where you put:
- Secret-redaction in the diff
- Max-diff-size gate
- "This PR touches X path — escalate to senior reviewer"
- Tenant-specific compliance gates
Pluggable, replaceable per deployment, never baked into core. Hooks > filters because they compose.
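A sketch of why hooks compose where output filters don't (hook names and the secret pattern are illustrative, not Superset's): each hook is a small function that can veto with a reason, and the runner short-circuits on the first veto, so deployments swap guardrails without touching core:

```typescript
// A hook either allows the review to proceed or vetoes it with a reason.
type HookResult = { allow: true } | { allow: false; reason: string };
type Hook = (input: { diff: string; paths: string[] }) => HookResult;

const maxDiffSize: Hook = ({ diff }) =>
  diff.length > 500_000
    ? { allow: false, reason: "diff too large for automated review" }
    : { allow: true };

// Example secret gate; a real one would use a proper scanner.
const secretGate: Hook = ({ diff }) =>
  /AKIA[0-9A-Z]{16}/.test(diff)
    ? { allow: false, reason: "possible AWS key in diff" }
    : { allow: true };

function runHooks(hooks: Hook[], input: { diff: string; paths: string[] }): HookResult {
  for (const h of hooks) {
    const r = h(input);
    if (!r.allow) return r; // first veto wins; hooks compose left to right
  }
  return { allow: true };
}
```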
1.7 Authority-by-ownership discipline
Write down — explicitly, in a doc like HOST_SERVICE_BOUNDARIES.md — which component owns each review-state question. Examples:
| Question | Authority |
|---|---|
| "Is review complete?" | Aggregator + human, never any single agent |
| "Can a finding be dismissed?" | Finding author OR human, never another agent |
| "What revision are we on?" | The repo (commit SHA), not the orchestrator's cache |
| "Which reviewer is canonical?" | Configured per-repo, not voted on per-run |
Superset's biggest historical bugs were authority confusions ("ghost workspace" auto-delete; latest-vs-latest diffs).
2. Adapt
2.1 Host-service boundaries, scaled down
Superset's createApp({ config, providers }) pattern is overkill for v0 Swisscheese, but copy the discipline: no GitHub-y / org-y / CI-y assumptions baked in.
Providers for:
- `LLMCredentialProvider` (Anthropic / OpenAI / vendor OAuth)
- `RepoFetchProvider` (GitHub / GitLab / local)
- `FindingSink` (PR comment / Slack / ticket / dashboard)
- `PolicyProvider` (which reviewers run; severity thresholds)
This is what lets you run the same review engine in CI, in a webhook handler, and on a laptop with one config-file change.
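A sketch of the scaled-down factory (interface methods and the `createReviewEngine` name are my assumptions): the engine depends only on the provider interfaces, so CI, webhook, and laptop differ only in which implementations get injected:

```typescript
// Provider interfaces the engine depends on; concrete impls are injected.
interface RepoFetchProvider {
  fetchDiff(pr: number): Promise<string>;
}
interface FindingSink {
  emit(finding: { title: string; severity: string }): Promise<void>;
}

function createReviewEngine(providers: { repo: RepoFetchProvider; sink: FindingSink }) {
  return {
    async review(pr: number) {
      const diff = await providers.repo.fetchDiff(pr);
      // …fan out to reviewer agents here…
      await providers.sink.emit({ title: `reviewed ${diff.length} bytes`, severity: "info" });
    },
  };
}
```

Swapping `FindingSink` from PR comments to Slack is a config change, not an engine change.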
2.2 PATH rewriting + hook injection
If you let users plug in their own reviewer CLIs (claude review, codex review, etc.):
- Prepend `~/.swisscheese/bin` to PATH
- Drop shim scripts there that wrap the real binary
- Idempotent, marker-fenced rewrite of the agent's own hook config
This is how Superset instruments unmodified third-party CLIs. (apps/desktop/src/main/lib/agent-setup/.) For Swisscheese the bigger ask is probably "use the AI SDK directly" rather than wrapping CLIs — but the pattern is right when you need it.
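The marker-fenced rewrite is the subtle part, so here is a sketch (marker strings are hypothetical): replace our block if it already exists, append it otherwise, so re-running setup never duplicates the injection:

```typescript
const BEGIN = "# >>> swisscheese hooks >>>";
const END = "# <<< swisscheese hooks <<<";

// Idempotent upsert of a fenced block into someone else's config file:
// running it twice yields the same file as running it once.
function upsertFencedBlock(file: string, body: string): string {
  const block = `${BEGIN}\n${body}\n${END}`;
  const re = new RegExp(`${BEGIN}[\\s\\S]*?${END}`);
  return re.test(file)
    ? file.replace(re, block) // our block exists: replace it in place
    : `${file.trimEnd()}\n\n${block}\n`; // otherwise: append it
}
```

Everything outside the markers stays untouched, which is what makes wrapping an unmodified third-party CLI's config safe.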
2.3 Worktree isolation for reviewers that execute code
Most review tools just read a diff, but the high-value reviewers want to run the code (build, test, dynamic analysis, fuzzing). Each such reviewer in its own worktree means parallel runs without stomping.
Don't reach for Docker first — git worktrees + a .swisscheese/setup.sh per repo gets you 80% of the isolation at 5% of the operational cost. Reach for containers only when you need security isolation between reviewers (e.g., untrusted PR content).
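A sketch of the worktree-per-reviewer setup, assuming a local clone at `repoDir` (paths and naming are illustrative):

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// One detached worktree per executing reviewer: parallel builds/tests
// without stomping each other or the main checkout.
function addReviewerWorktree(repoDir: string, sha: string, reviewer: string): string {
  const dir = mkdtempSync(join(tmpdir(), `swisscheese-${reviewer}-`));
  // --detach: no branch is created, so reviewers can't fight over refs.
  execFileSync("git", ["-C", repoDir, "worktree", "add", "--detach", dir, sha]);
  return dir;
}
```

Cleanup is `git worktree remove` (or `prune` after deleting the directory); a repo-level `.swisscheese/setup.sh` would then run inside each worktree.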
2.4 tRPC ↔ MCP shared handlers
Define your review primitives as tRPC procedures, then expose them as MCP tools via thin shims:
```ts
defineTool(server, {
  name: "finding_create",
  inputSchema: { … },
  handler: async (input, ctx) => caller.finding.create(input),
});
```
(Template: `packages/mcp-v2/src/tools/automations/create.ts`.)
Primitives to consider:
- `finding.create` / `finding.dismiss` / `finding.update`
- `review.complete` / `review.escalate`
- `revision.request`
- `comment.thread`
Reviewer agents calling these should be indistinguishable from human comments except in author.kind. Same vocabulary for humans + agents.
2.5 Disconnect signals over silent failure
Steal disconnectedAt / disconnectReason (integration_connections schema). When a reviewer agent's API key 401s, surface "Reviewer X stopped — token expired" in the UI, not "review hangs forever."
Apply to: LLM credentials, GitHub app installs, third-party MCP servers, external rule engines.
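A sketch of the state shape, following the `disconnectedAt` / `disconnectReason` naming above (the rest is my illustration): a connection is either healthy or carries an explicit reason, and the UI renders from that rather than from silence:

```typescript
// A reviewer's external dependency: either connected, or disconnected
// with a timestamp and a human-readable reason.
type Connection = {
  name: string;
  disconnectedAt?: Date;
  disconnectReason?: string;
};

function markDisconnected(c: Connection, reason: string): Connection {
  return { ...c, disconnectedAt: new Date(), disconnectReason: reason };
}

// The UI renders an explicit status line, never an indefinite spinner.
function statusLine(c: Connection): string {
  return c.disconnectedAt
    ? `Reviewer ${c.name} stopped (${c.disconnectReason})`
    : `Reviewer ${c.name} running`;
}
```

On a 401 from the LLM credential, the orchestrator calls `markDisconnected(conn, "token expired")` instead of retrying forever.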
3. Skip / keep simple
- Five-process Electron architecture, manifest-based adoption. Only worth it if you have a desktop app with sessions that should outlive crashes. For a review service, probably one process or a worker pool.
- PTY daemon, node-pty, Bun-incompatibility complexity. Skip unless reviewers shell out to interactive tools.
- Per-organization host-service forking. Overkill until you have actual multi-tenancy beyond logical scopes.
- Mastra code harness (`@mastra/core`). It's an in-app coding agent runtime — you probably want a much thinner thing. Vercel AI SDK + your own state machine is enough for ensemble review. Mastra's hooks model is worth studying as a design, not adopting whole.
- Electric SQL row-sync. Overkill unless you have a desktop client. Postgres + LISTEN/NOTIFY or Redis streams will outperform it for a server-side review pipeline.
4. Pitfalls Superset hit — bake in avoidance from day one
| Pitfall | Avoidance |
|---|---|
| Stale cache deletes real data | Findings are append-only; only humans + finding-author can dismiss |
| Hardcoded `origin/main` | PRs from forks have a different upstream. Use the PR's actual `base.repo.full_name` + `base.ref` |
| Latest-vs-latest comparison | Use the merge-base for diffs (Flavor 2 in V2_WORKSPACE_DIFF_VIEWS.md) |
| Polling races | Event-source from day one |
| OAuth refresh discovered late | Bake refresh-token + advisory-lock-single-flight + 401-retry into the integration layer up front |
| Forking upstream agent CLIs | Whatever you wrap, wrap with hooks/PATH/config — never with patches |
| Two reviewers writing to the same finding | Single-writer discipline (event-sourced log + per-finding ownership) |
| Reviewer marks something complete that the human disagrees with | Authority table — humans can always override; agents can never override humans |
5. One opinionated recommendation
The single most leveraged idea to copy is structured launch context with system-scoped cacheable sections (§1.2).
For an ensemble reviewer, the same ~50KB system prompt + style guide + repo conventions goes to every agent on every PR. Anthropic's prompt cache turns that from "expensive" to "negligible" — but only if you tag it correctly.
Building the rest of Swisscheese around ContextSection[] from day one gives you the right vocabulary for everything else: per-reviewer dialects (§1.1), event payload shapes (§1.3), MCP tool inputs (§2.4), and human-vs-agent provenance (§2.4).
If you skip this and ship flat strings first, retrofitting it is the most painful refactor I can think of for this kind of system.
6. Suggested v0 architecture (2-week sprint)
If I were building Swisscheese v0:
```
swisscheese/
├── packages/
│   ├── core/                 ← review primitives (tRPC + MCP shims)
│   │   ├── src/event-log/    ← append-only review log, pure reducer
│   │   ├── src/findings/     ← finding.create / dismiss / update
│   │   └── src/review/       ← review.complete / escalate
│   ├── orchestrator/         ← fan-out, idempotency, retries
│   │   ├── src/dispatch.ts   ← buildIdempotencyKey, per-reviewer routing
│   │   └── src/aggregator/   ← combines findings, dedup, severity
│   ├── prompts/              ← LaunchSource → ContextSection → LaunchSpec
│   │   ├── templates/        ← per-agent Mustache (claude.xml, codex.md)
│   │   └── cache-hints.ts    ← cacheControl tagging
│   ├── reviewers/            ← built-in reviewer presets
│   │   ├── security/
│   │   ├── perf/
│   │   ├── api-stability/
│   │   └── style/
│   └── adapters/             ← provider injection
│       ├── llm/              ← anthropic, openai, ollama
│       ├── repo/             ← github, gitlab, local
│       └── sink/             ← pr-comment, slack, dashboard
├── apps/
│   ├── ci-runner/            ← swisscheese review CLI
│   ├── webhook/              ← GitHub App webhook receiver
│   └── dashboard/            ← optional UI (event-log replay)
└── plans/
    └── 20260503-event-log.md ← write the boundaries doc first
```
Boundaries doc first, code second. (Superset's principal lesson.)
7. Reading order from this analysis
If you're building Swisscheese, these are the Superset files most worth your time:
- `apps/desktop/HOST_SERVICE_BOUNDARIES.md` — how to write a contract that resists feature creep
- `plans/v2-chat-greenfield-architecture.md` — the event-sourced log playbook in detail
- `plans/done/v2-workspace-context-composition.md` — the `LaunchSource[]` → `LaunchSpec` design
- `packages/shared/src/agent-prompt-template.ts` — concrete prompt templating
- `packages/shared/src/agent-prompt-launch.ts` — the heredoc utility
- `packages/mcp-v2/src/tools/automations/create.ts` — the tRPC ↔ MCP shim pattern
- `packages/chat/src/server/trpc/utils/runtime/runtime.ts` — hook lifecycle (UserPromptSubmit, Stop, …)
- `apps/desktop/V2_WORKSPACE_DIFF_VIEWS.md` — fork-point diff correctness (the bug class to avoid)
- `apps/api/MCP_TOOLS.md` — public MCP tool spec format