CodeDocs Vault

Lessons For Swisscheese

What a meta-harness for AI code review should learn from Superset.

Swisscheese is a meta-harness that attacks the code review problem for AI-generated code by scaling up agents (ensemble review). Superset solves a different but adjacent problem — orchestrating ensemble generation — so most of its lessons transfer with adaptation. This doc separates the lifts, the adapts, and the skips.

Confidence note: this is reasoning by analogy. Where Swisscheese's actual constraints differ from what I'm assuming, the priorities will shift.


1. Lift directly

1.1 Per-agent prompt templates with dialects

Same LaunchSource[] (the diff, the PR description, the failing tests, the previous review threads) renders differently per reviewer model — XML for Claude, markdown for Codex/Cursor. Mustache templates per agent, not branching code.

Adding a new reviewer model = adding a template + a preset row. No code branching on if (model === "claude").
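A dependency-free sketch of the dialect registry (the `LaunchSource` name comes from the Superset design; the concrete fields and the two render functions are illustrative assumptions — the real version renders Mustache templates per agent):

```typescript
// Each LaunchSource is one input to the review prompt. Fields are assumed.
type LaunchSource = { kind: "diff" | "pr_description" | "failing_tests"; body: string };

type Dialect = (sources: LaunchSource[]) => string;

// XML-ish dialect for Claude-family reviewers.
const claudeDialect: Dialect = (sources) =>
  sources.map((s) => `<${s.kind}>\n${s.body}\n</${s.kind}>`).join("\n");

// Markdown dialect for Codex/Cursor-style reviewers.
const markdownDialect: Dialect = (sources) =>
  sources.map((s) => `## ${s.kind}\n\n${s.body}`).join("\n\n");

// Adding a reviewer model = adding a row here, never `if (model === "claude")`.
const dialects: Record<string, Dialect> = {
  claude: claudeDialect,
  codex: markdownDialect,
  cursor: markdownDialect,
};

function renderPrompt(model: string, sources: LaunchSource[]): string {
  const dialect = dialects[model];
  if (!dialect) throw new Error(`no dialect registered for ${model}`);
  return dialect(sources);
}
```

The point is that the same `LaunchSource[]` flows through unchanged; only the last-mile rendering varies.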

1.2 Structured context with scope: "system" | "user" + cache hints

Critical for Swisscheese specifically: the system prompt and style guide are constant across reviewers; the diff changes per PR; the agent role (security, perf, API stability) varies per fan-out. Tag each section with cache_control so Anthropic's prompt cache hits across the swarm. This is the single biggest cost lever you have.
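A minimal sketch of cache-aware assembly, assuming a `ContextSection` shape (the field names here are guesses; `cache_control: { type: "ephemeral" }` is Anthropic's actual prompt-cache tag):

```typescript
// Constant sections (system prompt, style guide) get tagged so Anthropic's
// prompt cache can reuse them across the whole reviewer swarm; per-PR
// content does not.
type ContextSection = {
  scope: "system" | "user";
  text: string;
  cacheable: boolean; // true only for content identical across reviewers/PRs
};

type AnthropicBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };

function toSystemBlocks(sections: ContextSection[]): AnthropicBlock[] {
  return sections
    .filter((s) => s.scope === "system")
    .map((s) =>
      s.cacheable
        ? { type: "text", text: s.text, cache_control: { type: "ephemeral" } }
        : { type: "text", text: s.text }
    );
}
```

Ordering matters: prompt caching is prefix-based, so put the cacheable constants first and keep them byte-identical across fan-outs.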

1.3 Event-sourced review log + pure reducer

plans/v2-chat-greenfield-architecture.md is the playbook. One log per PR review, monotonic seq, applyEvent(state, event) reducer on the client, gap detection on receive, replay endpoints.

Each reviewer agent is just an event producer; conflicting findings, deduplication, and human-override become trivial.

Build this from day one. Superset's V1 polling architecture (4 fps from two independent sources, racing on the client) is a cautionary tale about deferring it. Multi-device convergence and reproducible replay come for free.
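The core of the playbook fits in a few lines. A sketch with illustrative event names (the shape — monotonic seq, pure reducer, gap detection on receive — is the part that matters):

```typescript
type ReviewEvent =
  | { seq: number; type: "finding.created"; findingId: string; body: string }
  | { seq: number; type: "finding.dismissed"; findingId: string };

type ReviewState = {
  lastSeq: number;
  findings: Map<string, { body: string; dismissed: boolean }>;
};

const initial: ReviewState = { lastSeq: 0, findings: new Map() };

// Pure reducer: same events in, same state out. Replay and multi-device
// convergence fall out for free.
function applyEvent(state: ReviewState, event: ReviewEvent): ReviewState {
  if (event.seq !== state.lastSeq + 1) {
    // Gap detected: the caller should refetch from a replay endpoint.
    throw new Error(`gap: expected seq ${state.lastSeq + 1}, got ${event.seq}`);
  }
  const findings = new Map(state.findings);
  switch (event.type) {
    case "finding.created":
      findings.set(event.findingId, { body: event.body, dismissed: false });
      break;
    case "finding.dismissed": {
      const f = findings.get(event.findingId);
      if (f) findings.set(event.findingId, { ...f, dismissed: true });
      break;
    }
  }
  return { lastSeq: event.seq, findings };
}
```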

1.4 Heredoc with random delimiter for prompt-as-shell-arg

Tiny utility, big payoff. (packages/shared/src/agent-prompt-launch.ts:26-68.) When you fan out to a claude/codex/cursor CLI you don't control, this is how you safely pass an arbitrary diff blob without quote-injection.

claude review "$(cat <<'D7M0KQ'
…arbitrary text with backticks, $vars, quotes…
D7M0KQ
)"

Delimiter derived from the prompt content + a random id so it can't appear inside.
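A sketch of the generation side (the real utility lives in packages/shared/src/agent-prompt-launch.ts; this version is an approximation of the idea, not its code):

```typescript
import { randomBytes } from "node:crypto";

// Regenerate until the delimiter cannot appear in the payload, then quote
// it ('D…') so the shell performs no expansion inside the heredoc body.
function heredocArg(payload: string): string {
  let delim: string;
  do {
    delim = "D" + randomBytes(4).toString("hex").toUpperCase();
  } while (payload.includes(delim));
  return `"$(cat <<'${delim}'\n${payload}\n${delim}\n)"`;
}
```

Because the heredoc is quoted, backticks, `$vars`, and quotes in the diff pass through byte-for-byte — no escaping layer to get wrong.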

1.5 Idempotency keys on every fan-out

buildIdempotencyKey(request) in the orchestrator. Fan-out + retries = duplicate reviews unless you key the dispatch. A retry of "review PR 1234 with reviewer-security-v2" must not produce two reviews.
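A sketch of the key derivation (the input fields are assumptions: the key must contain exactly what identifies "this review of this revision by this reviewer" — and nothing volatile like timestamps or attempt counts):

```typescript
import { createHash } from "node:crypto";

type DispatchRequest = {
  repo: string;
  prNumber: number;
  headSha: string;    // the revision under review, not "latest"
  reviewerId: string; // e.g. "reviewer-security-v2"
};

function buildIdempotencyKey(req: DispatchRequest): string {
  // NUL separators prevent field-boundary collisions.
  const material = [req.repo, req.prNumber, req.headSha, req.reviewerId].join("\u0000");
  return createHash("sha256").update(material).digest("hex");
}
```

The dispatcher records the key before launching; a retry recomputes the same key and is dropped instead of producing a second review.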

1.6 Hook-based guardrails, not output filters

The UserPromptSubmit veto hook (packages/chat/src/server/trpc/utils/runtime/runtime.ts:135-144) is where guardrails belong: pluggable, replaceable per deployment, never baked into core. Hooks > filters because they compose.
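A sketch of the composition property, with assumed hook shapes (the real hook lifecycle lives in runtime.ts; this only shows why chained vetoes compose where output filters don't):

```typescript
type HookDecision = { allow: true } | { allow: false; reason: string };
type PromptHook = (prompt: string) => HookDecision;

// First veto wins; adding a deployment-specific rule is appending to a list,
// not editing core logic.
function runHooks(hooks: PromptHook[], prompt: string): HookDecision {
  for (const hook of hooks) {
    const d = hook(prompt);
    if (!d.allow) return d;
  }
  return { allow: true };
}

// Example deployment-specific guardrail (the regex is illustrative):
const noSecrets: PromptHook = (p) =>
  /AKIA[0-9A-Z]{16}/.test(p)
    ? { allow: false, reason: "prompt appears to contain an AWS access key" }
    : { allow: true };
```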

1.7 Authority-by-ownership discipline

Write down — explicitly, in a doc like HOST_SERVICE_BOUNDARIES.md — which component owns each review-state question. Examples:

Question → Authority
"Is review complete?" → Aggregator + human, never any single agent
"Can a finding be dismissed?" → Finding author OR human, never another agent
"What revision are we on?" → The repo (commit SHA), not the orchestrator's cache
"Which reviewer is canonical?" → Configured per-repo, not voted on per-run

Superset's biggest historical bugs were authority confusions ("ghost workspace" auto-delete; latest-vs-latest diffs).


2. Adapt

2.1 Host-service boundaries, scaled down

Superset's createApp({ config, providers }) pattern is overkill for v0 Swisscheese, but copy the discipline: no GitHub-y / org-y / CI-y assumptions baked in.

Providers for the environment-specific pieces: the LLM client (Anthropic, OpenAI, Ollama), the repo host (GitHub, GitLab, local checkout), and the findings sink (PR comment, Slack, dashboard) — the same three seams as adapters/ in §6.

This is what lets you run the same review engine in CI, in a webhook handler, and on a laptop with one config-file change.
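A sketch of the seam, with assumed interface names mirroring the adapters/ layout in §6 (the one-method interfaces are deliberately minimal):

```typescript
interface RepoProvider { getDiff(prId: string): Promise<string>; }
interface LlmProvider  { complete(prompt: string): Promise<string>; }
interface SinkProvider { publish(finding: string): Promise<void>; }

type Providers = { repo: RepoProvider; llm: LlmProvider; sink: SinkProvider };

// The engine never imports a GitHub client or an Anthropic SDK directly;
// CI, webhook, and laptop deployments differ only in the providers object.
function createEngine({ providers }: { providers: Providers }) {
  return {
    async reviewOnce(prId: string): Promise<string> {
      const diff = await providers.repo.getDiff(prId);
      const finding = await providers.llm.complete(`review this diff:\n${diff}`);
      await providers.sink.publish(finding);
      return finding;
    },
  };
}
```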

2.2 PATH rewriting + hook injection

If you let users plug in their own reviewer CLIs (claude review, codex review, etc.), reuse the same two tricks: prepend a shim directory to PATH so your wrapper binary runs first, and inject hooks/config at launch time.

This is how Superset instruments unmodified third-party CLIs. (apps/desktop/src/main/lib/agent-setup/.) For Swisscheese the bigger ask is probably "use the AI SDK directly" rather than wrapping CLIs — but the pattern is right when you need it.

2.3 Worktree isolation for reviewers that execute code

Most review tools just read a diff, but the high-value reviewers want to run the code (build, test, dynamic analysis, fuzzing). Each such reviewer in its own worktree means parallel runs without stomping.

Don't reach for Docker first — git worktrees + a .swisscheese/setup.sh per repo gets you 80% of the isolation at 5% of the operational cost. Reach for containers only when you need security isolation between reviewers (e.g., untrusted PR content).
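The whole mechanism is a few git commands. A sketch (reviewer names and the .reviews/ path are illustrative; a temp repo stands in for the PR checkout):

```shell
set -eu

# Stand-in for the cloned PR repo.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "init"

# One worktree per code-executing reviewer: parallel builds/tests, no stomping.
for reviewer in security perf; do
  # Detached checkout of the revision under review.
  git -C "$repo" worktree add -q "$repo/.reviews/$reviewer" HEAD
  # This is where the repo's own .swisscheese/setup.sh would run
  # (deps, fixtures) before the reviewer starts executing code.
done

git -C "$repo" worktree list
```

Worktrees share one object store, so per-reviewer setup cost is a checkout, not a clone.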

2.4 tRPC ↔ MCP shared handlers

Define your review primitives as tRPC procedures, then expose them as MCP tools via thin shims:

defineTool(server, {
  name: "finding_create",
  inputSchema: { … },
  handler: async (input, ctx) => caller.finding.create(input),
});

(Template: packages/mcp-v2/src/tools/automations/create.ts.)

Primitives to consider: finding.create, finding.dismiss, finding.update, review.complete, review.escalate — the same set as core/ in §6.

Reviewer agents calling these should be indistinguishable from human comments except in author.kind. Same vocabulary for humans + agents.
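A dependency-free sketch of the shared-handler idea (names and shapes are assumptions; in the real layout the tRPC procedure and the MCP `defineTool` shim would each wrap the same handler):

```typescript
type Ctx = { author: { kind: "human" | "agent"; id: string } };

let nextId = 0;

// One handler, written once. Provenance lives only in ctx.author.kind;
// behavior is identical for humans and reviewer agents.
const handlers = {
  findingCreate(input: { file: string; message: string }, ctx: Ctx) {
    return { id: `f${++nextId}`, ...input, author: ctx.author };
  },
};

// tRPC-ish surface: ctx comes from the authenticated session.
const trpcFindingCreate = (input: { file: string; message: string }, ctx: Ctx) =>
  handlers.findingCreate(input, ctx);

// MCP-ish surface: ctx is derived from the calling agent's identity.
const mcpFindingCreate = (input: { file: string; message: string }) =>
  handlers.findingCreate(input, { author: { kind: "agent", id: "reviewer-security" } });
```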

2.5 Disconnect signals over silent failure

Steal disconnectedAt / disconnectReason (integration_connections schema). When a reviewer agent's API key 401s, surface "Reviewer X stopped — token expired" in the UI, not "review hangs forever."

Apply to: LLM credentials, GitHub app installs, third-party MCP servers, external rule engines.
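A sketch of the bookkeeping (disconnectedAt / disconnectReason are the doc's field names; the state transition around them is an assumption):

```typescript
type Connection = {
  id: string;
  disconnectedAt: Date | null;
  disconnectReason: string | null;
};

// On an auth failure, record why the connection died so the UI can say
// "Reviewer X stopped: token expired" instead of hanging forever.
function onProviderError(conn: Connection, status: number): Connection {
  if (status === 401) {
    return { ...conn, disconnectedAt: new Date(), disconnectReason: "token expired (401)" };
  }
  return conn; // transient errors (5xx etc.) go through retry, not disconnect
}
```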


3. Skip / keep simple


4. Pitfalls Superset hit — bake in avoidance from day one

Pitfall → Avoidance
Stale cache deletes real data → Findings are append-only; only humans + the finding's author can dismiss
Hardcoded origin/main → PRs from forks have a different upstream; use the PR's actual base.repo.full_name + base.ref
Latest-vs-latest comparison → Use the merge-base for diffs (Flavor 2 in V2_WORKSPACE_DIFF_VIEWS.md)
Polling races → Event-source from day one
OAuth refresh discovered late → Bake refresh-token + advisory-lock single-flight + 401-retry into the integration layer up front
Forking upstream agent CLIs → Whatever you wrap, wrap with hooks/PATH/config, never with patches
Two reviewers writing to the same finding → Single-writer discipline (event-sourced log + per-finding ownership)
Reviewer marks something complete that the human disagrees with → Authority table: humans can always override; agents can never override humans

5. One opinionated recommendation

The single most leveraged idea to copy is structured launch context with system-scoped cacheable sections (§1.2).

For an ensemble reviewer, the same ~50KB system prompt + style guide + repo conventions goes to every agent on every PR. Anthropic's prompt cache turns that from "expensive" to "negligible" — but only if you tag it correctly.

Building the rest of Swisscheese around ContextSection[] from day one gives you the right vocabulary for everything else: per-reviewer dialects (§1.1), event payload shapes (§1.3), MCP tool inputs (§2.4), and human-vs-agent provenance (§2.4).

If you skip this and ship flat strings first, retrofitting it is the most painful refactor I can think of for this kind of system.


6. Suggested v0 architecture (2-week sprint)

If I were building Swisscheese v0:

swisscheese/
├── packages/
│   ├── core/                ← review primitives (tRPC + MCP shims)
│   │   ├── src/event-log/   ← append-only review log, pure reducer
│   │   ├── src/findings/    ← finding.create / dismiss / update
│   │   └── src/review/      ← review.complete / escalate
│   ├── orchestrator/        ← fan-out, idempotency, retries
│   │   ├── src/dispatch.ts  ← buildIdempotencyKey, per-reviewer routing
│   │   └── src/aggregator/  ← combines findings, dedup, severity
│   ├── prompts/             ← LaunchSource → ContextSection → LaunchSpec
│   │   ├── templates/       ← per-agent Mustache (claude.xml, codex.md)
│   │   └── cache-hints.ts   ← cacheControl tagging
│   ├── reviewers/           ← built-in reviewer presets
│   │   ├── security/
│   │   ├── perf/
│   │   ├── api-stability/
│   │   └── style/
│   └── adapters/            ← provider injection
│       ├── llm/             ← anthropic, openai, ollama
│       ├── repo/            ← github, gitlab, local
│       └── sink/            ← pr-comment, slack, dashboard
├── apps/
│   ├── ci-runner/           ← `swisscheese review` CLI
│   ├── webhook/             ← GitHub App webhook receiver
│   └── dashboard/           ← optional UI (event-log replay)
└── plans/
    └── 20260503-event-log.md  ← write the boundaries doc first

Boundaries doc first, code second. (Superset's principal lesson.)


7. Reading order from this analysis

If you're building Swisscheese, these are the Superset files most worth your time:

  1. apps/desktop/HOST_SERVICE_BOUNDARIES.md — how to write a contract that resists feature creep
  2. plans/v2-chat-greenfield-architecture.md — the event-sourced log playbook in detail
  3. plans/done/v2-workspace-context-composition.md — the LaunchSource[] → LaunchSpec design
  4. packages/shared/src/agent-prompt-template.ts — concrete prompt templating
  5. packages/shared/src/agent-prompt-launch.ts — the heredoc utility
  6. packages/mcp-v2/src/tools/automations/create.ts — tRPC ↔ MCP shim pattern
  7. packages/chat/src/server/trpc/utils/runtime/runtime.ts — hook lifecycle (UserPromptSubmit, Stop, …)
  8. apps/desktop/V2_WORKSPACE_DIFF_VIEWS.md — fork-point diff correctness (the bug class to avoid)
  9. apps/api/MCP_TOOLS.md — public MCP tool spec format