Swisscheese: Differentiators for a Multi-Agent AI Code Review Platform

Context: Swisscheese is a platform for scaling up agents (Claude Code, Codex, etc.) to review AI-generated code. Named after the Swiss cheese model from safety engineering -- stack imperfect layers so the holes don't align.

The Core Problem

AI-generated code volume is exploding. Human reviewers are the bottleneck. But AI reviewing AI has a fundamental risk: correlated failures -- the same blind spots that caused the bug may also cause the reviewer to miss it.

Swisscheese needs to make multi-agent review more trustworthy than any single reviewer, human or AI.

Six High-Impact Differentiators

1. Adversarial Multi-Model Review (the actual Swiss cheese)

This is the namesake and should be the core moat.

The insight: Claude reviewing Claude's code has correlated blind spots. GPT reviewing GPT's code has the same problem. But Claude reviewing GPT's code (or vice versa) has uncorrelated failure modes -- different training data, different reasoning patterns, different biases.

What to build:

Multica's Backend interface pattern (server/pkg/agent/agent.go:15-21) is directly reusable -- abstract 10 providers behind one Review(diff, context) -> []Finding contract.
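
A minimal sketch of that contract in Go. Identifiers here are assumptions of mine, not Multica's actual names; the real Backend interface is in server/pkg/agent/agent.go:15-21:

```go
package review

import "context"

// Finding is the structured output unit; section 4 below gives the
// full taxonomy.
type Finding struct {
	Severity   string  // critical | warning | nit | question
	Category   string  // security | correctness | performance | style | test-coverage
	Location   string  // file:line_start-line_end
	Claim      string
	Evidence   string
	Suggestion string
	Confidence float64 // 0.0-1.0
}

// Reviewer abstracts one provider behind a single contract. Name
// identifies the model family, so the orchestrator can pair code
// authored by one family with reviewers from different ones.
type Reviewer interface {
	Name() string
	// reviewContext is the assembled context; section 3 sketches a
	// richer struct to replace the plain string.
	Review(ctx context.Context, diff, reviewContext string) ([]Finding, error)
}
```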

2. Disagreement-as-Signal

Most review tools show N independent reviews. That's just noise multiplication. The real value is in disagreement detection.

What to build: match findings across reviewers, score per-location agreement, and escalate low-agreement, high-severity claims to a human.
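
A minimal sketch of the scoring primitive, assuming findings are matched by exact location string (real matching would need fuzzier bucketing):

```go
package review

// ScoreAgreement takes, per reviewer, the locations that reviewer
// flagged, and returns each location's agreement ratio. 1.0 means
// every layer of cheese caught it; a value near 1/N means a lone
// dissent -- exactly the hole worth showing a human.
func ScoreAgreement(flagged map[string][]string) map[string]float64 {
	reviewers := len(flagged)
	counts := make(map[string]int)
	for _, locations := range flagged {
		seen := make(map[string]bool)
		for _, loc := range locations {
			if !seen[loc] {
				counts[loc]++
				seen[loc] = true
			}
		}
	}
	agreement := make(map[string]float64, len(counts))
	for loc, n := range counts {
		agreement[loc] = float64(n) / float64(reviewers)
	}
	return agreement
}
```

Unanimous, high-confidence findings can be auto-posted; mid-range agreement on a critical claim is what the Human Escalation Gate (see the architecture sketch) should receive.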

Why this is a differentiator: No existing tool does this. GitHub Copilot review gives one opinion. Running 3 agents gives 3 opinions. Swisscheese would give a synthesized confidence assessment.

The pitch: "Swisscheese doesn't give you more reviews. It tells you where the holes are."

3. Review-Specific Context Window

Multica gives agents full codebase context. Review needs a different, more structured context:

┌─ Review Context ─────────────────────────────────────┐
│                                                       │
│  1. The Diff (what changed)                           │
│  2. The Intent (PR description, linked issue, spec)   │
│  3. The Blast Radius (what depends on changed files)  │
│  4. The History (recent changes to same files,        │
│     past review comments, known fragile areas)        │
│  5. The Rules (coding standards, security policies,   │
│     team-specific patterns)                           │
│  6. The Test Coverage (what's tested, what isn't)     │
│                                                       │
└───────────────────────────────────────────────────────┘

Most agents just see the diff. Swisscheese should assemble this full context automatically -- that's what makes agent reviews approach human reviewer quality.
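
A sketch of the assembled context as a struct -- a richer replacement for the plain reviewContext string in the section 1 sketch. Field names are mine, one per layer in the box above:

```go
package review

// ReviewContext is the full package a reviewer sees, assembled
// automatically before any agent runs.
type ReviewContext struct {
	Diff         string   // 1. what changed
	Intent       string   // 2. PR description, linked issue, spec
	BlastRadius  []string // 3. files/packages that depend on the changed files
	History      []string // 4. recent changes to the same files, past review comments
	Rules        []string // 5. coding standards, security policies, team patterns
	TestCoverage string   // 6. what's tested, what isn't
}
```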

4. Structured Finding Taxonomy (Not Free-Text Comments)

Agents produce noisy, verbose review comments. Force structure:

Finding:
  severity:    critical | warning | nit | question
  category:    security | correctness | performance | style | test-coverage
  location:    file:line_start-line_end
  claim:       "This SQL query is vulnerable to injection"
  evidence:    "User input flows from line 42 to line 67 without sanitization"
  suggestion:  "<concrete code fix>"
  confidence:  0.0-1.0

Why this matters: structured findings can be deduplicated, compared across reviewers, and scored for disagreement (#2); free-text comments can't be compared at all.
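
Forcing structure starts at the parse boundary. A sketch, reusing the Finding type from the section 1 sketch and assuming agents emit a JSON array with lowercase keys:

```go
package review

import (
	"encoding/json"
	"fmt"
)

var validSeverities = map[string]bool{
	"critical": true, "warning": true, "nit": true, "question": true,
}

// ParseFindings is the "force structure" step: agent output that
// doesn't deserialize into the taxonomy, or that carries an unknown
// severity or an out-of-range confidence, is rejected rather than
// posted as a free-text comment.
func ParseFindings(raw []byte) ([]Finding, error) {
	var findings []Finding
	if err := json.Unmarshal(raw, &findings); err != nil {
		return nil, fmt.Errorf("agent output is not structured findings: %w", err)
	}
	for i, f := range findings {
		if !validSeverities[f.Severity] {
			return nil, fmt.Errorf("finding %d: unknown severity %q", i, f.Severity)
		}
		if f.Confidence < 0 || f.Confidence > 1 {
			return nil, fmt.Errorf("finding %d: confidence %v outside [0,1]", i, f.Confidence)
		}
	}
	return findings, nil
}
```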

5. Feedback Loop: Review Quality Scoring (long-term moat)

This is where long-term compounding value is built.

Track per finding: whether the developer accepted, dismissed, or fixed it -- and whether the flagged code later produced a real bug.

Use this to: calibrate per-model confidence, route finding categories to the models that are reliably right about them, and drop reviewers that only add noise.

No existing tool closes this loop. It's expensive to build but creates compounding value.
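
One way the loop could close, as a sketch: track acceptance per (model, category) pair and use it as a calibrated prior. All names here are assumptions:

```go
package review

// outcome tallies human verdicts on findings from one (model, category) pair.
type outcome struct{ accepted, total int }

// QualityLedger records, per model and finding category, how often the
// developer accepted the finding. Over time this is the data that
// calibrates confidence and routes categories to models.
type QualityLedger struct {
	stats map[string]*outcome // key: model + "/" + category
}

func NewQualityLedger() *QualityLedger {
	return &QualityLedger{stats: make(map[string]*outcome)}
}

// Record logs one human verdict on a posted finding.
func (q *QualityLedger) Record(model, category string, accepted bool) {
	key := model + "/" + category
	o := q.stats[key]
	if o == nil {
		o = &outcome{}
		q.stats[key] = o
	}
	o.total++
	if accepted {
		o.accepted++
	}
}

// Precision returns the acceptance rate for a (model, category) pair,
// falling back to an uninformative 0.5 prior when there is no data.
func (q *QualityLedger) Precision(model, category string) float64 {
	o := q.stats[model+"/"+category]
	if o == nil || o.total == 0 {
		return 0.5
	}
	return float64(o.accepted) / float64(o.total)
}
```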

6. Incremental Re-Review

When a developer pushes a fix in response to a review comment, most tools re-review the entire PR -- a waste of tokens and time.

What to build: diff the new push against the last-reviewed commit and re-run reviewers only where the new hunks touch prior findings.
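
A sketch of the scoping step, assuming finding locations have already been parsed out of the file:line_start-line_end form:

```go
package review

// Hunk is one changed line range in the new push, relative to the
// last-reviewed commit (as reported by git diff).
type Hunk struct {
	File       string
	Start, End int
}

func (h Hunk) overlaps(file string, start, end int) bool {
	return h.File == file && start <= h.End && end >= h.Start
}

// Located pairs a prior finding with its parsed location.
type Located struct {
	Finding    Finding
	File       string
	Start, End int
}

// StaleFindings returns the prior findings that the new push touches;
// only these go back to the reviewers. Everything else carries forward
// without spending another token.
func StaleFindings(prior []Located, changed []Hunk) []Located {
	var stale []Located
	for _, p := range prior {
		for _, h := range changed {
			if h.overlaps(p.File, p.Start, p.End) {
				stale = append(stale, p)
				break
			}
		}
	}
	return stale
}
```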

Operationally critical at scale -- if you're reviewing hundreds of PRs/day, re-reviewing everything is prohibitively expensive.

What to Borrow from Multica

| Multica Pattern | Reuse? | Adaptation for Swisscheese |
|---|---|---|
| Backend interface for 10 providers (agent.go:15-21) | Yes | Reviewer interface with Review(diff, context) -> []Finding |
| Token usage tracking (per-model, per-task) | Yes | Critical for cost management at review scale |
| Agent-as-subprocess (claude.go:22-212) | Yes | Same approach -- spawn agent CLIs, parse structured output |
| WS + cache invalidation for real-time (realtime/, use-realtime-sync.ts) | Yes | Stream review progress live |
| Prompt injection via CLAUDE.md (runtime_config.go:41-242) | Adapt | Inject review-specific rules and context instead of task context |
| PII/secret redaction (redact/redact.go) | Yes | Code diffs may contain secrets too |

What Multica Gets Wrong for Review (Fix in Swisscheese)

| Multica Gap | Risk for Review | Swisscheese Fix |
|---|---|---|
| No quality assessment of agent output | Review output quality IS the product | Structured findings + confidence scores + feedback loop |
| No spending limits (tracked but not capped) | Runaway costs at review scale | Per-PR and per-org token budgets |
| Single opinion per task | Misses correlated failures | Multi-opinion + synthesis + disagreement scoring |
| Conversational loop prevention | Not applicable | Reviews are structured, not conversational -- different problem |
| Synchronous event bus | Review is batch-oriented | Async queue may fit better for fan-out to N reviewers |
| Agent instructions as free text | Hard to measure review quality | Structured review prompts with explicit taxonomy |

Architecture Sketch

        GitHub/GitLab Webhook (PR opened/updated)
                      │
                      ▼
            ┌─── Swisscheese Server ───┐
            │                          │
            │  Context Assembler       │
            │  (diff + intent +        │
            │   blast radius +         │
            │   history + rules)       │
            │         │                │
            │         ▼                │
            │  Review Orchestrator     │
            │  ┌──────┼──────┐         │
            │  ▼      ▼      ▼         │
            │ Agent1 Agent2 Agent3     │  ← different models/prompts
            │ (sec)  (logic) (perf)    │
            │  │      │      │         │
            │  ▼      ▼      ▼         │
            │  Finding Aggregator      │
            │  (dedup, disagree,       │
            │   confidence scoring)    │
            │         │                │
            │         ▼                │
            │  Human Escalation Gate   │
            │  (auto-approve if        │
            │   all agree + high       │
            │   confidence)            │
            │         │                │
            └─────────┼────────────────┘
                      ▼
            Post structured review
            comments back to PR
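
A sketch of the Review Orchestrator's fan-out step, reusing the Reviewer and Finding types from the section 1 sketch: run every reviewer concurrently over the same assembled context and hand the per-reviewer findings to the aggregator.

```go
package review

import (
	"context"
	"sync"
)

// FanOut runs all reviewers concurrently and collects their findings
// keyed by reviewer name. A reviewer that errors is simply a missing
// cheese slice, not a fatal failure.
func FanOut(ctx context.Context, reviewers []Reviewer, diff, reviewContext string) map[string][]Finding {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string][]Finding, len(reviewers))
	)
	for _, r := range reviewers {
		wg.Add(1)
		go func(r Reviewer) {
			defer wg.Done()
			findings, err := r.Review(ctx, diff, reviewContext)
			if err != nil {
				return
			}
			mu.Lock()
			out[r.Name()] = findings
			mu.Unlock()
		}(r)
	}
	wg.Wait()
	return out
}
```

Swapping this in-process fan-out for an async queue (per the gap table above) changes the mechanics but not the shape.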

Prioritization

| Priority | Differentiator | Why |
|---|---|---|
| MVP | #2 Disagreement-as-signal | Unique capability, immediately valuable, validates the core thesis |
| MVP | #1 Cross-model review | Enables #2, addresses correlated failures directly |
| MVP | #4 Structured findings | Required for #2 to work (can't compare free-text) |
| V2 | #3 Review context assembly | Improves quality of each individual review |
| V2 | #6 Incremental re-review | Cost optimization, important at scale |
| Long-term | #5 Feedback loop | Compounding moat, needs data volume to be useful |