CodeDocs Vault

Lessons for Swisscheese — applying OD's patterns to a code-review-at-scale system

This file captures architectural lessons OD offers to Swisscheese, a system for scaling up coding agents (Claude Code, Codex, etc.) to review AI-generated code. The two systems share a structural pattern (integration shell around multiple agent CLIs) but operate in different domains, which flips a few priorities.

1. Domain mapping

| Axis | Open Design | Swisscheese |
| --- | --- | --- |
| Output | Editable design artifacts (HTML, decks, etc.) | Findings (severity-tagged, file-line-cited) |
| "User" | A human typing a brief in a chat UI | Often CI / batch jobs / a PR webhook; not always a human in real time |
| Failure mode that hurts most | Generic / off-brand output (style failure) | Hallucinated findings / missed real bugs (truth failure) |
| Scale shape | Single-user single-machine MVP | Worker pools, parallel review, queue-backed |
| What "creativity" means | A genuine product feature (the agent should produce distinctive design) | A liability (the agent should not invent findings) |
| Truth-finding | Subordinate to taste | The whole point |

The shape of the integration is similar. The shape of correctness is inverted. Lessons should be filtered through that.

2. Directly transferable patterns

2.1 Adapter pattern for multi-CLI support

OD source: apps/daemon/src/agents.ts:115-751 — AGENT_DEFS is a flat array of duck-typed objects, each with id, bin, streamFormat, buildArgs(...), listModels?, reasoningOptions?. New CLIs register by appending.

For Swisscheese: Same shape, different streamFormat. Build per-CLI argv (buildReviewArgs()), probe capabilities at detection (--help parsing → agentCapabilities Map), pin auto-approve flags per CLI so workers don't deadlock on permission prompts. The 12-agent table in OD's agents.ts is the closest pre-existing reference for this work.
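
The adapter table could look something like the following sketch. All names here (ReviewAgentDef, buildReviewArgs, the entries themselves) are illustrative assumptions, not Swisscheese APIs; the Claude Code flags shown are real CLI flags, but the review-specific wiring is hypothetical.

```typescript
type StreamFormat = "claude-json" | "jsonl" | "acp";

interface ReviewAgentDef {
  id: string;
  bin: string;
  streamFormat: StreamFormat;
  // Per-CLI argv for a review run; auto-approve flags would be pinned
  // here so workers never hang on an interactive permission prompt.
  buildReviewArgs(opts: { diffPath: string; model?: string }): string[];
}

const REVIEW_AGENT_DEFS: ReviewAgentDef[] = [
  {
    id: "claude-code",
    bin: "claude",
    streamFormat: "claude-json",
    buildReviewArgs: ({ diffPath, model }) => [
      "-p", `Review the diff at ${diffPath}`,
      "--output-format", "stream-json",
      ...(model ? ["--model", model] : []),
    ],
  },
  // New CLIs register by appending another entry.
];

function defFor(id: string): ReviewAgentDef | undefined {
  return REVIEW_AGENT_DEFS.find((d) => d.id === id);
}
```

The flat-array shape keeps registration a one-entry diff, which is the property worth copying from OD.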

2.2 Stream parsers as a strategy registry

OD source: Five parsers (claude-stream.ts, copilot-stream.ts, acp.ts, pi-rpc.ts, json-event-stream.ts) all funnel into a unified event union (status / text_delta / thinking_delta / tool_use / tool_result / usage / done).

For Swisscheese: Define a unified review event union (review_started, file_read, finding_emitted, finding_revised, severity_assigned, usage, done). Parse each CLI's native stream into that union once, at the worker boundary. Downstream code never branches on agent type.
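
A minimal sketch of that union and one per-CLI translation step, assuming a hypothetical native stream that emits one JSON object per line (event names follow the list above; field names are invented for illustration):

```typescript
type ReviewEvent =
  | { kind: "review_started"; agentId: string }
  | { kind: "file_read"; path: string }
  | { kind: "finding_emitted"; findingId: string; severity: "P0" | "P1" | "P2" }
  | { kind: "finding_revised"; findingId: string }
  | { kind: "severity_assigned"; findingId: string; severity: "P0" | "P1" | "P2" }
  | { kind: "usage"; inputTokens: number; outputTokens: number }
  | { kind: "done" };

// One native JSON line in, zero-or-one unified events out.
// Unknown shapes return null instead of guessing.
function parseNativeLine(line: string): ReviewEvent | null {
  let raw: any;
  try { raw = JSON.parse(line); } catch { return null; }
  switch (raw.type) {
    case "start": return { kind: "review_started", agentId: String(raw.agent ?? "") };
    case "finding": return { kind: "finding_emitted", findingId: raw.id, severity: raw.severity };
    case "end": return { kind: "done" };
    default: return null;
  }
}
```

Each CLI gets its own parseNativeLine-equivalent; everything downstream consumes only ReviewEvent.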

2.3 Sidecar process stamps + namespaces

OD source: packages/sidecar-proto/src/index.ts defines a 5-field stamp (app, mode, namespace, ipc, source). packages/platform writes/reads it as --od-stamp-* argv. tools-dev status discovers live processes by matching stamp criteria — no PID file, no port lockfile.

For Swisscheese: This is gold for "scaling up." Workers stamped with {worker_pool, environment, namespace, ipc_socket, orchestrator} give you worker discovery without PID files or port lockfiles, namespace-level isolation between parallel review runs, and queries like "find me all workers from this PR's review" by matching stamp criteria.

Lift packages/sidecar-proto and packages/platform directly; they're written generically (the OD-specific app keys are in sidecar-proto's descriptor, not in the runtime).
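
The discovery half can be sketched in a few lines. The field names follow the five-field suggestion above; the --sc-stamp-* argv prefix is a hypothetical Swisscheese analogue of OD's --od-stamp-*:

```typescript
interface WorkerStamp {
  worker_pool: string;
  environment: string;
  namespace: string;
  ipc_socket: string;
  orchestrator: string;
}

// Match a stamp against partial criteria — "find me all workers from
// this PR's review" becomes matchStamp(stamp, { namespace: "pr-1234" }).
function matchStamp(stamp: WorkerStamp, criteria: Partial<WorkerStamp>): boolean {
  return (Object.keys(criteria) as (keyof WorkerStamp)[])
    .every((k) => stamp[k] === criteria[k]);
}

// Stamps round-trip through argv so any process lister can recover
// them — no PID file, no port lockfile.
function stampToArgv(stamp: WorkerStamp): string[] {
  return Object.entries(stamp).map(([k, v]) => `--sc-stamp-${k}=${v}`);
}
```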

2.4 Composer pattern with hardcoded layer order

OD source: apps/daemon/src/prompts/system.ts:109-191 — composeSystemPrompt() is a pure function. Layer order is hardcoded with comments justifying which layer wins on conflict ("discovery directives go FIRST so 'emit form on turn 1' beats softer 'skip questions for tweaks' wording later").

For Swisscheese: Treat the review prompt the same way. Suggested layer order:

1. Review charter (don't fabricate, cite real lines, severity ladder semantics)
2. Severity ladder (P0/P1/P2 definitions specific to this review profile)
3. Repo CONVENTIONS.md (or absent — this codebase has no overrides)
4. Active rule pack (security / perf / correctness / style / a11y)
5. Per-file or per-directory context (CODEOWNERS, recent bug history)
6. Diff under review (the actual content, last so it doesn't push earlier
   layers out under context pressure)

Hardcoded in code, comments justify precedence, PRs change the stack. Don't make it a YAML config — config drift on a system prompt is bad.
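
A minimal sketch of that composer, assuming plain-string layers (the function and input names are illustrative):

```typescript
interface ReviewPromptInputs {
  charter: string;
  severityLadder: string;
  conventions?: string;     // repo CONVENTIONS.md, if present
  rulePack: string;
  fileContext?: string;     // CODEOWNERS, recent bug history
  diff: string;
}

function composeReviewPrompt(i: ReviewPromptInputs): string {
  return [
    // Charter goes FIRST so "don't fabricate" beats anything softer below.
    i.charter,
    i.severityLadder,
    i.conventions,
    i.rulePack,
    i.fileContext,
    // Diff goes LAST so it can't push earlier layers out under context pressure.
    i.diff,
  ].filter((s): s is string => Boolean(s)).join("\n\n");
}
```

Absent layers are skipped, never reordered; precedence lives in the array literal where a PR reviewer can see it.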

2.5 Linter feedback loop

OD source: POST /api/artifacts/save runs lint-artifact.ts (980 lines, 9 P0 patterns), formats findings via renderFindingsForAgent(), and feeds them back as a system message on the next turn. The agent self-corrects.

For Swisscheese: The single most important pattern to lift. Code review benefits from this even more than design does, because findings have structured fields you can validate without rendering anything. Build a deterministic post-checker that verifies, for each finding: the cited file and line exist in the diff, the suggested patch parses, the severity is a value on the active ladder, and the finding doesn't match a known false-positive pattern.

Disagreements feed back as a system message. The agent revises. Without this, the agent's word is the final word — hallucinated findings ship. With this, you have a quality floor independent of the agent's confidence.
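
A sketch of the structural half of such a post-checker — purely deterministic, no model involved. The Finding shape and check wording are assumptions for illustration:

```typescript
interface Finding {
  file: string;
  line: number;
  severity: "P0" | "P1" | "P2";
  message: string;
}

// diffLines: map of file path -> set of line numbers present in the diff.
function checkFinding(f: Finding, diffLines: Map<string, Set<number>>): string[] {
  const problems: string[] = [];
  const lines = diffLines.get(f.file);
  if (!lines) problems.push(`cites file not in diff: ${f.file}`);
  else if (!lines.has(f.line)) problems.push(`cites line not in diff: ${f.file}:${f.line}`);
  if (!["P0", "P1", "P2"].includes(f.severity)) problems.push(`unknown severity: ${f.severity}`);
  if (f.message.trim().length === 0) problems.push("empty message");
  return problems; // non-empty => format and feed back for revision
}
```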

2.6 Plain files + SQLite split

OD source: Artifacts as files in .od/projects/<id>/; projects / conversations / messages / preview_comments / deployments as rows in ~/.open-design/app.sqlite.

For Swisscheese: Reports as plain markdown files (PR-attachable, greppable, replayable, git-friendly). Runs / findings / comments / agent metadata as rows. Don't put findings inside an opaque DB blob — they need to be reviewable at the filesystem level, including by a human auditor years later.

2.7 Argument clamping for billing safety

OD source: apps/daemon/src/media.ts:158-190 clampNumber() and apps/daemon/src/agents.ts:78-95 clampCodexReasoning() snap hallucinated CLI args to known buckets.

For Swisscheese: A reviewer hallucinating --max-iterations 9999 or --max-files 9999 is real money. Every numeric arg the agent emits to a paid provider goes through a clamp at the daemon edge. Per-model effort/reasoning levels too — gpt-5.5 rejects minimal; clamp it.
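
In the spirit of OD's clampNumber(), the edge clamp can be a one-liner that snaps any agent-emitted number to the nearest allowed bucket (function name and buckets are illustrative):

```typescript
// Snap a value to the closest entry in a non-empty bucket list.
function clampToBucket(value: number, buckets: number[]): number {
  return buckets.reduce((best, b) =>
    Math.abs(b - value) < Math.abs(best - value) ? b : best
  );
}

const MAX_ITERATION_BUCKETS = [1, 3, 5, 10];
// A hallucinated --max-iterations 9999 snaps to 10 before it costs money.
```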

2.8 Capability-driven feature gating

OD source: agentCapabilities Map populated at detection time by --help probing; def.buildArgs() consults it before passing optional flags. UI features (comment mode) gated on capabilities.surgicalEdit.

For Swisscheese: Gate features on what the active agent supports — does it have --output-format json? Does it support sub-agent spawning? Does it offer JSON-schema-constrained output (e.g. for findings)? Don't pretend Gemini can do what Claude Code does. Surface degradations explicitly so the user / PR-author knows which checks ran on which agent.
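
A sketch of the gating shape, with the probing itself elided (capability names and the Map layout are assumptions modeled on OD's agentCapabilities):

```typescript
type Capability = "jsonOutput" | "subAgents" | "schemaConstrainedOutput";

// Populated once at detection time, e.g. by --help probing.
const agentCapabilities = new Map<string, Set<Capability>>();
agentCapabilities.set("claude-code", new Set<Capability>(["jsonOutput", "subAgents"]));

// Only pass optional flags the active agent actually understands;
// degradations are surfaced to the caller, not silently swallowed.
function extraFlags(agentId: string): string[] {
  const caps = agentCapabilities.get(agentId) ?? new Set<Capability>();
  return caps.has("jsonOutput") ? ["--output-format", "json"] : [];
}
```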

2.9 Anti-corruption layer via a contracts package

OD source: packages/contracts/ is pure TypeScript with explicit no-import rules (no Next, Express, fs, browser, sqlite, sidecar deps). Web and daemon both import it. Single rule that prevents drift in a polyglot stack.

For Swisscheese: Replicate verbatim. Coordinator + workers + UI must agree on Finding, ReviewRequest, ReviewEvent, Severity. One package, no runtime dependencies, types only. Worth doing on day one — retrofitting it later is painful.
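
A day-one sketch of that package: types only, zero runtime dependencies, imported by coordinator, workers, and UI alike. Field names are assumptions; ReviewRequest, Finding, and Severity follow the list above (ReviewEvent would live here too).

```typescript
type Severity = "P0" | "P1" | "P2";

interface Finding {
  id: string;
  file: string;
  line: number;
  severity: Severity;
  message: string;
  suggestedPatch?: string;  // optional; must parse if present
}

interface ReviewRequest {
  diffRef: string;          // e.g. a PR head SHA
  rulePackId: string;
  severityFloor: Severity;  // findings below this are dropped
}
```

In the real package these would be exported from a single entry point with a lint rule forbidding runtime imports.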

3. Patterns that need adaptation

3.1 Discovery form on turn 1 → scope-locking before the model runs

OD pattern: prompts/discovery.ts:34-71 — agent emits <question-form> on turn 1 to lock brief in <30s of radio clicks before any pixel is written.

Wrong shape for Swisscheese: the input is often a PR, not a chat brief.

Right adaptation: the underlying idea — lock inputs deterministically before the model freestyles — applies sharply. Lock review scope (which files, which severity range, which rule pack) outside the model. Never let the agent decide what to review; tell it. This kills the "agent decided to review X but we needed Y" failure mode.

3.2 Direction picker → review profiles

OD pattern: prompts/directions.ts:53-184 — 5 deterministic visual directions, each with palette + fonts + posture cues. Replaces "model freestyles a visual" with "user picks 1 of 5."

Adaptation: Swisscheese review profiles — security, performance, correctness, idiomatic-style, accessibility, dependency-hygiene. Each profile carries its own severity-ladder semantics, rule pack, known-false-positive list, and tool allowlist.

Same pattern: replace freestyle decisions with curated picks, deterministic at compose time.

3.3 5-dimensional self-critique → finding-quality scorecard

OD pattern: prompts/discovery.ts:156-166 — agent silently scores itself 1–5 on Philosophy / Hierarchy / Execution / Specificity / Restraint before emitting <artifact>. Any dim < 3/5 triggers a fix pass.

Adaptation translates beautifully: before emitting a finding, score it on:

  1. Citation reality — does the line/file actually exist in the diff?
  2. Fix syntactic validity — does the suggested patch parse?
  3. Severity proportionality — is this really P0, or am I inflating?
  4. False-positive list — does this match a known FP pattern?
  5. Senior-engineer plausibility — would someone with judgment actually flag this in a real review?

Drop or revise anything below threshold. The 5-dim shape is "self-critique with explicit, named dimensions" — generalizes well.
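
The threshold check itself can stay deterministic even though the scores come from the model. A sketch, with dimension names taken from the list above (the record shape is an assumption):

```typescript
interface FindingScores {
  citationReality: number;        // 1-5
  fixSyntacticValidity: number;
  severityProportionality: number;
  falsePositiveMatch: number;     // 5 = matches no known FP pattern
  seniorPlausibility: number;
}

const THRESHOLD = 3;

// Returns the names of every dimension below threshold;
// non-empty => drop or revise the finding before emitting.
function needsRevision(s: FindingScores): string[] {
  return (Object.entries(s) as [string, number][])
    .filter(([, v]) => v < THRESHOLD)
    .map(([k]) => k);
}
```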

3.4 Skills as filesystem extensions → rule packs as filesystem extensions

OD pattern: ~/.claude/skills/<id>/SKILL.md with frontmatter + workflow body. Forkable, versionable, sharable.

Adaptation: ~/.swisscheese/packs/<id>/PACK.md with workflow + ladder + example findings. Frontmatter declares severity_ladder, applies_to: [languages], tool_allowlist. Teams fork and publish their own packs. Drop a folder, it shows up in the picker.
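
A hypothetical PACK.md under this scheme (every value here is illustrative, not a defined format):

```markdown
---
id: security-core
severity_ladder: [P0, P1, P2]
applies_to: [python, typescript]
tool_allowlist: [read_file, grep]
---

# security-core

Workflow, ladder semantics, and example findings go here.
```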

3.5 DESIGN.md → CONVENTIONS.md

OD pattern: prompts/system.ts:130-134 — active design system's DESIGN.md treated as authoritative for color/typography/spacing.

Adaptation: per-repo CONVENTIONS.md treated as authoritative for deviations from generic best-practice. "This codebase uses snake_case in Python despite PEP-8 because reasons." Reviewer reads it, doesn't flag deviations the codebase explicitly endorses. Reduces a huge class of false positives.

3.6 Pre-flight directive → checklist injection above rule pack

OD pattern: prompts/system.ts:388-397 — detects when skill body references seed files and injects a hard "Pre-flight: Read X, Y, Z first" directive above the skill body. Combats context truncation.

Adaptation: detect when a rule pack references seed files (patterns.md, false-positives.md, severity-ladder.md) and inject a pre-flight directive above the pack body. Otherwise the agent under context pressure skips the false-positive list and reports them anyway.

4. OD mistakes Swisscheese should avoid

4.1 Auto-approve permissions everywhere

What OD does: every CLI launched with --permission-mode bypassPermissions, --full-auto, --dangerously-skip-permissions, --allow-all-tools, --yolo (agents.ts:60-65). The cwd is the only sandbox.

Why this is wrong for Swisscheese: code review of untrusted PRs runs adversary-controlled code on your worker. A malicious PR could include a test file that the agent runs. You need a real sandbox — containers per review, restricted user, code mounted read-only, no network egress except to the agent's API.

OD's bet is "the agent's own permission model is enough." For untrusted-input code review, that bet leaves safety at a level you cannot accept. Build the sandbox from day one — retrofitting it after a worker is exploited is a bad week.

4.2 // @ts-nocheck on hot daemon files

What OD does: apps/daemon/src/server.ts:1 and agents.ts:1 opt out of strict type checking.

Why this is wrong for Swisscheese: findings carry legal / security weight. A typo in a severity field that silently coerces to undefined could mean a P0 ships as a P3. Strong types end-to-end, including Severity as a discriminated union, Finding as a tagged record, runtime validation at the worker boundary. The contracts package gets you the types; turn on strict, noImplicitAny, noUncheckedIndexedAccess.

4.3 In-memory runs map

What OD does: apps/daemon/src/runs.ts keeps run state in an in-memory Map. Restarting the daemon drops in-flight runs.

Why this is wrong for Swisscheese: "scaling up" implies queue-backed worker pools, multi-host, restart-survivable. Build run state in the DB from the start (status, idempotency key, retries, last event seq). Use Redis/SQS/Postgres LISTEN-NOTIFY for the queue. The in-memory pattern works for OD's single-user MVP; it will not work for batch review at scale.
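
A hypothetical shape for that table (SQLite dialect; all column names are illustrative, mirroring the fields suggested above):

```sql
CREATE TABLE review_runs (
  id              TEXT PRIMARY KEY,
  status          TEXT NOT NULL CHECK (status IN ('queued','running','done','failed')),
  idempotency_key TEXT NOT NULL UNIQUE,       -- same PR event never runs twice
  retries         INTEGER NOT NULL DEFAULT 0,
  last_event_seq  INTEGER NOT NULL DEFAULT 0, -- resume streaming after restart
  created_at      TEXT NOT NULL DEFAULT (datetime('now'))
);
```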

4.4 Skill scanning on-demand and unwatched

What OD does: /api/skills re-scans skills/ every request. No file watcher in production.

Why this is wrong for Swisscheese: rule packs change frequently (security patterns added in response to incidents). Hot-reload from day one. chokidar watch in dev, FS-event-driven cache invalidation in prod, or accept a 60-second cache TTL — but not "re-stat the directory on every request" forever.

4.5 Stream-format pinning with no defense

What OD does: parsers know upstream JSON schemas; no version pinning, no contract tests. When Claude Code 1.x → 2.x changes its event shape, OD breaks silently mid-line.

Why this is wrong for Swisscheese: silent break = silent miss = unreviewed PR shipped to prod. Mitigations: pin each CLI to a tested version range, keep golden-transcript contract tests per parser (recorded native streams replayed on every upgrade), and fail loudly on unrecognized event shapes instead of dropping them.

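The fail-loud half can be sketched concretely: a parser that raises on any event shape it does not recognize, so a CLI version bump surfaces as a loud error rather than a silently unreviewed PR (the parser and its known-event table are illustrative stand-ins):

```typescript
// Stand-in for any per-CLI stream parser.
function parse(line: string): { kind: string } {
  const raw = JSON.parse(line);
  const known: Record<string, string> = { start: "review_started", end: "done" };
  const kind = known[raw.type];
  // Unknown shapes throw instead of being dropped on the floor.
  if (!kind) throw new Error(`unknown event shape: ${raw.type} (CLI upgraded?)`);
  return { kind };
}
```
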
5. Net architectural axioms (worth lifting verbatim)

  1. Be an integration shell, not an agent. Spawn the user's CLI; don't reimplement the loop. Twelve stream parsers is a smaller cost than one custom agent loop you maintain forever.

  2. Determinism around the model, creativity within it. Scope locked outside, severity ladder locked outside, output schema locked outside. The model fills in findings; it does not decide what to look at or how to grade what it sees.

  3. Close every loop with a deterministic check. OD's linter loop is the highest-leverage pattern. For review specifically, post-checks are even cheaper because findings have structured fields you can validate without rendering anything.

  4. The system prompt is code, not config. Compose in a function, hardcode the layer order, justify precedence in comments. PRs change the stack. YAML-driven prompt config is brittle and unreviewable.

  5. Process stamps for concurrency. Namespace = isolation primitive. Stamp = discovery primitive. Together they give you "two complete Swisscheese setups on one host" and "find me all workers from this PR's review" without lockfiles.

6. Swisscheese-specific question I'd want answered

The mappings above are sharper if you tell me how Swisscheese stacks reviewers — the "Swiss cheese model" framing implies multiple imperfect layers, but the architectural shape depends on which composition style you're going for:

| Composition style | Implication for OD's lessons |
| --- | --- |
| Parallel + aggregate (run N reviewers, union findings) | Heavy emphasis on dedupe, cross-citation, agreement-weighted severity. Worker pool with stamp-based discovery is the dominant pattern. |
| Cascading (each layer reviews findings from the previous) | Heavy emphasis on the linter feedback loop pattern, generalized — each layer is the "deterministic check" for the layer above. |
| Voting / consensus (N reviewers vote on each finding) | Heavy emphasis on the contracts package — every reviewer must agree on Finding shape exactly. Severity inflation by any one model affects the vote. |
| Specialist routing (security pack runs security reviewer, perf pack runs perf reviewer) | Heavy emphasis on the review-profile pattern (§3.2), capability-gating (§2.8), and rule-pack-as-filesystem (§3.4). |

Most likely some mix, in which case all four sets of emphases apply. Tell me how you're planning to compose layers and I'll sharpen the recommendations to that shape.

7. Reading order for the OD source

If you want to internalize OD's patterns before applying them to Swisscheese, read in this order (≈ 2 hours):

  1. apps/daemon/src/agents.ts — the adapter pattern, capability probing, model listing.
  2. apps/daemon/src/prompts/system.ts — composer with hardcoded precedence.
  3. apps/daemon/src/lint-artifact.ts — the feedback loop pattern in concrete form.
  4. apps/daemon/src/runs.ts — chat-run service (and notice what it doesn't do for scale).
  5. packages/sidecar-proto/src/index.ts and packages/platform/src/index.ts — process stamps.
  6. apps/daemon/src/claude-stream.ts — the cleanest stream parser (good template).
  7. packages/contracts/src/index.ts — the anti-corruption boundary.

Skim everything else only as needed.