Takeaways
What's worth stealing from this codebase, and what to be careful of.
1. Ideas worth stealing
The composable-prompt stack with explicit precedence
Most LLM apps either (a) build prompts ad-hoc per route or (b) have a "system prompt" config field. OD has a layered prompt stack with documented order-of-precedence rules. The composer in apps/daemon/src/prompts/system.ts:109-191 is a single pure function, the order is hardcoded in code rather than configured at runtime, and comments justify which layer wins on conflict:
"Discovery + philosophy goes FIRST so its hard rules ('emit a form on turn 1', 'branch on brand on turn 2', 'TodoWrite on turn 3', run checklist + critique before
<artifact>) win precedence over softer wording later in the official base prompt."
This is reviewable in a way that "tweak the prompt and pray" isn't. If you build LLM apps with multiple stakeholders, this is a model worth copying.
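The layered-composer idea can be sketched in a few lines. This is a hypothetical reduction of the pattern, not OD's actual `composeSystemPrompt`: precedence lives in array order, empty layers are dropped, and a comment marks why the hard-rule layer goes first.

```typescript
// Sketch of a layered prompt composer (illustrative names, not OD's real API).
type PromptLayer = { name: string; body: string };

function composeSystemPrompt(layers: PromptLayer[]): string {
  // Earlier layers are emitted first so their hard rules take precedence
  // over softer wording in later layers. Order is hardcoded by the caller.
  return layers
    .filter((l) => l.body.trim().length > 0) // empty layers are dropped
    .map((l) => `<!-- layer: ${l.name} -->\n${l.body.trim()}`)
    .join("\n\n");
}

const prompt = composeSystemPrompt([
  { name: "discovery-philosophy", body: "Emit a form on turn 1." }, // hard rules FIRST
  { name: "base", body: "You are a helpful design agent." },
  { name: "empty-layer", body: "" },
]);
```

Because the composer is a pure function, a reviewer can diff the layer order in a PR rather than reverse-engineer it from runtime config.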
<question-form> as a structured-input primitive
The agent emits an XML-like custom element with a JSON body; the web UI parses and renders it as a real form. No tool-use plumbing required on the agent side. Any LLM that can emit XML-flavoured text gets the structured-input UX for free, across all 12 supported CLIs.
This trick is worth borrowing whenever you want to elicit structured input from a chat agent that cannot do native tool use, or when you're building a UI on top of multiple agent vendors with different tool-use schemas.
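The receiving side is just text extraction plus a JSON parse. A minimal sketch, assuming an XML-like element with a JSON body (the element name is from the doc; the field shape and function name are invented for illustration):

```typescript
// Sketch: pull a <question-form> element with a JSON body out of agent text.
// The field shape here is hypothetical, not OD's actual schema.
interface QuestionForm {
  fields: { name: string; label: string }[];
}

function extractQuestionForm(text: string): QuestionForm | null {
  const m = text.match(/<question-form>([\s\S]*?)<\/question-form>/);
  if (!m) return null;
  try {
    return JSON.parse(m[1]) as QuestionForm;
  } catch {
    return null; // malformed JSON: ignore rather than crash the UI
  }
}

const reply =
  'Sure! <question-form>{"fields":[{"name":"brand","label":"Brand name"}]}</question-form>';
const form = extractQuestionForm(reply);
```

The agent needs no tool-use capability at all; any model that can emit this text gets a real form rendered by the UI.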
Direction picker — converting "model freestyle" into "1 of 5 deterministic packages"
prompts/directions.ts defines 5 visual directions with verbatim OKLch palettes, font stacks, and posture cues. The user picks via radio button; the agent binds the chosen direction's :root block verbatim. This pattern — replace creative model decisions with curated picks — is generalizable to any domain where you want consistency over expressivity.
The clever extra step: the same library powers the user-facing form (renderDirectionFormBody()) and the agent's spec block (renderDirectionSpecBlock()). One source of truth, two output formats, both compiled at request time.
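The "one source of truth, two output formats" idea looks roughly like this. The function names come from the doc; the data shape and bodies are a guess at the pattern, not the repo's code:

```typescript
// Sketch: one direction library feeding both the user-facing form and the
// agent-facing spec block. Direction data here is illustrative.
interface Direction {
  id: string;
  label: string;
  rootCss: string; // verbatim :root block the agent must bind
}

const DIRECTIONS: Direction[] = [
  { id: "editorial", label: "Editorial", rootCss: ":root { --accent: oklch(0.60 0.20 30); }" },
  { id: "brutalist", label: "Brutalist", rootCss: ":root { --accent: oklch(0.20 0 0); }" },
];

// Output 1: options for the web form's radio picker.
function renderDirectionFormBody(): string {
  return DIRECTIONS.map((d) => `(${d.id}) ${d.label}`).join("\n");
}

// Output 2: the spec block injected into the agent's prompt after the pick.
function renderDirectionSpecBlock(id: string): string {
  const d = DIRECTIONS.find((x) => x.id === id);
  if (!d) throw new Error(`unknown direction: ${id}`);
  return `Use this :root block verbatim:\n${d.rootCss}`;
}
```

Because both renderers read the same array, the form and the prompt cannot drift apart.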
Linter feedback loop
POST /api/artifacts/save runs lint-artifact.ts; findings are formatted by renderFindingsForAgent() and fed back to the agent on the next turn as a system message. Style enforcement closes the loop without yelling at the user. You enforce the rules where they matter (the artifact) and you teach the agent in the language it understands (a system message in the next turn).
This is hard to do well in dialog-only UIs. OD nails it because the artifact is a real file the daemon can scan after every save.
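The feedback half of the loop is a small formatter. A sketch of the idea (the function name is from the doc; the finding shape and message wording are assumptions):

```typescript
// Sketch: format lint findings as a system message for the agent's next turn.
// Finding shape and phrasing are hypothetical, not OD's actual output.
interface Finding {
  rule: string;
  line: number;
  message: string;
}

function renderFindingsForAgent(findings: Finding[]): string | null {
  if (findings.length === 0) return null; // clean save: nothing to feed back
  const lines = findings.map((f) => `- [${f.rule}] line ${f.line}: ${f.message}`);
  return (
    `Your last artifact save had ${findings.length} style finding(s). ` +
    `Fix them in your next edit:\n` + lines.join("\n")
  );
}

const msg = renderFindingsForAgent([
  { rule: "no-inline-style", line: 12, message: "move styles to the stylesheet" },
]);
```

The user never sees the nag; the agent gets it in the register it responds to.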
Plain files for artifacts; SQLite for metadata
Artifacts are reviewable in PRs and greppable. Metadata is queryable. Each format gets the storage that suits it. The split costs more code in projects.ts (path traversal, sanitization) but the payoff is git-friendliness, which compounds over the project's lifetime.
Sidecar process stamps
The 5-field stamp pattern (app, mode, namespace, ipc, source) is a small piece of infrastructure (packages/sidecar-proto, packages/platform) that pays for itself in two ways:
- `tools-dev status` finds processes by stamp; no lockfile, no PID file.
- `namespace` lets concurrent local sessions coexist; you can run two complete OD setups on one machine with `--namespace=alice` and `--namespace=bob` and nothing collides.
If you build local-first dev tooling that spawns multiple processes, the stamp pattern is a clean way to handle isolation + discovery in one move.
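In sketch form, the stamp is just five values threaded through the child's environment, and discovery is a filter over them. The field names come from the doc; the env-var names and helpers are invented for illustration:

```typescript
// Sketch of the 5-field stamp pattern: stamp via env vars, discover by filter.
// Env-var names are hypothetical, not the sidecar-proto wire format.
interface Stamp {
  app: string;
  mode: string;
  namespace: string;
  ipc: string;
  source: string;
}

function stampToEnv(s: Stamp): Record<string, string> {
  return {
    OD_STAMP_APP: s.app,
    OD_STAMP_MODE: s.mode,
    OD_STAMP_NAMESPACE: s.namespace,
    OD_STAMP_IPC: s.ipc,
    OD_STAMP_SOURCE: s.source,
  };
}

// Discovery: filter a scanned process list by stamp fields. No PID files.
function findByNamespace(
  procs: { env: Record<string, string> }[],
  namespace: string
) {
  return procs.filter((p) => p.env.OD_STAMP_NAMESPACE === namespace);
}

const alice = { env: stampToEnv({ app: "daemon", mode: "dev", namespace: "alice", ipc: "uds", source: "cli" }) };
const bob = { env: stampToEnv({ app: "daemon", mode: "dev", namespace: "bob", ipc: "uds", source: "cli" }) };
```

Isolation and discovery fall out of the same five fields: the namespace that keeps two sessions apart is the same key you query to find them.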
Media generation as agent-side shell
Instead of writing 12 per-CLI tool implementations for image/video/audio, OD ships an od media generate subcommand and instructs the agent (via prompts/media-contract.ts) to dispatch via shell. The daemon injects OD_DAEMON_URL and OD_PROJECT_ID into the spawn env; the subcommand phones home over loopback.
This is the "uniform integration across N tools" pattern. It works because every code-agent CLI has Bash. If you're integrating with multiple agent tools whose APIs differ but whose shell-out semantics agree, this is the cleanest unification I've seen.
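The daemon's half of the handshake is a small env merge before spawn. The two variable names are from the doc; the helper is a hypothetical sketch:

```typescript
// Sketch: inject phone-home env vars into the agent CLI's spawn environment.
// OD_DAEMON_URL / OD_PROJECT_ID are the names the doc gives; the helper is ours.
function buildAgentEnv(
  base: Record<string, string | undefined>,
  daemonUrl: string,
  projectId: string
): Record<string, string | undefined> {
  return {
    ...base,
    OD_DAEMON_URL: daemonUrl, // loopback address the subcommand phones home to
    OD_PROJECT_ID: projectId, // lets the daemon attribute generated media
  };
}

const env = buildAgentEnv({}, "http://127.0.0.1:4777", "proj_123");
// child_process.spawn(cliPath, args, { env }) would then start the agent;
// any `od media generate` the agent shells out to reads these two vars.
```

Every new media type becomes a new subcommand flag, never twelve new per-CLI tool definitions.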
Pre-flight directive to fight context truncation
prompts/system.ts:388-397 detects whether the skill body references seed files and injects a redundant "Read X, Y, Z first" directive at the top of the skill block. The skill body itself already says this, but the team noticed that under context pressure agents sometimes skip Step 0. Belt-and-braces against truncation, in code, with a comment explaining why.
If you're hitting reliability issues where agents drop early-prompt instructions under load, consider structurally hoisting the most critical directives.
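A hypothetical rendering of the hoist (the detection heuristic and wording are invented; only the pattern is from the doc):

```typescript
// Sketch: if the skill body references seed files, repeat a "read these first"
// directive at the very top. Redundant on purpose: under context pressure,
// agents sometimes skip the body's own Step 0.
function withPreflight(skillBody: string, seedFiles: string[]): string {
  const referenced = seedFiles.filter((f) => skillBody.includes(f));
  if (referenced.length === 0) return skillBody; // nothing to hoist
  return `BEFORE ANYTHING ELSE: Read ${referenced.join(", ")} first.\n\n${skillBody}`;
}

const hoisted = withPreflight(
  "Step 0: read tokens.css, then build the layout.",
  ["tokens.css", "seed.html"]
);
```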
Argument clamping for billing safety
media.ts:158-190 clampNumber() and agents.ts:78-95 clampCodexReasoning() are small functions with outsized impact. If a model hallucinates --length 9999 or sends an invalid reasoning effort, clamping snaps it to a safe value rather than letting an upstream API charge for a month-long render or fail unpredictably.
Worth doing wherever agent-emitted CLI args hit a paid provider.
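The clamp itself is tiny. This body is a guess at the idea behind `clampNumber`, not the repo's exact code:

```typescript
// Sketch of an argument clamp for agent-emitted CLI args hitting a paid API.
function clampNumber(
  value: unknown,
  min: number,
  max: number,
  fallback: number
): number {
  const n = typeof value === "number" ? value : Number(value);
  if (!Number.isFinite(n)) return fallback; // hallucinated/garbage input
  return Math.min(max, Math.max(min, n)); // snap into the safe range
}

// A hallucinated --length 9999 snaps to the billing-safe ceiling:
const seconds = clampNumber(9999, 1, 30, 8);
```

The key choice is snapping rather than rejecting: the render still happens, just at a cost ceiling you chose.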
Triple-backtick zero-width-space escape
prompts/system.ts:333 — when injecting user-pasted prompt-template content into a markdown fence, replace each triple-backtick run with backticks separated by zero-width spaces. Defeats fence-break injection while keeping the visible text close to the original. One line, real defense.
Anti-corruption layer via packages/contracts
The contracts package has explicit no-import rules (no Next, Express, fs, browser, sqlite, sidecar-control-plane). It's pure TypeScript DTOs shared between web and daemon. That single discipline is what keeps the polyglot stack consistent. Worth replicating in any project where two runtimes must agree on wire format.
2. Pitfalls / things to be careful of
Auto-approve permissions everywhere
Every spawned CLI runs with auto-approve flags (--permission-mode bypassPermissions, --full-auto, --dangerously-skip-permissions, --allow-all-tools). The cwd is the only sandbox. A malicious skill could shell out via the agent's Bash tool.
The team is upfront about this (docs/architecture.md:316-322, agents.ts:60-65) — they inherit the agent's permission model and accept the cwd-only perimeter. If you fork OD for a multi-tenant or untrusted-skill scenario, you'd need to add a real sandbox (container, gVisor, restricted user).
// @ts-nocheck on hot daemon files
apps/daemon/src/server.ts:1 and agents.ts:1 opt out of strict type checking. The risk is type drift inside the daemon. Mitigated by the typed packages/contracts boundary, but if you extend the daemon, expect to be more careful with types than the rest of the repo demands.
Stream-format pinning
Each parser (claude-stream.ts, acp.ts, etc.) knows its upstream JSON schema. When Claude Code or Codex updates their stream format, parsers can break silently mid-line. The capability probe (agentCapabilities Map) helps for known flags but not for schema changes. Plan to monitor upstream CLI release notes if you depend on this.
In-memory chat-runs map
runs.ts is in-memory. If 50 browser tabs hit /api/chat simultaneously, 50 child processes spawn. There's no run quota or per-user limit. Not a problem for single-user MVP; meaningful at any kind of scale.
Single SQLite file with code-driven migrations
db.ts does forward-compatible ALTERs in openDatabase(). No migration tool. Schema rollbacks are manual. Fine for now; will require triage when the schema changes meaningfully.
Skill scanning is on-demand and unwatched
/api/skills re-scans the directory every request. No watcher in production (docs/architecture.md:329 mentions chokidar in dev). For 50 skills + 130 design systems on a fast disk this is fine; it's a future scaling cliff if the catalog grows.
Prompt-stack size
The full system prompt can run to thousands of tokens before the user message. The team trades context for determinism; on small-context agents (Gemini Nano-class, future small-Claude variants) this could blow the budget. There's pruning (od.design_system.sections, 12 KB template caps, 4 KB prompt-template caps) but no automatic truncation guard for the whole stack.
Single-page web app
Next.js 16 with one catch-all dynamic route. Deep links work via router.ts reading window.location, but SSR is essentially unused for routing — the framework choice is mostly about deploy story, not request handling. If you fork it for a multi-page app, the one-page assumption is baked in pretty deep.
Skills run with full agent privileges
Per docs/architecture.md:319 and docs/spec.md:140, skills "shell out via the agent" and skill trust is a deferred concern. od skill add <git-url> clones into ~/.open-design/skills/. There's no signing, no sandbox, no review. Treat skills like npm packages — install only from sources you trust.
The // @ts-nocheck daemon also runs JSON.parse on stream lines
The stream parsers catch JSON parse errors per-line, but a malformed multi-byte split between chunks could occasionally produce dropped events. The implementations buffer correctly in normal cases; under flaky stdin/stdout pipes (Windows pipe weirdness, terminated children, etc.) you may see edge-case event loss.
3. What's missing or in-flight
These are explicit non-goals or planned-but-not-shipped, mostly per docs/spec.md and docs/architecture.md:
- No daemon ↔ Vercel reverse-tunnel helper. Topology B requires the user to run `cloudflared` or similar themselves.
- No skill marketplace. `od skill add <git-url>` is the install model; no browseable UI.
- No comment-mode for weak agents. Gracefully degrades to "regenerate the whole file with constraint X."
- No multi-user / RBAC / orgs. Single-user single-machine MVP.
- No collaborative editing.
- No mobile-web support for the OD UI itself. Desktop only.
- No offline mode beyond "the agent is local."
- No PPTX export from non-`guizang-ppt` decks. Falls back to "print to PDF then page-to-slide PPTX."
4. Things I'd try if I were extending it
- Native skill watcher in production with `chokidar`. The architecture doc plans it; implementation is missing.
- Skill signing. Record git SHA at install; warn on signature change; let users pin a revision.
- Per-user run quota. Trivial to add to `runs.ts`; could use `clientRequestId` already passed in.
- Run resumption by stream replay. `messages.events_json` already persists every event; the missing piece is wiring browser reconnect to "show what landed before, then attach SSE for the rest."
- First-party reverse-tunnel for Topology B (could ship as `od daemon --expose` and bundle a thin tunnel client).
- Skill capability declarations as runtime gates. `od.capabilities_required: [surgical_edit]` is in the protocol but I didn't find runtime enforcement that hides such skills when the active agent lacks the capability.
- Token budgeter on `composeSystemPrompt` that enforces an upper bound by progressively dropping the oldest message context, then inspirationDesignSystemIds, then template body, then craft refs — in that order — before truncating.
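The last item, the token budgeter, can be sketched as a priority-ordered eviction loop. Everything here is hypothetical: the layer names, the character-count proxy for tokens, and the priorities are illustrative, not OD's design:

```typescript
// Sketch of a layered prompt budgeter: evict the lowest-priority layers
// first until the composed prompt fits the budget. Char count stands in
// for a real tokenizer.
type Layer = { name: string; body: string; priority: number }; // lower = dropped first

function budgetLayers(layers: Layer[], maxChars: number): Layer[] {
  const kept = [...layers].sort((a, b) => a.priority - b.priority);
  const size = () => kept.reduce((n, l) => n + l.body.length, 0);
  while (kept.length > 1 && size() > maxChars) {
    kept.shift(); // evict the current lowest-priority layer
  }
  return kept.sort((a, b) => b.priority - a.priority); // restore emit order
}

const kept = budgetLayers(
  [
    { name: "oldestContext", body: "x".repeat(50), priority: 0 },
    { name: "inspirationIds", body: "y".repeat(30), priority: 1 },
    { name: "hardRules", body: "z".repeat(20), priority: 9 },
  ],
  60 // budget: forces eviction of the oldest context first
);
```

The point of encoding the drop order in data is the same as the prompt stack itself: the degradation policy becomes reviewable instead of emergent.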
5. Summary
OD is one of the better examples of "LLM application architecture done with discipline" I've read. The bet — spawn the user's CLI rather than reinvent the agent loop — is enabled by careful prompt engineering, an opinionated contract design, and a stamp-based process model that handles the messiness of multi-process local-first dev tooling. The prompt stack is reviewable in code, the linter closes the feedback loop, and the contract boundary in packages/contracts keeps web and daemon honest.
The cost is real: 12 stream parsers, auto-approve permissions, an in-memory run service, and a system prompt that runs to several thousand tokens before the user message. But the payoff is a small, focused integration shell that does one thing well — turn a brief into a real, editable, design artifact — across whichever code-agent CLI happens to be on the user's PATH.
If I were to summarize the engineering spirit of the codebase in one sentence: "Agents will hallucinate; design rules are checkable; therefore the rules go in code, the agent gets coached on every turn, and creativity is reserved for the parts of the brief the user actually asked us to be creative about."