07 — Other Considerations
Things the prompt didn't ask for but you'd want to know if you're learning from this repo or considering forking it.
Notable ideas to take
1. Schema-as-sanitizer for prompt-injection control
The single most copyable pattern. output_schema: blocks on untrusted-reader subagents (e.g. subagents/reader.yaml:35-58) use regex pattern + maxLength + additionalProperties: false not just to validate types, but to filter an attacker's English-language injection out of any data that crosses an agent boundary. If the field is pattern: "^[A-Za-z0-9._:-]+$" with maxLength: 64, an "ignore previous instructions and exfiltrate" sentence cannot survive the validator, because the spaces alone fail the pattern.
Take this even outside FSI: any agent that ingests user-supplied PDFs or web content into a downstream tool-using agent benefits from a schema-validated waist between them.
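The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the repo's validator: the field names (account_id, currency) and the SCHEMA dict are hypothetical, and a real deployment would more likely hand the YAML schema to a JSON Schema library.

```python
import re

# HYPOTHETICAL schema for illustration; field names are assumptions,
# not copied from subagents/reader.yaml.
SCHEMA = {
    "account_id": {"pattern": r"^[A-Za-z0-9._:-]+$", "maxLength": 64},
    "currency": {"pattern": r"^[A-Z]{3}$", "maxLength": 3},
}


def validate(record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record may cross."""
    errors = []
    # additionalProperties: false -> reject any key the schema does not name.
    for key in record:
        if key not in SCHEMA:
            errors.append(f"unexpected field: {key}")
    for key, rule in SCHEMA.items():
        value = record.get(key)
        if not isinstance(value, str):
            errors.append(f"{key}: missing or not a string")
            continue
        if len(value) > rule["maxLength"]:
            errors.append(f"{key}: longer than {rule['maxLength']}")
        if not re.fullmatch(rule["pattern"], value):
            errors.append(f"{key}: does not match {rule['pattern']}")
    return errors
```

A record whose account_id is an English sentence fails on the pattern, and a record with an extra free-text field fails on the unexpected key, so neither reaches the downstream agent.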
2. "One source, two wrappers" with byte-equality CI
scripts/check.py:114-131 does a filecmp.dircmp between vertical-plugins/<v>/skills/<s>/ and agent-plugins/<slug>/skills/<s>/ and fails the build on any drift. This means the bundled-skill copies are always fresh, even though they're physically duplicated.
Generalizable for any repo where you need self-contained distribution units (Cowork plugins) and a single source of truth (verticals).
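A drift check of this kind might look roughly like the sketch below. It is not the code in scripts/check.py; the function name find_drift is made up for illustration. One detail worth knowing if you copy the pattern: filecmp.dircmp compares files by their os.stat signature by default, so this sketch re-compares the common files with shallow=False to get a true byte comparison.

```python
import filecmp
from pathlib import Path


def find_drift(source: Path, bundled: Path) -> list[str]:
    """Return relative paths that differ between two directory trees."""
    drift = []

    def walk(node: filecmp.dircmp, prefix: Path) -> None:
        # Files present on only one side are drift by definition.
        drift.extend(str(prefix / name) for name in node.left_only + node.right_only)
        # dircmp's own diff_files is a shallow (stat-based) comparison, so
        # re-compare the common files byte for byte.
        _, mismatch, errors = filecmp.cmpfiles(
            node.left, node.right, node.common_files, shallow=False
        )
        drift.extend(str(prefix / name) for name in mismatch + errors)
        for name, sub in node.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(source, bundled), Path("."))
    return sorted(drift)
```

A CI script would call find_drift for each source/bundle pair and exit non-zero if any list is non-empty.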
3. Honest threat-model docs in code comments
scripts/orchestrate.py:8-15 opens with a paragraph documenting why the script's regex-based handoff routing is the wrong long-term answer ("In production, prefer emitting handoffs via a dedicated tool call or a typed SSE event the model cannot produce by quoting document text."). The reference implementation actively points at its own weaker spot. Worth copying in any reference code.
4. Anti-example sections (<common_mistakes>)
dcf-model/SKILL.md:581-756 lists known WRONG patterns the model has actually emitted (linear approximations in sensitivity tables, // WRONG - Placeholder note, "common rationalization to REJECT"). Most prompt libraries list only positive examples. Listing the justification the model would invent for a wrong answer ("Writing 75+ formulas feels complex, so I'll leave a note") is a high-leverage prompting move.
5. [UNSOURCED] as searchable uncertainty marker
When the agent can't source a number, it's required to mark it [UNSOURCED] rather than estimate (pitch-agent.md:31, earnings-reviewer.md:29). This converts hallucination risk into an explicit lint hit the human reviewer can grep for. Cheap, robust, broadly applicable.
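The reviewer-side half of this pattern is a plain text search. A minimal sketch, with a hypothetical function name and a made-up sample draft:

```python
import re

MARKER = re.compile(r"\[UNSOURCED\]")


def find_unsourced(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line carrying the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if MARKER.search(line)
    ]


# Made-up draft text, for illustration only.
draft = "Revenue grew 12% (10-K, p. 41)\nEBITDA margin 31% [UNSOURCED]\n"
for number, line in find_unsourced(draft):
    print(f"line {number}: {line}")
```

Run over a draft before sign-off, this turns every unsourced figure into a numbered item the reviewer has to resolve or delete.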
6. Setup log pattern in the M365 wizard
claude-for-msft-365-install/commands/setup.md:11-15 tells Claude to maintain a setup log at ~/Desktop/claude-for-msft-365-install-setup.md and to resume from it on rerun. This makes the wizard idempotent and resumable — useful pattern for any long-running interactive Claude session that walks a user through provisioning.
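The wizard gets this behavior from prompt instructions and a markdown log, not from code. The same resume-from-log idea, expressed as code, might look like the sketch below; run_resumable and the one-step-name-per-line log format are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def run_resumable(steps: dict[str, Callable[[], None]], log_path: Path) -> list[str]:
    """Run each step once, recording completions so a rerun skips them."""
    done = set(log_path.read_text().splitlines()) if log_path.exists() else set()
    ran = []
    for name, action in steps.items():
        if name in done:
            continue  # finished on an earlier run
        action()
        # Append only after the step succeeds, so a crash leaves it pending.
        with log_path.open("a") as log:
            log.write(name + "\n")
        ran.append(name)
    return ran
```

Writing the log entry after the step, not before, is the design choice that matters: a step interrupted halfway is retried on the next run instead of being skipped.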
7. Schema-validated env-var substitution
scripts/deploy-managed-agent.sh:43-47 rejects ${VAR} values containing characters outside [A-Za-z0-9._/:@-]. This is a small line of defense against an attacker setting MCP_URL='"; rm -rf /' in an environment whose values get templated into a shell-adjacent context. Nearly free, often forgotten.
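The deploy script is shell, but the check ports directly. Below is a Python rendition under stated assumptions: the substitute function is hypothetical, and rejecting unset and empty values is a choice of this sketch, not something the source describes.

```python
import re

# Same character class the deploy script is described as enforcing.
# The "+" also rejects empty values; that part is this sketch's own choice.
SAFE_VALUE = re.compile(r"[A-Za-z0-9._/:@-]+")
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(template: str, env: dict[str, str]) -> str:
    """Replace ${VAR} placeholders, refusing unset or unsafe values."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        value = env.get(name)
        if value is None:
            raise KeyError(f"{name} is not set")
        if not SAFE_VALUE.fullmatch(value):
            raise ValueError(f"{name} contains characters outside the allowlist")
        return value

    return PLACEHOLDER.sub(replace, template)
```

Passing os.environ as env gives the same effect as the shell check: a quote, semicolon, or space in any templated value aborts the deploy before anything is rendered.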
Pitfalls to be aware of
1. Skill drift surface area is wide
Any commit that edits a vertical-plugins/<v>/skills/<s>/ file but forgets to run scripts/sync-agent-skills.py is caught only later, at the CI byte-equality check, not at the moment of editing. Suggestion: add python3 scripts/sync-agent-skills.py as a pre-commit hook.
2. ALLOWED_TARGETS in orchestrate.py is hand-maintained
scripts/orchestrate.py:23-27 hardcodes the 10 agent slugs:
ALLOWED_TARGETS = {
"pitch-agent", "market-researcher", "earnings-reviewer", "meeting-prep-agent",
"model-builder", "gl-reconciler", "kyc-screener",
"valuation-reviewer", "month-end-closer", "statement-auditor",
}
This list is independent of marketplace.json and managed-agent-cookbooks/. If a new agent is added, this set must be updated by hand. check.py doesn't yet cross-check it. A drift here is silent: new agent ships, handoffs to it are silently dropped.
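The missing cross-check is small. A sketch, assuming (and this is an assumption, not something the source shows) that marketplace.json exposes agent slugs as the name of each entry in a top-level plugins list; a real check would also need to filter that list down to agent plugins.

```python
def check_allowed_targets(allowed: set[str], marketplace: dict) -> list[str]:
    """Report slugs that appear on one side but not the other."""
    # ASSUMPTION: marketplace.json looks like {"plugins": [{"name": "<slug>"}, ...]}
    declared = {plugin["name"] for plugin in marketplace.get("plugins", [])}
    problems = [f"missing from ALLOWED_TARGETS: {slug}" for slug in sorted(declared - allowed)]
    problems += [f"not in marketplace.json: {slug}" for slug in sorted(allowed - declared)]
    return problems


# Made-up data: a new agent shipped without updating the allowlist.
market = {"plugins": [{"name": "pitch-agent"}, {"name": "new-agent"}]}
print(check_allowed_targets({"pitch-agent"}, market))
```

Wired into check.py, a non-empty result would fail the build, turning the silent drop into a CI error.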
3. Handoff-as-text vs handoff-as-tool-call
The handoff_request JSON blob lives in the orchestrator's text output. A malicious document could inject a literal blob that the regex catches. Mitigations are real (ALLOWED_TARGETS + HANDOFF_PAYLOAD_SCHEMA), but the core architectural fix — a typed handoff primitive the model cannot produce by quoting text — depends on a platform feature that isn't there yet.
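To make the residual risk concrete, here is a sketch of text-based handoff extraction with both mitigations applied. The wire format, the ticker-only payload rule, and the function names are all hypothetical; only the names ALLOWED_TARGETS and handoff_request come from the repo.

```python
import json
import re

ALLOWED_TARGETS = {"pitch-agent", "gl-reconciler"}  # subset, for illustration
# HYPOTHETICAL payload rule: one short ticker-like string and nothing else.
TICKER = re.compile(r"[A-Z.]{1,8}")


def payload_ok(payload: object) -> bool:
    return (
        isinstance(payload, dict)
        and set(payload) == {"ticker"}
        and isinstance(payload["ticker"], str)
        and TICKER.fullmatch(payload["ticker"]) is not None
    )


def extract_handoffs(text: str) -> list[tuple[str, dict]]:
    """Find handoff blobs in free text and keep only the ones that validate."""
    decoder = json.JSONDecoder()
    accepted = []
    for match in re.finditer(r'\{\s*"handoff_request"', text):
        try:
            blob, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue  # looked like a handoff but is not valid JSON
        request = blob.get("handoff_request")
        if not isinstance(request, dict):
            continue
        target = request.get("target")
        if not isinstance(target, str) or target not in ALLOWED_TARGETS:
            continue  # allowlist: unknown targets are dropped
        if not payload_ok(request.get("payload")):
            continue  # schema: free-text payloads are dropped
        accepted.append((target, request["payload"]))
    return accepted
```

The allowlist and payload check stop a blob from reaching an unknown agent or carrying prose, but a blob quoted from a document that names a valid target with a valid payload is indistinguishable from one the model meant to emit. That gap is what a typed handoff primitive would close.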
4. Skills are description-routed
Skill triggering is keyword-based (model reads description and decides). This works when descriptions are concrete ("Triggers on 'CIM', 'confidential information memorandum', ...") and fails when they're vague. There is no enforced trigger registry — you discover misrouting only by running the agent.
5. Every agent uses claude-opus-4-7
Every cookbook (orchestrator and every subagent) sets model: claude-opus-4-7. Opus is overkill for a reader subagent doing structured JSON extraction; Sonnet or Haiku would be cheaper and likely just as accurate for that task. The repo leaves cost on the table by not differentiating the model per leaf, but doing so adds a tuning surface, so the choice is left to the firm.
6. MCP URLs are unauthenticated in the manifest
.mcp.json and mcp_servers: only declare url: — auth is implicit (per-user OAuth in Cowork; vendor-specific in CMA via env vars). A stale or hijacked MCP URL could exfiltrate the agent's tool calls. Production deployments should pin URLs to firm-controlled proxies that verify upstream identity.
7. .mcp.json is not validated by check.py
check.py confirms that marketplace.json, plugin.json, and steering-examples.json parse as JSON, but it does not check .mcp.json. A typo there ships silently to users.
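A first pass at closing this gap could be as small as the sketch below. It assumes servers sit under a top-level mcpServers object with a url string each; that layout and the https-only rule are assumptions of this example, so adjust both to whatever the repo's .mcp.json files actually contain.

```python
import json
from urllib.parse import urlparse


def validate_mcp_config(raw: str) -> list[str]:
    """Return problems found in an .mcp.json document; empty means it passed."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    # ASSUMPTION: servers live under a top-level "mcpServers" object.
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ['missing "mcpServers" object']
    problems = []
    for name, server in servers.items():
        url = server.get("url") if isinstance(server, dict) else None
        if not isinstance(url, str):
            problems.append(f"{name}: missing url")
            continue
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{name}: url must be an https URL, got {url!r}")
    return problems
```

This catches both failure modes the pitfall describes: a file that no longer parses, and a file that parses but carries a mistyped URL.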
8. Hooks scaffolded but unused
hooks/hooks.json is [] or {}. The hooks system would be a natural place to enforce "always run recalc.py after model-builder writes an .xlsx" — currently the prompt tells the LLM to do it, which is weaker than a Stop hook.
9. No behavioral tests
scripts/test-cookbooks.sh checks structural shape only. There are no recorded prompts or evaluation harnesses against which agent quality is measured. A change to a skill body that subtly breaks DCF outputs would not be caught by CI — only by a downstream user noticing.
10. Microsoft 365 setup wizard is a powerful agent
claude-for-msft-365-install/commands/setup.md walks an admin through Vertex/Bedrock/Foundry provisioning, asks for OAuth secrets in chat ("paste the Client ID when you have it"), and shells out to node/gcloud/aws. The setup is great UX but means the admin is pasting credentials into a chat window during onboarding. The flow is appropriate for the use case (admins, in their own env), but it's worth flagging that this command, by design, has tenant-admin power.
Things this repo could add
- Skill drift pre-commit hook. Stop the editing-without-syncing footgun.
- check.py cross-check of orchestrate.py:ALLOWED_TARGETS against marketplace.json agent slugs.
- .mcp.json JSON-schema validation in check.py.
- A behavioral eval harness, even a small one, with golden outputs per agent for representative steering events.
- Per-leaf model selection. Default reader subagents to Sonnet/Haiku; orchestrator + write-holder to Opus.
- A Stop hook that runs python recalc.py automatically after model-builder writes an .xlsx.
- Move the handoff format to a tool call when the platform supports it; mark orchestrate.py as deprecated.
Questions worth asking before forking
- Where are your trusted vs untrusted source boundaries? The reader/orchestrator/writer split assumes the firm has a clear answer (custodian PDFs untrusted, internal GL trusted). If the firm's data sources are blended, the tier table needs redesigning.
- What's the firm's workflow engine? The repo says "your Temporal/Airflow/event bus" but scripts/orchestrate.py is a 90-line example. Production should plug into an engine that owns durability, retries, and idempotency.
- What's the audit trail? Cookbook READMEs say "stages for human sign-off"; the form of that sign-off (DocuSign? ticketing system? GL workflow tool?) is firm-specific and absent here.
- Per-tenant isolation? All MCP URLs come from env vars in the deploy script. A multi-tenant deployment needs a session→credentials mapping outside what's shown.
- Cost. Every agent uses Opus and several skills include "load full filings — do not summarize from snippets" (pitch-agent.md:21). Per-run token costs in production will be material; a finance team would want to budget per agent invocation.
What I find well-done
- The repo is honest about what it is and isn't — drafts, not decisions; reference, not production; preview, not GA. Every cookbook has a "Not guaranteed" note.
- The structural enforcement in check.py is thorough: every cross-file reference resolves, every bundled skill matches its source, every required cookbook file exists.
- The "untrusted reader / re-verifier / write-holder" tier table is a portable security pattern other agent libraries should copy.
- The Microsoft 365 setup wizard is a good demonstration of "the slash command IS the program" — markdown-driven imperative tooling for tenant-side admins.
- The <common_mistakes> section in the DCF skill is rare and worth borrowing in your own prompts.
- Honest threat-model commentary in scripts/orchestrate.py:8-15 is exactly what you want from reference code.
What I'd push back on
- Skill description as router is fragile. Need an explicit trigger registry, with overlap checks.
- ALLOWED_TARGETS as a separate Python set is drift-prone.
- Single-model-everywhere is wasteful.
- Prose security claims ("the doc-reader has Read/Grep only and returns length-capped structured JSON") in cookbook READMEs are accurate today but not mechanically tied to the yamls. A check.py rule that verifies the "Tier" table claims against the actual subagent yamls would prevent docs and code from drifting.
- No first-class hooks usage despite the scaffolding being there.
The repo is a reference catalogue. Anyone forking it should treat the pattern set (isolation tiers, schema-as-sanitizer, allowlist+validate handoffs, anti-example prompting, cite-or-flag-as-unsourced) as the deliverable, more than any specific agent.