AI & LLM Integration
Overview
Comp AI uses a multi-model AI strategy across 12+ distinct use cases, selecting models based on cost, speed, context window, and capability requirements. AI is not a bolt-on feature - it's core to the platform's value proposition.
Model Selection Matrix
| Use Case | Model | Provider | Temp | Why This Model |
|---|---|---|---|---|
| Policy chat assistant | claude-sonnet-4-6 | Anthropic | auto | Complex reasoning over long policy documents |
| Policy section editing | claude-sonnet-4-6 | Anthropic | auto | Precise JSON structure generation |
| Cloud security remediation | claude-opus-4-6 | Anthropic | 0 | Deterministic IAM policy/CLI generation |
| PDF content extraction | claude-sonnet-4-6 | Anthropic | auto | Native PDF support (multi-page) |
| Browser automation | claude-sonnet-4-6 | Anthropic | — | Stagehand visual navigation agent |
| General assistant chat | gpt-5 | OpenAI | auto | Broad knowledge, tool use |
| Policy generation | gpt-5-mini | OpenAI | auto | Good structure, lower cost than full GPT-5 |
| Questionnaire parsing | gpt-5-mini | OpenAI | auto | Structured Q&A extraction |
| RAG answer generation | gpt-4o-mini | OpenAI | auto | Fast, cheap for short answers |
| Vendor risk assessment | gpt-5.2 | OpenAI | auto | Complex multi-source analysis |
| Auditor content generation | gpt-5.2 | OpenAI | auto | Long-form, factual business writing |
| Vision extraction | gpt-4o | OpenAI | auto | Image understanding |
| SOA answering | gpt-5-mini / gpt-4o-mini | OpenAI | auto | Structured compliance analysis |
| Task relevance matching | llama-4-scout-17b | Groq | auto | Ultra-fast, cheap classification |
| Embeddings | text-embedding-3-small | OpenAI | — | Cost-effective for RAG |
| Fast question parsing | meta-llama/gpt-oss-120b | Groq | auto | Ultra-fast first attempt |
Model Selection Philosophy
Cost axis: Groq (cheapest) → GPT-4o-mini → GPT-5-mini → Claude Sonnet → GPT-5 → Claude Opus
Capability axis: Groq (least capable, fastest) → GPT-4o-mini → GPT-5-mini → Claude Sonnet → GPT-5 → Claude Opus (most capable)
Context axis: Groq (32K) → GPT-4o-mini → GPT-5-mini → Claude (200K) → GPT-5 → Claude Opus
Rule: Use the cheapest model that can reliably do the job.
Exception: Cloud remediation uses Opus at temp 0 because wrong IAM policies = production outage.
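The tiering rule can be made concrete as a small routing helper. This is an illustrative sketch, not code from the repo: the `TaskClass` categories and `pickModel` function are invented here, and only the model IDs come from the matrix above.

```typescript
// Hypothetical routing helper for the "cheapest reliable model" rule.
// Task categories and the mapping are illustrative, not the real code.
type TaskClass = 'classification' | 'extraction' | 'reasoning' | 'safety-critical';

function pickModel(task: TaskClass): string {
  switch (task) {
    case 'classification':  return 'llama-4-scout-17b';  // Groq: ultra-fast, cheapest
    case 'extraction':      return 'gpt-5-mini';         // structured output, low cost
    case 'reasoning':       return 'claude-sonnet-4-6';  // long-context reasoning
    case 'safety-critical': return 'claude-opus-4-6';    // most capable, run at temp 0
  }
}
```

In practice this decision is baked into each call site rather than centralized, but the ordering is the same: escalate tiers only when the cheaper model fails the job.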
AI System #1: RAG-Powered Questionnaire Answering
Architecture
```
Document Upload
      │
 ┌────▼─────┐
 │ Extract  │  mammoth (docx), exceljs (xlsx),
 │ Content  │  unpdf (pdf), Claude vision (images)
 └────┬─────┘
      │
 ┌────▼─────┐
 │  Parse   │  Groq (fast) → Claude (fallback) → OpenAI
 │Questions │  Extracts Q&A pairs from content
 └────┬─────┘
      │
 ┌────▼─────┐
 │ Generate │  text-embedding-3-small (OpenAI)
 │Embeddings│  Stored in PostgreSQL via pgvector
 └────┬─────┘
      │
 For each question:
      │
 ┌────▼─────┐
 │  Vector  │  Similarity search against:
 │  Search  │  - Organization policies
 └────┬─────┘  - Context documents
      │        - Manual answers
 ┌────▼─────┐  - Knowledge base docs
 │   RAG    │
 │  Answer  │  gpt-4o-mini with strict guardrails
 └────┬─────┘
      │
 ┌────▼─────┐
 │  Store   │  questionnaire_question_answer table
 │ Answers  │  status: 'generated'
 └──────────┘
```
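The vector-search step above reduces to cosine similarity with top-k selection. A minimal in-memory sketch follows; in production this runs inside pgvector, and the `Chunk` shape and function names here are illustrative only.

```typescript
// Minimal sketch of similarity search: cosine distance over in-memory
// embeddings. The real system delegates this to pgvector in PostgreSQL.
interface Chunk { source: string; text: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

The top-k hits (drawn from policies, context documents, manual answers, and knowledge-base docs) become the only context the RAG model is allowed to answer from.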
Key Files
| File | Lines | Purpose |
|---|---|---|
| apps/api/src/questionnaire/utils/content-extractor.ts | ~1092 | Multi-format file parsing |
| apps/api/src/questionnaire/utils/question-parser.ts | ~200 | AI-powered Q&A extraction |
| apps/api/src/questionnaire/utils/constants.ts | ~100 | System prompts |
| apps/api/src/trigger/questionnaire/answer-question-helpers.ts | ~200 | RAG answer generation |
| apps/api/src/vector-store/lib/core/generate-embedding.ts | ~50 | Embedding generation |
Guardrails
Answer generation prompt (constants.ts):
- Answer based ONLY on the provided context
- If insufficient evidence → "N/A - no evidence found"
- Use "we/our/us" voice for the organization
- Keep answers 1-3 sentences
- Never fabricate information not in context
Parsing fallback chain:
1. Groq (meta-llama/gpt-oss-120b) - ultra-fast, 25K char chunks
2. Claude Sonnet - 200K context for large documents
3. OpenAI gpt-4o-mini - final fallback
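The fallback chain above can be expressed as a simple first-success loop. This sketch uses invented names (`Parser`, `parseWithFallback`) and is synchronous for brevity; the real provider calls are async.

```typescript
// Sketch of the parsing fallback chain: try each provider in order and
// return the first success. Parser functions stand in for the real
// Groq → Claude → OpenAI calls, which are async in the actual code.
type Parser = (content: string) => string[];

function parseWithFallback(content: string, chain: Parser[]): string[] {
  let lastError: unknown;
  for (const parse of chain) {
    try {
      return parse(content);
    } catch (err) {
      lastError = err; // provider failed; try the next one
    }
  }
  throw lastError; // every provider failed
}
```

Ordering the chain cheapest-first means the expensive models only pay for the documents the fast ones cannot handle.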
Vector Store Implementation
```
// apps/api/src/vector-store/lib/core/generate-embedding.ts
// Uses OpenAI text-embedding-3-small (1536 dimensions)
// Stored in PostgreSQL via pgvector extension
// Sources indexed:
//   - Policies (full content, chunked)
//   - Context Q&A (manual answers from onboarding/settings)
//   - Knowledge base documents (uploaded files)
//   - Manual answer entries (human overrides for questionnaires)
// Search: cosine similarity with top-k results
// Batch support: batchGenerateEmbeddings() for efficiency
```

AI System #2: Policy Generation & Editing
Policy Generation (Trigger.dev task)
Inputs:
- Policy template (FrameworkEditorPolicyTemplate)
- Organization data (industry, size, tech stack)
- Active frameworks (SOC 2, ISO 27001, etc.)
- Context hub answers (onboarding data)
Process:
1. Build comprehensive prompt with company info
2. Call gpt-5-mini for TipTap JSON structure
3. Sanitize output (remove "<<TO REVIEW>>" placeholders)
4. Align with template structure
5. Save to database with version tracking
Key file: apps/api/src/trigger/policies/update-policy-helpers.ts
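The sanitize step (step 3) can be sketched as a small string pass. The function name and the whitespace cleanup are illustrative; only the `<<TO REVIEW>>` placeholder itself comes from the source.

```typescript
// Sketch of sanitizing generated policy text: strip the "<<TO REVIEW>>"
// placeholders the model may leave behind. The exact handling in
// update-policy-helpers.ts may differ; this shows the idea.
function sanitizeGeneratedText(text: string): string {
  return text
    .replace(/<<TO REVIEW>>/g, '')
    .replace(/[ \t]{2,}/g, ' ') // collapse doubled spaces left by removal
    .trim();
}
```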
Policy Chat Assistant (streaming)
Endpoint: POST /api/policies/[policyId]/chat
Model: claude-sonnet-4-6
Max steps: 5 (prevents runaway tool loops)
Tools available:
- getVendors: Fetch organization's vendor list
- getPolicies: Fetch other policies for cross-reference
- getEvidence: Fetch related evidence
- proposePolicy: Submit edited TipTap JSON
System prompt emphasizes:
- "PRESERVE UNCHANGED TEXT EXACTLY"
- Section boundary rules for headings/lists
- TipTap JSON structure requirements
- Prohibition on copying previous proposals
Key file: apps/app/src/app/api/policies/[policyId]/chat/route.ts
Section Editor (single-turn)
Endpoint: POST /api/policies/[policyId]/edit-section
Model: claude-sonnet-4-6
Purpose: Edit a single section without full policy context
Strips previous proposePolicy tool calls from history to prevent reuse
AI System #3: Cloud Security Remediation
Architecture
```
Security Finding (e.g., "S3 bucket public access enabled")
  │
  ▼
Phase 1: Generate Initial Fix Plan
  │  Model: claude-opus-4-6 (temperature: 0)
  │  Input: Finding description + cloud provider
  │  Output: { readSteps, fixSteps, rollbackSteps }
  │
  ▼
Phase 2: Execute Read Steps
  │  AWS SDK v3 command execution
  │  Gathers actual resource state
  │
  ▼
Phase 3: Refine Plan with Real Data
  │  Model: claude-opus-4-6 (temperature: 0)
  │  Input: Finding + actual AWS state
  │  Output: Refined { readSteps, fixSteps, rollbackSteps }
  │
  ▼
Execute Fix Steps (with acknowledgment)
  │  Maps step commands to AWS SDK calls
  │  Tracks: executing → success/failed/needs_permissions
  │
  ▼
Rollback Available (if fix fails)
```
Key Files
| File | Purpose |
|---|---|
| apps/api/src/cloud-security/ai-remediation.service.ts | Orchestrates 2-phase fix planning |
| apps/api/src/cloud-security/ai-remediation.prompt.ts | AWS fix plan Zod schema + prompts |
| apps/api/src/cloud-security/gcp-ai-remediation.prompt.ts | GCP REST API fix schemas |
| apps/api/src/cloud-security/azure-ai-remediation.prompt.ts | Azure ARM API fix schemas |
| apps/api/src/cloud-security/aws-command-executor.ts | Maps AI output to AWS SDK calls |
Why Temperature 0?
```
// ai-remediation.service.ts
const result = await generateObject({
  model: anthropic('claude-opus-4-6'),
  temperature: 0, // CRITICAL: deterministic output
  // ...
});
```

Cloud remediation generates executable commands. A creative variation in an IAM policy could:
- Grant excessive permissions (security risk)
- Block legitimate access (operational risk)
- Produce invalid JSON (execution failure)
Temperature 0 ensures reproducible, exact outputs.
Multi-Cloud Schema Design
Each cloud provider has its own Zod schema for fix plans:
AWS: Uses SDK v3 command class names
```
// FixStep for AWS
{
  service: "S3",
  command: "PutPublicAccessBlock",
  params: { Bucket: "my-bucket", ... }
}
```

GCP: Uses REST API endpoints

```
// FixStep for GCP
{
  method: "PATCH",
  url: "https://storage.googleapis.com/storage/v1/b/my-bucket",
  body: { ... }
}
```

Azure: Uses ARM REST API

```
// FixStep for Azure
{
  method: "PUT",
  url: "https://management.azure.com/subscriptions/.../providers/...",
  body: { ... }
}
```

AI System #4: Vendor Risk Assessment
Pipeline
```
Vendor created/updated
  │
  ▼
Trigger.dev: vendor-risk-assessment-task
  │
  ├── Firecrawl: Scrape vendor website (core pages)
  ├── Firecrawl: Research vendor news/incidents
  │
  ▼
LLM Analysis (gpt-5.2)
  │  Input: Website content + news + existing vendor data
  │  Output: Structured risk assessment
  │    - Security posture analysis
  │    - Risk scores (low/medium/high)
  │    - Compliance certification detection
  │    - Task generation for remediation
  │
  ▼
PostgreSQL advisory lock (prevents concurrent assessment)
  │
  ▼
Save: VendorRiskAssessment with version tracking
Create: TaskItems for follow-up actions
```
Deduplication
```
// PostgreSQL advisory locks prevent concurrent vendor assessment
// Keyed by website domain hash
// Versions: v1, v2, v3... for re-runs
```

AI System #5: Assistant Chat (API)
Architecture
Endpoint: POST /v1/assistant-chat/completions
Model: gpt-5 (OpenAI)
Streaming: Server-sent events via streamText()
Steps limit: 5 (prevents runaway)
Tools (permission-gated per user):
- findOrganization: Always available
- getUser: Always available
- getPolicies: Requires policy:read
- getPolicyContent: Requires policy:read
- getRisks: Requires risk:read
- getRiskById: Requires risk:read
History: Ephemeral, stored in Upstash Redis
Permission-Gated Tool Pattern
```
// apps/api/src/assistant-chat/assistant-chat-tools.ts
function buildTools(permissions: UserPermissions) {
  const tools = {
    findOrganization: { ... }, // Always available
    getUser: { ... },          // Always available
  };

  if (hasPermission(permissions, 'policy', 'read')) {
    tools.getPolicies = { ... };
    tools.getPolicyContent = { ... };
  }

  if (hasPermission(permissions, 'risk', 'read')) {
    tools.getRisks = { ... };
    tools.getRiskById = { ... };
  }

  return tools;
}
```

This ensures the LLM cannot access data the user doesn't have permission to see, even through tool calls.
AI System #6: Browser Automation
Stack
```
Browserbase (cloud browser infrastructure)
└── Stagehand v3 (AI browser agent)
    └── Claude Sonnet 4.6 (visual understanding)
        └── Playwright (browser protocol)
```
How It Works
```
// apps/api/src/browserbase/browserbase.service.ts

// 1. Create/reuse persistent browser context per org
const contextId = await getOrCreateOrgContext(orgId);

// 2. Create session with context
const session = await browserbase.sessions.create({
  projectId: BROWSERBASE_PROJECT_ID,
  browserSettings: { context: { id: contextId } }
});

// 3. Initialize Stagehand with Claude
const stagehand = new Stagehand({
  browserbaseSessionID: session.id,
  modelName: 'anthropic/claude-sonnet-4-6',
  modelClientOptions: { apiKey: ANTHROPIC_API_KEY }
});

// 4. Execute natural-language tasks (max 20 steps)
await stagehand.agent.execute(taskInstructions);

// 5. Capture screenshots → upload to S3 → return presigned URLs
```

Use Cases
- Evidence collection from SaaS dashboards
- Automated compliance checks that require browser interaction
- Login verification and status checks
AI System #7: Auditor Content Generation
Model: gpt-5.2
Trigger: Trigger.dev task
Generates sections:
- Company background
- Services provided
- Mission & vision
- System description
- Critical vendors (filtered for SOC 2 relevance)
- Subservice organizations
Data sources:
- Organization context hub answers
- Website scraping (if URL available)
Guardrails:
- "NEVER mention missing information"
- "Write about what IS available"
- "No hedging words (may, might, likely)"
- "No attribution phrases"
AI System #8: Task Automation Chat
Frontend Architecture
React component: chat.tsx
Framework: @ai-sdk/react useChat() hook
Transport: DefaultChatTransport → /api/tasks-automations/chat
Features:
- Streaming with visible reasoning steps
- Dynamic model selection via AI Gateway
- Ephemeral → persistent automation transition
- Tools: web search (Exa), website crawling (Firecrawl)
- Secret injection and info context provision
Model Gateway
```
// apps/app/src/.../tools/gateway.ts
// AI Gateway allows runtime model selection
// User can choose model + reasoning effort
// Reasoning effort: minimal | low | medium
```

Guardrails & Safety Patterns
1. Step Limiting
```
// Prevents runaway tool calling loops
streamText({
  maxSteps: 5,
  // or
  stopCondition: stepCountIs(5),
});
```

Used in: policy chat, assistant chat, section editor
2. Temperature Control
```
// Deterministic outputs for safety-critical operations
temperature: 0    // Cloud remediation (IAM policies, CLI commands)
temperature: auto // Creative tasks (policy writing, chat)
```

3. Zod Schema Validation
```
// All structured LLM outputs validated before use
const result = await generateObject({
  schema: fixPlanSchema, // Zod schema
  // ...
});
// Invalid outputs throw NoObjectGeneratedError → fallback handling
```

4. Context Grounding (RAG)
"Answer based ONLY on the provided context"
"If insufficient → respond 'N/A - no evidence found'"
Prevents hallucination in questionnaire answers and SOA responses.
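The grounding guardrail can be enforced before the LLM is even called: if vector search returns nothing usable, short-circuit with the mandated "N/A" answer. This sketch is illustrative; the function name and similarity threshold are assumptions, not values from the source.

```typescript
// Sketch of the grounding guardrail: with no context above the
// similarity threshold, return the mandated N/A answer instead of
// calling the model. Threshold value is illustrative.
interface Hit { text: string; similarity: number; }

function groundedAnswerOrNA(hits: Hit[], minSimilarity = 0.75): string | null {
  const usable = hits.filter(h => h.similarity >= minSimilarity);
  if (usable.length === 0) return 'N/A - no evidence found';
  return null; // null → proceed to the LLM with `usable` as context
}
```

Gating before the call both prevents hallucination and saves the token cost of a doomed request.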
5. Content Truncation
```
// Groq: 25K char chunks (32K context limit)
// General parsing: 80K char chunks
// Vision models: document slicing at 80K chars
```

6. Permission-Gated Tools
```
// Assistant chat tools filtered by user permissions
// LLM can only call tools the user has access to
```

7. Fallback Chains
Groq (fast/cheap) → Claude (large context) → OpenAI (reliable)
Resilient parsing even when primary provider is down.
8. Error Handling
```
// NoObjectGeneratedError: Special handling with JSON.parse() fallback
// Missing API keys: Returns 503 (not 500)
// Browserbase failures: Actionable error messages
// Vendor assessment: 2-attempt retry with advisory locks
```

Cost Optimization Patterns
1. Batch Operations
```
// Instead of N embedding calls, batch them
batchGenerateEmbeddings(texts);  // Uses embedMany()
generateAnswerWithRAGBatch();    // Pre-fetch vectors + parallel LLM
batchSearchSOAQuestions();       // Pre-fetch all control vectors
```

2. Model Tiering
Simple classification → Groq Llama (cents)
Structured extraction → GPT-4o-mini (pennies)
Complex reasoning → Claude Sonnet (dimes)
Safety-critical → Claude Opus (dollars)
3. Streaming Responses
Policy chat and assistant use streamText() for:
- Better perceived latency (first token fast)
- User can cancel early (saves tokens)
- Progressive rendering
4. Chunking Strategy
Small docs: Single LLM call
Large docs (>25K): Chunk and process in parallel
Huge docs (>80K): Use Claude 200K context as fallback
Images/PDFs: Vision models for extraction
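The character-based chunking behind this strategy is a plain fixed-size split. The limits match the numbers quoted in the guardrails section (25K for Groq, 80K for general parsing); the function itself is an illustrative sketch, not the repo's implementation.

```typescript
// Fixed-size character chunking used before sending large documents
// to a size-limited model. Real chunkers may also split on sentence
// or section boundaries; this shows the basic mechanism.
function chunkForModel(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks.length > 0 ? chunks : [''];
}
```

A 60K-character document at the 25K Groq limit yields three chunks that can be parsed in parallel; anything over 80K falls back to Claude's 200K context instead.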
Prompt Engineering Patterns
Role Establishment
"You are an expert in GRC (Governance, Risk, and Compliance)"
"You are a helpful assistant in Comp AI"
Structured Output Instructions
"Return a JSON object with the following structure..."
"Use TipTap JSON format for policy content"
"Generate AWS SDK v3 command names, not CLI commands"
Negative Instructions (What NOT to Do)
"NEVER mention missing information"
"Do NOT use general knowledge"
"NEVER hallucinate data not in context"
"Do NOT copy previous policy proposals"
Voice & Tone Control
"Use 'we/our/us' voice for the organization"
"No hedging words (may, might, likely)"
"Keep answers 1-3 sentences"
Context Injection
"Current date: {date}"
"Organization: {name} in {industry}"
"Active frameworks: {frameworks}"
"Company size: {size} employees"
What's Notable
Strengths
- Model diversity - Not locked to one provider; uses the right model for each task
- RAG is core, not afterthought - Vector store deeply integrated with compliance workflow
- Permission-aware AI - Tools respect RBAC, preventing data leakage through AI
- Deterministic where it matters - Temperature 0 for cloud remediation prevents dangerous creative outputs
- Fallback chains - Graceful degradation across LLM providers
- Audit trail includes AI - Generated answers tracked separately from manual ones
Potential Improvements
- No token counting or budget enforcement - No visible per-org cost tracking
- No content filtering layer - Relies on model-level safety, no explicit input/output filtering
- No prompt injection defense - User-uploaded documents feed directly into prompts
- Embedding model is basic - text-embedding-3-small may miss nuance; no reranking step
- No A/B testing of models - Model selection is hardcoded, not experimentally validated
- No caching of LLM responses - Repeated identical queries hit the API each time