CodeDocs Vault

05 - AI & LLM Architecture

Models

Model Provider Usage
gpt-5 OpenAI AI chat assistant, questionnaire answering, vendor assessment
text-embedding-3-small OpenAI Vector embeddings for semantic search
Anthropic models Anthropic Alternative LLM provider (via AI SDK)
Groq models Groq Alternative LLM provider (via AI SDK)

The specific model used for questionnaire answering is configured via the ANSWER_MODEL constant in apps/api/src/questionnaire/utils/constants.ts.
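
A minimal sketch of that constant (illustrative; only the file location and purpose are documented here):

// apps/api/src/questionnaire/utils/constants.ts (sketch)
// Assumption: the constant names the AI SDK model id used for questionnaire answers.
export const ANSWER_MODEL = 'gpt-5';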

Framework: Vercel AI SDK

All LLM interactions use the Vercel AI SDK (ai package) for a unified multi-provider abstraction:

import { openai } from '@ai-sdk/openai';
import { streamText, generateText, embed, embedMany } from 'ai';

Key functions used:

  • streamText: streaming chat responses (AI chat assistant)
  • generateText: one-shot text generation (questionnaire answering, policy generation, vendor assessment)
  • embed / embedMany: single and batch vector embeddings for semantic search

Provider packages: @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/groq
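
Because every provider is exposed through the same AI SDK interface, switching models is a one-line change. A minimal sketch (the model ids shown are illustrative, not the configured values):

import { anthropic } from '@ai-sdk/anthropic';
import { groq } from '@ai-sdk/groq';
import { generateText } from 'ai';

// The call shape stays the same; only the model factory changes.
const { text } = await generateText({
  model: anthropic('claude-3-5-sonnet-latest'), // or groq('llama-3.3-70b-versatile')
  prompt: 'Summarize our access control policy in two sentences.',
});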

RAG Pipeline (Retrieval-Augmented Generation)

File: apps/api/src/trigger/questionnaire/answer-question-helpers.ts

The RAG pipeline answers security questionnaire questions using organization-specific data:

Question input
  │
  ├─ 1. Generate embedding (text-embedding-3-small)
  │
  ├─ 2. Vector search (Upstash Vector)
  │     → Filter: organizationId
  │     → Top 100 results
  │     → Min similarity: 0.2
  │     → Sources: policies, context Q&A, knowledge base docs, manual answers
  │
  ├─ 3. Deduplicate sources
  │     → Group by sourceType + sourceId
  │     → Keep highest-scoring per source
  │
  ├─ 4. Build context string
  │     → "[1] Source: Policy 'Access Control'\n<content>"
  │     → "[2] Source: Context Q&A\n<content>"
  │     → ...
  │
  ├─ 5. LLM generation
  │     → System prompt: ANSWER_SYSTEM_PROMPT
  │     → User prompt: "Based on the following context... Answer ONLY from provided context"
  │     → First person plural (we, our, us)
  │
  └─ 6. Post-processing
        → Detect "N/A - no evidence found" → return null
        → Return answer + attributed sources with scores
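
A condensed sketch of the single-question flow above. The Upstash query options, dedupe logic, and metadata field names (sourceName, content) are assumptions; generateEmbedding, ANSWER_SYSTEM_PROMPT, and isNoEvidenceAnswer are the helpers documented elsewhere on this page and are assumed to be in scope:

import { Index } from '@upstash/vector';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const index = new Index({
  url: process.env.UPSTASH_VECTOR_REST_URL!,
  token: process.env.UPSTASH_VECTOR_REST_TOKEN!,
});

const TOP_K = 100;
const MIN_SIMILARITY = 0.2;

export async function generateAnswerWithRAG(question: string, organizationId: string) {
  // 1. Embed the question
  const vector = await generateEmbedding(question);

  // 2. Vector search scoped to the organization
  const matches = await index.query({
    vector,
    topK: TOP_K,
    includeMetadata: true,
    filter: `organizationId = '${organizationId}'`,
  });

  // 3. Drop low-similarity noise, dedupe by sourceType + sourceId (keep the best score)
  const bySource = new Map<string, (typeof matches)[number]>();
  for (const m of matches) {
    if (m.score < MIN_SIMILARITY) continue;
    const key = `${m.metadata?.sourceType}:${m.metadata?.sourceId}`;
    const existing = bySource.get(key);
    if (!existing || m.score > existing.score) bySource.set(key, m);
  }
  const sources = [...bySource.values()];

  // 4. Build the numbered context string
  const context = sources
    .map((s, i) => `[${i + 1}] Source: ${s.metadata?.sourceName}\n${s.metadata?.content}`)
    .join('\n\n');

  // 5. Generate the answer, bounded to the provided context
  const { text } = await generateText({
    model: openai('gpt-5'),
    system: ANSWER_SYSTEM_PROMPT,
    prompt: `Based on the following context, answer ONLY from the provided context.\n\n${context}\n\nQuestion: ${question}`,
  });

  // 6. Treat "no evidence" answers as null
  return isNoEvidenceAnswer(text)
    ? { answer: null, sources: [] }
    : { answer: text, sources };
}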

Batch optimization: generateAnswerWithRAGBatch() (sketched after this list) processes multiple questions by:

  1. Generating all embeddings in a single embedMany() call
  2. Running vector searches in parallel (Promise.all)
  3. Running LLM generations in parallel
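
A minimal sketch of the batch path, reusing the helpers from the single-question sketch above (answerFromMatches is a hypothetical helper covering steps 3-6):

export async function generateAnswerWithRAGBatch(questions: string[], organizationId: string) {
  // 1. One embedMany() call for every question
  const vectors = await batchGenerateEmbeddings(questions);

  // 2. Vector searches in parallel
  const allMatches = await Promise.all(
    vectors.map((vector) =>
      index.query({
        vector,
        topK: 100,
        includeMetadata: true,
        filter: `organizationId = '${organizationId}'`,
      }),
    ),
  );

  // 3. LLM generations in parallel (dedupe + context building + generation per question)
  return Promise.all(
    questions.map((question, i) => answerFromMatches(question, allMatches[i])),
  );
}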

Embedding Generation

File: apps/api/src/vector-store/lib/core/generate-embedding.ts

Two functions for embedding generation:

// Single embedding
export async function generateEmbedding(text: string): Promise<number[]> {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: text,
  });
  return embedding;
}
 
// Batch embeddings (single API call for N texts)
export async function batchGenerateEmbeddings(texts: string[]): Promise<number[][]> {
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: texts,
  });
  return embeddings;
}

Embeddings are stored in Upstash Vector with metadata used by the RAG pipeline: organizationId (the search filter), sourceType and sourceId (used for source deduplication), and the chunk content used when building the context string.
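
A minimal upsert sketch, assuming the index client shown in the RAG sketch above; the id scheme and metadata values beyond the fields listed are illustrative:

await index.upsert({
  id: `policy:${policyId}:chunk-0`,   // hypothetical id scheme
  vector: embedding,                  // number[] from generateEmbedding()
  metadata: {
    organizationId,                   // used as the query filter
    sourceType: 'policy',             // e.g. policy | context | knowledge_base | manual_answer (hypothetical labels)
    sourceId: policyId,
    content: chunkText,               // text injected into the RAG context
  },
});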

Vector Store Sync

Directory: apps/api/src/vector-store/lib/sync/

The vector store is kept in sync with source data via incremental sync pipelines:

Sync Pipeline Source Data Trigger
sync-policies.ts Policy documents (TipTap JSON → text extraction) Org sync, policy update
sync-context.ts Organization context Q&A pairs Org sync, context update
sync-knowledge-base.ts Uploaded knowledge base documents Document upload
sync-manual-answer.ts Manual questionnaire answers Answer creation/update

syncOrganizationEmbeddings() orchestrates a full incremental sync for an organization, running before questionnaire answering to ensure fresh data.

Text is chunked via chunk-text.ts before embedding, and policy text is extracted from TipTap JSON via extract-policy-text.ts.
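
A condensed sketch of one sync pipeline (the policy path), assuming the documented helpers chunkText, extractPolicyText, and batchGenerateEmbeddings; the data-access call and upsert shape are illustrative:

export async function syncPolicies(organizationId: string) {
  const policies = await getPublishedPolicies(organizationId); // hypothetical data-access helper

  for (const policy of policies) {
    const text = extractPolicyText(policy.content);            // TipTap JSON → plain text
    const chunks = chunkText(text);                            // chunk-text.ts
    const embeddings = await batchGenerateEmbeddings(chunks);  // single embedMany() call

    await index.upsert(
      chunks.map((content, i) => ({
        id: `policy:${policy.id}:chunk-${i}`,
        vector: embeddings[i],
        metadata: { organizationId, sourceType: 'policy', sourceId: policy.id, content },
      })),
    );
  }
}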

AI Chat Assistant

File: apps/app/src/app/api/chat/route.ts

The chat endpoint provides a streaming AI assistant for compliance guidance:

const result = streamText({
  model: openai('gpt-5'),
  system: systemPrompt,
  messages: convertToModelMessages(messages),
  tools,
});
 
return result.toUIMessageStreamResponse();

System prompt includes:

Tools (apps/app/src/data/tools/):

Tool File Purpose
Organization organization.ts Fetch org details, members, settings
Policies policies.ts Query policy documents and status
Risks risks-tool.ts Retrieve risk register entries
User user.ts Get current user info and permissions

These tools let the LLM fetch live organization data during conversations. The response is streamed to the client via toUIMessageStreamResponse().
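
A minimal sketch of how one of these tools could be defined with the AI SDK tool() helper. The schema fields and data-access call are assumptions, and the exact tool-definition field names depend on the AI SDK version in use:

import { tool } from 'ai';
import { z } from 'zod';

export const policiesTool = tool({
  description: 'Query policy documents and their status for the current organization',
  inputSchema: z.object({
    status: z.enum(['draft', 'published', 'needs_review']).optional(), // hypothetical statuses
  }),
  execute: async ({ status }) => {
    // Hypothetical data-access helper; the real logic lives in apps/app/src/data/tools/policies.ts.
    // organizationId is assumed to be captured from the authenticated session when the tools object is built.
    return getPoliciesForOrganization({ organizationId, status });
  },
});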

Auth: Session-based (Better Auth cookies). Validates user membership in the specified organization before processing.
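
Roughly, the route's auth gate looks like this (a sketch; the Better Auth call shape and the membership helper are assumptions):

const session = await auth.api.getSession({ headers: request.headers });
if (!session) return new Response('Unauthorized', { status: 401 });

// Hypothetical membership check against the organization sent with the request
const isMember = await isOrganizationMember(session.user.id, organizationId);
if (!isMember) return new Response('Forbidden', { status: 403 });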

Policy Generation

File: apps/app/src/trigger/lib/prompts.ts

Policies are generated by editing TipTap JSON templates with LLM assistance:

Template (TipTap JSON with placeholders)
  + Company info (name, website)
  + Framework context (SOC2, HIPAA flags)
  + Knowledge base Q&A (Context Hub)
  → LLM generates final TipTap JSON

Placeholder system:

Placeholder Maps to
{{COMPANY}} Company name
{{COMPANYINFO}} Company description
{{INDUSTRY}} Industry sector
{{EMPLOYEES}} Employee count
{{DEVICES}} Team device types
{{SOFTWARE}} Software used
{{LOCATION}} Work arrangement
{{CRITICAL}} Hosting/infrastructure
{{DATA}} Data types handled
{{GEO}} Data location

Handlebars-style conditionals:

{{#if soc2}}
  SOC 2-specific policy content...
{{/if}}

{{#if hipaa}}
  HIPAA-specific policy content...
{{/if}}

The LLM evaluates these conditionals based on the organization's selected frameworks and removes unmatched blocks.
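
A condensed sketch of the generation call, assuming the inputs shown in the flow above; the prompt wording, helper names, and plain JSON.parse of the response are illustrative:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { text } = await generateText({
  model: openai('gpt-5'),
  system: policyGenerationPrompt, // built in apps/app/src/trigger/lib/prompts.ts
  prompt: [
    'Template (TipTap JSON with placeholders):',
    JSON.stringify(template),
    `Company: ${companyName} (${companyWebsite})`,
    `Frameworks: soc2=${hasSoc2}, hipaa=${hasHipaa}`,
    'Context Hub Q&A:',
    contextQA,
    'Fill every {{PLACEHOLDER}}, resolve {{#if ...}} blocks for the selected frameworks,',
    'remove unmatched blocks, and return only the final TipTap JSON.',
  ].join('\n'),
});

const policyDocument = JSON.parse(text); // final TipTap JSON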

Vendor Risk Assessment

Vendor risk assessment combines web scraping with structured LLM output:

  1. Research (apps/app/src/trigger/lib/research.ts):

    • Submits vendor website URL to Firecrawl API (/v1/extract)
    • Polls for completion (5-second intervals, 5-minute timeout)
    • Validates extracted data against Zod schema
    • Options for onlyMainContent and removeBase64Images
  2. Assessment (apps/app/src/trigger/tasks/onboarding/generate-vendor-mitigation.ts):

    • LLM generates structured risk assessment using a Zod output schema (sketched below)
    • Includes risk scoring, categorization, and mitigation recommendations
    • Uses pg_advisory_lock for concurrent write safety
    • Saves assessment to database
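
A minimal sketch of the structured-output step, assuming the AI SDK's generateObject() and an illustrative Zod schema (the real assessment shape lives alongside the task):

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Illustrative assessment shape, not the real schema
const vendorAssessmentSchema = z.object({
  riskScore: z.number().min(1).max(10),
  category: z.enum(['operational', 'security', 'compliance', 'financial']),
  mitigations: z.array(z.string()),
});

const { object: assessment } = await generateObject({
  model: openai('gpt-5'),
  schema: vendorAssessmentSchema,
  prompt: `Assess the following vendor based on the research extract:\n${JSON.stringify(researchData)}`,
});

// `assessment` is parsed and validated against the schema before it is written to the database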

Guardrails

Context Bounding

The RAG pipeline instructs the LLM to "answer ONLY from the provided context" and use first person plural (we, our, us). If context is insufficient, the LLM must respond with "N/A - no evidence found".

Output Validation

Structured LLM outputs, such as Firecrawl research extracts and vendor risk assessments, are validated against Zod schemas before being persisted.

Source Attribution

Every RAG answer includes attributed sources with similarity scores. Sources are deduplicated by sourceType + sourceId, keeping the highest-scoring entry per source.

No-Evidence Detection

function isNoEvidenceAnswer(answer: string): boolean {
  const lowerAnswer = answer.toLowerCase();
  return (
    lowerAnswer.includes('n/a') ||
    lowerAnswer.includes('no evidence') ||
    lowerAnswer.includes('not found in the context')
  );
}

Answers detected as "no evidence" are returned as null with empty sources, preventing hallucinated responses from reaching users.

Rate Limiting

The NestJS API applies global rate limiting via ThrottlerModule.
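
A minimal configuration sketch; the ttl and limit values shown are assumptions, not the configured limits:

import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerGuard, ThrottlerModule } from '@nestjs/throttler';

@Module({
  imports: [
    // Illustrative: 100 requests per 60-second window per client
    ThrottlerModule.forRoot([{ ttl: 60_000, limit: 100 }]),
  ],
  providers: [{ provide: APP_GUARD, useClass: ThrottlerGuard }],
})
export class AppModule {}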

Similarity Threshold

Vector search uses a minimum similarity score of 0.2 to filter noise while maintaining high recall. Results below this threshold are discarded before context building.

Request Limits