# 05 - AI & LLM Architecture

## Models
| Model | Provider | Usage |
|---|---|---|
| `gpt-5` | OpenAI | AI chat assistant, questionnaire answering, vendor assessment |
| `text-embedding-3-small` | OpenAI | Vector embeddings for semantic search |
| Anthropic models | Anthropic | Alternative LLM provider (via AI SDK) |
| Groq models | Groq | Alternative LLM provider (via AI SDK) |
The specific model used for questionnaire answering is configured via the `ANSWER_MODEL` constant in `apps/api/src/questionnaire/utils/constants.ts`.
## Framework: Vercel AI SDK

All LLM interactions use the Vercel AI SDK (the `ai` package) for a unified multi-provider abstraction:
```ts
import { openai } from '@ai-sdk/openai';
import { streamText, generateText, embed, embedMany } from 'ai';
```

Key functions used:

- `streamText()` — Streaming chat responses with tool calling
- `generateText()` — Single-shot text generation (RAG answers, policy content)
- `embed()` — Single embedding generation
- `embedMany()` — Batch embedding generation
Provider packages: `@ai-sdk/openai`, `@ai-sdk/anthropic`, `@ai-sdk/groq`
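Because every provider sits behind the same interface, swapping models is a one-line change. A minimal sketch (the `USE_ANTHROPIC` flag and the Anthropic model ID are illustrative, not from the codebase):

```ts
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

// Pick a provider; the generateText() call itself is provider-agnostic.
const model = process.env.USE_ANTHROPIC
  ? anthropic('claude-sonnet-4-5') // model ID illustrative
  : openai('gpt-5');

const { text } = await generateText({
  model,
  prompt: 'Summarize our access control policy in two sentences.',
});
```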
## RAG Pipeline (Retrieval-Augmented Generation)

File: `apps/api/src/trigger/questionnaire/answer-question-helpers.ts`
The RAG pipeline answers security questionnaire questions using organization-specific data:
```
Question input
│
├─ 1. Generate embedding (text-embedding-3-small)
│
├─ 2. Vector search (Upstash Vector)
│     → Filter: organizationId
│     → Top 100 results
│     → Min similarity: 0.2
│     → Sources: policies, context Q&A, knowledge base docs, manual answers
│
├─ 3. Deduplicate sources
│     → Group by sourceType + sourceId
│     → Keep highest-scoring per source
│
├─ 4. Build context string
│     → "[1] Source: Policy 'Access Control'\n<content>"
│     → "[2] Source: Context Q&A\n<content>"
│     → ...
│
├─ 5. LLM generation
│     → System prompt: ANSWER_SYSTEM_PROMPT
│     → User prompt: "Based on the following context... Answer ONLY from provided context"
│     → First person plural (we, our, us)
│
└─ 6. Post-processing
      → Detect "N/A - no evidence found" → return null
      → Return answer + attributed sources with scores
```
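Steps 1–2 map onto the `@upstash/vector` SDK roughly as follows (the env-based index setup, import path, and exact filter string are assumptions):

```ts
import { Index } from '@upstash/vector';
import { generateEmbedding } from './generate-embedding'; // path illustrative

const index = Index.fromEnv(); // reads UPSTASH_VECTOR_REST_URL / _TOKEN

// Embed the question, then run an org-scoped similarity search.
async function searchOrgContext(question: string, organizationId: string) {
  const vector = await generateEmbedding(question);
  const results = await index.query({
    vector,
    topK: 100, // step 2: top 100 results
    includeMetadata: true,
    filter: `organizationId = '${organizationId}'`, // server-side scoping
  });
  return results.filter((r) => r.score >= 0.2); // drop low-similarity noise
}
```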
Batch optimization: `generateAnswerWithRAGBatch()` processes multiple questions (sketched below) by:

- Generating all embeddings in a single `embedMany()` call
- Running vector searches in parallel (`Promise.all`)
- Running LLM generations in parallel
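A sketch of that batching shape; the search and context-building helpers are passed in as parameters only to keep the example self-contained, and the real logic lives in `generateAnswerWithRAGBatch()`:

```ts
import { openai } from '@ai-sdk/openai';
import { embedMany, generateText } from 'ai';

async function answerBatch(
  questions: string[],
  organizationId: string,
  search: (vector: number[], orgId: string) => Promise<unknown[]>, // steps 2–3
  buildContext: (results: unknown[]) => string, // step 4
  systemPrompt: string, // ANSWER_SYSTEM_PROMPT in the real code
) {
  // 1. One embedMany() call covers every question.
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: questions,
  });

  // 2. Vector searches run in parallel.
  const contexts = await Promise.all(
    embeddings.map((vector) => search(vector, organizationId)),
  );

  // 3. LLM generations run in parallel as well.
  return Promise.all(
    questions.map((question, i) =>
      generateText({
        model: openai('gpt-5'), // the real model comes from ANSWER_MODEL
        system: systemPrompt,
        prompt: `${buildContext(contexts[i])}\n\nQuestion: ${question}`,
      }),
    ),
  );
}
```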
## Embedding Generation

File: `apps/api/src/vector-store/lib/core/generate-embedding.ts`
Two functions for embedding generation:
```ts
import { openai } from '@ai-sdk/openai';
import { embed, embedMany } from 'ai';

// Single embedding
export async function generateEmbedding(text: string): Promise<number[]> {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: text,
  });
  return embedding;
}

// Batch embeddings (single API call for N texts)
export async function batchGenerateEmbeddings(texts: string[]): Promise<number[][]> {
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: texts,
  });
  return embeddings;
}
```

Embeddings are stored in Upstash Vector with metadata:

- `organizationId` — for server-side filtering
- `sourceType` — `policy`, `context`, `knowledge_base_document`, `manual_answer`, or `attachment`
- `sourceId` — reference to the source record
- `content` — the embedded text chunk
- `policyName`, `contextQuestion`, `documentName` — human-readable labels
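Writing a chunk with that metadata might look like the following sketch (the ID scheme and function shape are assumptions):

```ts
import { Index } from '@upstash/vector';

const index = Index.fromEnv();

// Store one embedded policy chunk alongside the metadata fields listed above.
async function upsertPolicyChunk(
  policy: { id: string; organizationId: string; name: string },
  chunkIndex: number,
  chunkText: string,
  embedding: number[],
) {
  await index.upsert([
    {
      id: `policy:${policy.id}:chunk-${chunkIndex}`, // ID scheme is an assumption
      vector: embedding,
      metadata: {
        organizationId: policy.organizationId, // server-side filtering key
        sourceType: 'policy',
        sourceId: policy.id,
        content: chunkText,
        policyName: policy.name,
      },
    },
  ]);
}
```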
## Vector Store Sync

Directory: `apps/api/src/vector-store/lib/sync/`
The vector store is kept in sync with source data via incremental sync pipelines:
| Sync Pipeline | Source Data | Trigger |
|---|---|---|
| `sync-policies.ts` | Policy documents (TipTap JSON → text extraction) | Org sync, policy update |
| `sync-context.ts` | Organization context Q&A pairs | Org sync, context update |
| `sync-knowledge-base.ts` | Uploaded knowledge base documents | Document upload |
| `sync-manual-answer.ts` | Manual questionnaire answers | Answer creation/update |
`syncOrganizationEmbeddings()` orchestrates a full incremental sync for an organization, running before questionnaire answering to ensure fresh data.

Text is chunked via `chunk-text.ts` before embedding (see the sketch below), and policy text is extracted from TipTap JSON via `extract-policy-text.ts`.
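The real chunker isn't shown in this section; a minimal fixed-size chunker with overlap illustrates the idea (the 1,000-character size and 200-character overlap are assumptions, not the actual parameters):

```ts
// Illustrative only: the real chunk size and overlap live in chunk-text.ts.
export function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = Math.max(1, size - overlap); // guard against a non-advancing loop
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```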
## AI Chat Assistant

File: `apps/app/src/app/api/chat/route.ts`
The chat endpoint provides a streaming AI assistant for compliance guidance:
```ts
const result = streamText({
  model: openai('gpt-5'),
  system: systemPrompt,
  messages: convertToModelMessages(messages),
  tools,
});
return result.toUIMessageStreamResponse();
```

The system prompt includes:

- GRC expert persona
- Organization context (`organizationId`)
- Current date/time
- An instruction to prefer tool calls over guessing
- Markdown formatting constraints
Tools (`apps/app/src/data/tools/`):

| Tool | File | Purpose |
|---|---|---|
| Organization | `organization.ts` | Fetch org details, members, settings |
| Policies | `policies.ts` | Query policy documents and status |
| Risks | `risks-tool.ts` | Retrieve risk register entries |
| User | `user.ts` | Get current user info and permissions |
These tools let the LLM fetch live organization data during conversations. The response is streamed to the client via `toUIMessageStreamResponse()`.
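Each tool pairs an input schema with an execute function. A sketch of what one definition might look like, assuming the AI SDK v5 `tool()` helper (the schema, statuses, and placeholder return value are assumptions, not the actual `policies.ts` code):

```ts
import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical shape of a chat tool; the real implementations live in
// apps/app/src/data/tools/.
export const getPolicies = tool({
  description: "List the organization's policy documents and their status",
  inputSchema: z.object({
    status: z.enum(['draft', 'published']).optional(), // assumed statuses
  }),
  execute: async ({ status }) => {
    // In the real tool this queries the database scoped to the current
    // organization; a placeholder keeps the sketch self-contained.
    return { policies: [], status };
  },
});
```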
Auth: Session-based (Better Auth cookies). Validates user membership in the specified organization before processing.
## Policy Generation

File: `apps/app/src/trigger/lib/prompts.ts`
Policies are generated by editing TipTap JSON templates with LLM assistance:
```
Template (TipTap JSON with placeholders)
  + Company info (name, website)
  + Framework context (SOC2, HIPAA flags)
  + Knowledge base Q&A (Context Hub)
  → LLM generates final TipTap JSON
```
Placeholder system:

| Placeholder | Maps to |
|---|---|
| `{{COMPANY}}` | Company name |
| `{{COMPANYINFO}}` | Company description |
| `{{INDUSTRY}}` | Industry sector |
| `{{EMPLOYEES}}` | Employee count |
| `{{DEVICES}}` | Team device types |
| `{{SOFTWARE}}` | Software used |
| `{{LOCATION}}` | Work arrangement |
| `{{CRITICAL}}` | Hosting/infrastructure |
| `{{DATA}}` | Data types handled |
| `{{GEO}}` | Data location |
Handlebars-style conditionals:

```handlebars
{{#if soc2}}
SOC 2-specific policy content...
{{/if}}

{{#if hipaa}}
HIPAA-specific policy content...
{{/if}}
```
The LLM evaluates these conditionals based on the organization's selected frameworks and removes unmatched blocks.
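Putting the pieces together, the generation call likely resembles the following sketch (the prompt wording, system instruction, and function shape are all assumptions; the real prompts live in `apps/app/src/trigger/lib/prompts.ts`):

```ts
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

// Hypothetical assembly of template + company info + framework flags.
async function generatePolicy(
  templateJson: string,
  company: { name: string; website: string },
  frameworks: string[], // e.g. ['soc2', 'hipaa'] — drives the {{#if ...}} blocks
) {
  const { text } = await generateText({
    model: openai('gpt-5'),
    system:
      'You fill in TipTap JSON policy templates. Resolve all placeholders and ' +
      '{{#if}} conditionals, and return valid TipTap JSON only.',
    prompt: [
      `Template:\n${templateJson}`,
      `Company: ${company.name} (${company.website})`,
      `Selected frameworks: ${frameworks.join(', ')}`,
    ].join('\n\n'),
  });
  return text; // parsed downstream as TipTap JSON
}
```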
## Vendor Risk Assessment
Vendor risk assessment combines web scraping with structured LLM output:

1. **Research** (`apps/app/src/trigger/lib/research.ts`):
   - Submits the vendor website URL to the Firecrawl API (`/v1/extract`)
   - Polls for completion (5-second intervals, 5-minute timeout)
   - Validates extracted data against a Zod schema
   - Options for `onlyMainContent` and `removeBase64Images`
2. **Assessment** (`apps/app/src/trigger/tasks/onboarding/generate-vendor-mitigation.ts`):
   - The LLM generates a structured risk assessment using a Zod output schema (sketched below)
   - Includes risk scoring, categorization, and mitigation recommendations
   - Uses `pg_advisory_lock` for concurrent write safety
   - Saves the assessment to the database
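The structured-output step likely resembles the AI SDK's `generateObject()` with a Zod schema; the field names and ranges below are assumptions, not the schema from `generate-vendor-mitigation.ts`:

```ts
import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { z } from 'zod';

// Assumed schema shape; the real one lives in generate-vendor-mitigation.ts.
const vendorAssessmentSchema = z.object({
  riskScore: z.number().min(1).max(10),
  riskCategory: z.enum(['low', 'medium', 'high']),
  mitigations: z.array(z.string()),
});

async function assessVendor(research: string) {
  const { object } = await generateObject({
    model: openai('gpt-5'),
    schema: vendorAssessmentSchema,
    prompt: `Assess the security risk of this vendor based on the research below:\n\n${research}`,
  });
  return object; // validated against the schema before it reaches the database
}
```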
## Guardrails

### Context Bounding
The RAG pipeline instructs the LLM to "answer ONLY from the provided context" and use first person plural (we, our, us). If context is insufficient, the LLM must respond with "N/A - no evidence found".
### Output Validation
- Zod schemas validate structured LLM output (vendor assessments, research extraction)
- Firecrawl responses are validated at both the job status level and the final data level
### Source Attribution

Every RAG answer includes attributed sources with similarity scores. Sources are deduplicated by `sourceType` + `sourceId`, keeping the highest-scoring entry per source.
### No-Evidence Detection

```ts
function isNoEvidenceAnswer(answer: string): boolean {
  const lowerAnswer = answer.toLowerCase();
  return (
    lowerAnswer.includes('n/a') ||
    lowerAnswer.includes('no evidence') ||
    lowerAnswer.includes('not found in the context')
  );
}
```

Answers detected as "no evidence" are returned as `null` with empty sources, preventing hallucinated responses from reaching users.
### Rate Limiting

The NestJS API applies global rate limiting via `ThrottlerModule`:

- 100 requests per 60 seconds per IP address
- Applied as a global `APP_GUARD`
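With `@nestjs/throttler` v5+, that configuration looks roughly like the following (the module layout is a sketch, not the API's actual app module):

```ts
import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerGuard, ThrottlerModule } from '@nestjs/throttler';

@Module({
  imports: [
    // 100 requests per 60 seconds per IP (ttl is in milliseconds in v5+).
    ThrottlerModule.forRoot([{ ttl: 60_000, limit: 100 }]),
  ],
  providers: [
    // Registering ThrottlerGuard as APP_GUARD applies it to every route.
    { provide: APP_GUARD, useClass: ThrottlerGuard },
  ],
})
export class AppModule {}
```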
### Similarity Threshold
Vector search uses a minimum similarity score of 0.2 to filter noise while maintaining high recall. Results below this threshold are discarded before context building.
### Request Limits

- Chat endpoint: `maxDuration = 30` seconds
- Body parser: 150 MB limit (for base64 file uploads)
- Firecrawl polling: 5-minute timeout with 5-second intervals