Design Patterns & Decisions

Architectural Patterns

1. Registry Pattern (Integration Platform)

Where: packages/integration-platform/src/registry/index.ts

// Code manifests registered at import time
registerCodeManifest(awsManifest);
registerCodeManifest(githubManifest);
// ...9 total code-based integrations
 
// Dynamic manifests from database at runtime
registerDynamicManifest(dbManifest);
 
// Lookup
getManifest('aws');  // Returns full manifest with checks, auth, etc.

Why this pattern: Integrations need both static (code-based) and dynamic (DB-stored) definitions. The registry provides a single lookup point while preventing ID collisions and allowing hot-loading of new integrations.

Tradeoff: Code manifests cannot be overridden at runtime. This is intentional - core integrations (AWS, GCP) shouldn't be accidentally replaced.
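
A minimal sketch of how such a registry could enforce the "code wins" rule. The function names come from the snippet above; the IntegrationManifest shape, the two internal maps, and the boolean return value are assumptions made for illustration, not the actual implementation.

```typescript
// Hypothetical, trimmed-down manifest; the real one also carries checks, auth, etc.
interface IntegrationManifest {
  id: string;
  label: string;
}

const codeManifests = new Map<string, IntegrationManifest>();
const dynamicManifests = new Map<string, IntegrationManifest>();

// Code manifests are registered once at import time; a duplicate ID is a programming error.
function registerCodeManifest(manifest: IntegrationManifest): void {
  if (codeManifests.has(manifest.id)) {
    throw new Error(`Duplicate code manifest: ${manifest.id}`);
  }
  codeManifests.set(manifest.id, manifest);
}

// Dynamic manifests may be added or refreshed at runtime, but never shadow a code manifest.
// Returns false when the registration was rejected.
function registerDynamicManifest(manifest: IntegrationManifest): boolean {
  if (codeManifests.has(manifest.id)) {
    return false;
  }
  dynamicManifests.set(manifest.id, manifest);
  return true;
}

// Single lookup point: code manifests take precedence over dynamic ones.
function getManifest(id: string): IntegrationManifest | undefined {
  return codeManifests.get(id) ?? dynamicManifests.get(id);
}

registerCodeManifest({ id: "aws", label: "AWS" });
const accepted = registerDynamicManifest({ id: "aws", label: "AWS (from DB)" });
registerDynamicManifest({ id: "acme", label: "Acme" });
```

Rejecting the dynamic registration, rather than throwing, keeps a bad database row from breaking startup; whether the real registry throws or skips is not stated in the source.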

2. Strategy Pattern (Authentication)

Where: packages/integration-platform/src/types.ts (auth strategies)

// Auth strategy varies per integration:
auth: 
  | { type: 'oauth2', clientId, clientSecret, scopes, ... }
  | { type: 'api_key', headerName, prefix, ... }
  | { type: 'basic', ... }
  | { type: 'jwt', ... }
  | { type: 'custom', ... }

Also used in: HybridAuthGuard where three authentication strategies (API Key, Service Token, Session) are tried in order.

Why: Each integration has its own auth requirements. The strategy pattern lets the CheckContext runtime inject the right auth headers automatically, regardless of which strategy the integration uses.
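
As a rough illustration, header injection over a discriminated union might look like the following. The fields shown (accessToken, key, headers) and the buildAuthHeaders helper are assumptions; the source elides most fields of the real union.

```typescript
// Simplified, hypothetical variants of the auth union.
type AuthStrategy =
  | { type: "oauth2"; accessToken: string }
  | { type: "api_key"; headerName: string; prefix?: string; key: string }
  | { type: "custom"; headers: Record<string, string> };

// The switch is exhaustive: adding a new variant without handling it is a compile error.
function buildAuthHeaders(auth: AuthStrategy): Record<string, string> {
  switch (auth.type) {
    case "oauth2":
      return { Authorization: `Bearer ${auth.accessToken}` };
    case "api_key":
      return { [auth.headerName]: `${auth.prefix ?? ""}${auth.key}` };
    case "custom":
      return { ...auth.headers };
  }
}
```

The caller never branches on the strategy type; it only merges the returned headers into the outgoing request.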

3. Adapter Pattern (Cloud Providers)

Where: apps/api/src/cloud-security/providers/

50+ AWS service adapters standardize diverse cloud APIs into a unified finding format:

// Each adapter implements:
interface CloudSecurityAdapter {
  scan(credentials, options): Promise<Finding[]>;
}
 
// Adapters for: S3, RDS, EC2, Lambda, IAM, CloudTrail, 
// SecurityHub, GuardDuty, ECS, EKS, etc.

Why: AWS alone has 200+ services, each with different API shapes. Adapters normalize them into a standard Finding format with severity, remediation, and evidence.

4. Interpreter Pattern (DSL Engine)

Where: packages/integration-platform/src/dsl/interpreter.ts

// JSON definition → executable check
{
  steps: [
    { type: 'fetch', path: '/api/repos', as: 'repos' },
    { type: 'forEach', items: '$.repos', as: 'repo',
      steps: [
        { type: 'fetch', path: '/api/repos/{{repo.id}}/settings' },
        { type: 'branch', condition: { field: 'branchProtection', op: 'falsy' },
          then: [{ type: 'emit', result: 'fail', ... }],
          else: [{ type: 'emit', result: 'pass', ... }]
        }
      ]
    }
  ]
}

Why: Non-developer users (compliance consultants) should be able to define checks without writing TypeScript. The DSL provides a safe, declarative alternative to code-based checks.

Expression operators (16 total; 15 are listed here): eq, neq, gt, gte, lt, lte, exists, notExists, truthy, falsy, contains, matches, in, age_within_days, age_exceeds_days
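
To make the condition objects concrete, here is a sketch of an evaluator for a subset of these operators. The { field, op } shape follows the branch step above; the value operand, the dot-path resolution, and the ISO-timestamp assumption for age_exceeds_days are illustrative guesses rather than the interpreter's actual behavior.

```typescript
type DslOperator =
  | "eq" | "neq" | "gt" | "lt" | "exists"
  | "truthy" | "falsy" | "contains" | "in" | "age_exceeds_days";

interface DslCondition {
  field: string;   // dot path into the current scope, e.g. "settings.branchProtection"
  op: DslOperator;
  value?: unknown; // hypothetical operand for binary operators
}

// Resolve "a.b.c" against a plain object; missing segments yield undefined.
function resolvePath(scope: unknown, path: string): unknown {
  let current: unknown = scope;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function evaluateCondition(cond: DslCondition, scope: unknown, now: Date = new Date()): boolean {
  const actual = resolvePath(scope, cond.field);
  switch (cond.op) {
    case "eq": return actual === cond.value;
    case "neq": return actual !== cond.value;
    case "gt": return typeof actual === "number" && typeof cond.value === "number" && actual > cond.value;
    case "lt": return typeof actual === "number" && typeof cond.value === "number" && actual < cond.value;
    case "exists": return actual !== undefined && actual !== null;
    case "truthy": return Boolean(actual);
    case "falsy": return !actual;
    case "contains":
      if (typeof actual === "string") return typeof cond.value === "string" && actual.includes(cond.value);
      return Array.isArray(actual) && actual.includes(cond.value);
    case "in": return Array.isArray(cond.value) && cond.value.includes(actual);
    case "age_exceeds_days": {
      // Assumes the field holds an ISO-8601 timestamp and value is a number of days.
      if (typeof actual !== "string" || typeof cond.value !== "number") return false;
      const then = Date.parse(actual);
      if (Number.isNaN(then)) return false;
      return now.getTime() - then > cond.value * MS_PER_DAY;
    }
  }
}
```

Because conditions are plain data evaluated by a fixed switch, a check definition cannot run arbitrary code, which is what makes the DSL safe to expose to non-developers.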

5. Decorator Pattern (NestJS Guards & Interceptors)

Where: Throughout apps/api/src/

@Controller({ path: 'controls', version: '1' })
@UseGuards(HybridAuthGuard, PermissionGuard)  // Auth + RBAC
@UseInterceptors(AuditLogInterceptor)          // Automatic audit logging
export class ControlsController {
  
  @Post()
  @RequirePermission('control', 'create')      // Endpoint-level permission
  @SkipAuditLog()                              // Opt out of audit
  create() { ... }
}

Decorator composition: Guards execute in order (auth before permissions). Interceptors wrap the handler (audit log captures before/after state). Custom decorators (@Public(), @SkipOrgCheck(), @AuditRead()) toggle behavior via metadata.

6. Template Method (CheckContext)

Where: packages/integration-platform/src/runtime/check-context.ts

// Every check gets a rich context with pre-built helpers:
check.run(async (ctx) => {
  // HTTP methods auto-inject auth headers
  const repos = await ctx.fetchAllPages('/api/repos');
  
  for (const repo of repos) {
    if (repo.branchProtection) {
      ctx.pass({ title: `${repo.name} has branch protection`, ... });
    } else {
      ctx.fail({ title: `${repo.name} missing branch protection`, ... });
    }
  }
});

Why: Check authors shouldn't deal with authentication, pagination, or result recording. The context provides the algorithm skeleton; check authors fill in the compliance logic.
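
The division of labor can be sketched as follows. The method names fetchAllPages, pass, and fail come from the snippet above; the ResultPage envelope with a nextPage cursor, the injected Transport function, and the constructor signature are assumptions for illustration.

```typescript
// Hypothetical pagination envelope returned by the HTTP layer.
interface ResultPage {
  items: unknown[];
  nextPage: string | null;
}

// Injected HTTP layer, so the sketch stays independent of any real client.
type Transport = (path: string, headers: Record<string, string>) => Promise<ResultPage>;

interface CheckResult {
  status: "pass" | "fail";
  title: string;
}

class CheckContext {
  readonly results: CheckResult[] = [];
  private readonly transport: Transport;
  private readonly authHeaders: Record<string, string>;

  constructor(transport: Transport, authHeaders: Record<string, string>) {
    this.transport = transport;
    this.authHeaders = authHeaders;
  }

  // Follows nextPage cursors until exhausted, attaching auth headers to every request.
  async fetchAllPages<T>(path: string): Promise<T[]> {
    const all: T[] = [];
    let next: string | null = path;
    while (next !== null) {
      const page: ResultPage = await this.transport(next, this.authHeaders);
      all.push(...(page.items as T[]));
      next = page.nextPage;
    }
    return all;
  }

  pass(result: { title: string }): void {
    this.results.push({ status: "pass", title: result.title });
  }

  fail(result: { title: string }): void {
    this.results.push({ status: "fail", title: result.title });
  }
}

interface Repo {
  name: string;
  branchProtection: boolean;
}

// The check body contains only compliance logic.
async function branchProtectionCheck(ctx: CheckContext): Promise<void> {
  const repos = await ctx.fetchAllPages<Repo>("/api/repos");
  for (const repo of repos) {
    if (repo.branchProtection) {
      ctx.pass({ title: `${repo.name} has branch protection` });
    } else {
      ctx.fail({ title: `${repo.name} missing branch protection` });
    }
  }
}
```

Injecting the transport also makes checks easy to unit test: a fake transport can return canned pages without any network access.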

7. Observer Pattern (SWR Mutations)

Where: Throughout apps/app/ SWR hooks

// Optimistic update with rollback
const updateTask = async (taskId, data) => {
  await mutate(
    cacheKey,
    async (current) => {
      const result = await apiClient.patch(`/tasks/${taskId}`, data);
      return { ...current, data: current.data.map(t => 
        t.id === taskId ? result.data : t
      )};
    },
    { optimisticData: (current) => ({
      ...current,
      data: current.data.map(t => 
        t.id === taskId ? { ...t, ...data } : t
      )
    })}
  );
};

Pattern: Immediate UI update (optimistic), background API call, rollback on failure. SWR's cache acts as the observable; all components subscribed to the same key re-render.
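
The optimistic-then-rollback flow can be illustrated without SWR. The ObservableCache class below is a hypothetical stand-in for SWR's keyed cache, not its implementation; it only shows the ordering of notifications that subscribers observe.

```typescript
type CacheListener<T> = (value: T) => void;

class ObservableCache<T> {
  private value: T;
  private listeners = new Set<CacheListener<T>>();

  constructor(initial: T) {
    this.value = initial;
  }

  get(): T {
    return this.value;
  }

  // Returns an unsubscribe function.
  subscribe(listener: CacheListener<T>): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  private set(next: T): void {
    this.value = next;
    this.listeners.forEach((listener) => listener(next));
  }

  // 1. Publish the optimistic value immediately.
  // 2. Run the real request.
  // 3. Publish the server result, or restore the snapshot and rethrow on failure.
  async mutate(optimistic: (current: T) => T, commit: (current: T) => Promise<T>): Promise<void> {
    const snapshot = this.value;
    this.set(optimistic(snapshot));
    try {
      this.set(await commit(snapshot));
    } catch (error) {
      this.set(snapshot);
      throw error;
    }
  }
}
```

On failure, subscribers see two notifications: the optimistic value and then the restored snapshot, which is the "rollback" the pattern describes.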

8. Factory Pattern (Integration Handler Lookup)

Where: packages/integrations/src/factory.ts

// Factory creates the right handler based on integration type
const handler = createHandler(integration.integrationId);
// Returns: provider-specific check runner with correct auth

Frontend Patterns

Server Component → Client Component Hydration

// Server component (page.tsx)
export default async function ControlsPage({ params }) {
  const { data } = await serverApi.get('/v1/controls');
  return <ControlsClient initialData={data} />;
}
 
// Client component
function ControlsClient({ initialData }) {
  const { data } = useSWR('/v1/controls', fetcher, {
    fallbackData: initialData,
    revalidateOnMount: !initialData,
  });
  // Renders with SSR data immediately, then revalidates
}

Why: First paint is instant (SSR data). The client then takes over for reactivity. revalidateOnMount: !initialData skips the redundant refetch on mount whenever the server supplied data.

URL State with nuqs

const [sheetOpen, setSheetOpen] = useQueryState('risk-sheet');
// URL: /vendors/vnd_123?risk-sheet=true
// Benefits: shareable links, back button works, survives refresh

When used: Sheet/drawer open states, filter selections, active tabs. Keeps UI state in the URL for shareability.

Responsive Sheet/Drawer

const isDesktop = useMediaQuery('(min-width: 768px)');
 
return isDesktop ? (
  <Sheet open={isOpen}>
    <SheetContent><SheetHeader>...</SheetHeader><SheetBody>...</SheetBody></SheetContent>
  </Sheet>
) : (
  <Drawer open={isOpen}>
    <DrawerContent>...</DrawerContent>
  </Drawer>
);

Pattern: Same data, different container based on viewport. Used consistently across vendor creation, risk assessment, task details.

Form Pattern (React Hook Form + Zod)

// 1. Define schema
const schema = z.object({
  name: z.string().min(1),
  website: z.string().url().optional().or(z.literal('')),
  category: z.nativeEnum(VendorCategory),
});
 
// 2. Infer type
type FormData = z.infer<typeof schema>;
 
// 3. Use form
const form = useForm<FormData>({
  resolver: zodResolver(schema),
  mode: 'onChange',
});
 
// 4. Controller for complex inputs
<Controller
  name="category"
  control={form.control}
  render={({ field }) => <Select {...field} />}
/>

Rule: Never use useState for form fields. Zod schemas are the single source of validation truth.

Backend Patterns

Controller → Service → Prisma

// Controller: HTTP concerns only
@Post()
@RequirePermission('control', 'create')
async create(
  @OrganizationId() orgId: string,
  @Body() dto: CreateControlDto,
) {
  return this.controlsService.create(orgId, dto);
}
 
// Service: Business logic
async create(orgId: string, dto: CreateControlDto) {
  return this.db.control.create({
    data: { ...dto, organizationId: orgId },
  });
}

Why the split: Controllers handle HTTP (validation, auth decorators, response shaping). Services handle business rules and database operations. This makes services testable without HTTP.

Multi-Tenancy Enforcement

// Every service method takes organizationId as first parameter
// Every Prisma query includes WHERE organizationId = ?
// Never trust client-supplied orgId for data access
 
async findOne(orgId: string, id: string) {
  return this.db.control.findFirst({
    where: { id, organizationId: orgId },  // Both conditions
  });
}

Critical: The organizationId comes from the authenticated request (guard-resolved), not from URL params or request body.

Audit Log Auto-Generation

// AuditLogInterceptor (global, automatic)
// 1. Before handler: fetch previous state for PATCH/PUT
// 2. After handler: compute diff
// 3. Store: { changes: { field: { previous, current } } }
 
// ONLY fires when:
// - HTTP method is POST/PATCH/PUT/DELETE (or @AuditRead() on GET)
// - @RequirePermission metadata is present
// - @SkipAuditLog() is NOT present

Design decision: Tying audit logging to @RequirePermission ensures every mutation that matters is logged. If a mutating endpoint lacks @RequirePermission, and is therefore never audit-logged, it's either public or a bug.
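
A sketch of the two pieces described above: the firing rule and the field-level diff. The RouteMeta shape and both function names are hypothetical, and the JSON-based comparison is a simplification chosen for brevity.

```typescript
interface FieldChange {
  previous: unknown;
  current: unknown;
}

// Produces { field: { previous, current } } for every field whose value differs.
function computeChanges(
  previous: Record<string, unknown>,
  current: Record<string, unknown>
): Record<string, FieldChange> {
  const changes: Record<string, FieldChange> = {};
  const keys = new Set([...Object.keys(previous), ...Object.keys(current)]);
  keys.forEach((key) => {
    // Simplification: JSON comparison is sensitive to key order inside nested objects.
    if (JSON.stringify(previous[key]) !== JSON.stringify(current[key])) {
      changes[key] = { previous: previous[key], current: current[key] };
    }
  });
  return changes;
}

// Hypothetical summary of the decorator metadata found on a route.
interface RouteMeta {
  method: string;
  hasRequirePermission: boolean;
  skipAuditLog: boolean;
  auditRead: boolean;
}

function shouldAudit(meta: RouteMeta): boolean {
  const mutating = ["POST", "PATCH", "PUT", "DELETE"].includes(meta.method);
  const auditedRead = meta.method === "GET" && meta.auditRead;
  return (mutating || auditedRead) && meta.hasRequirePermission && !meta.skipAuditLog;
}
```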

Notable Design Tradeoffs

1. SWR + React Query Coexistence

The codebase uses both SWR and TanStack React Query. SWR handles most data fetching (simpler API, built-in cache), while React Query provides dehydrate/hydrate for SSR with SuperJSON serialization.

Tradeoff: Two caching systems increase complexity. But SWR's simplicity wins for most CRUD; React Query's SSR story is better for complex pages.

2. Comma-Separated Roles vs. Join Table

member.role = "admin,auditor"  // String field

Tradeoff: Simple to store and query (role LIKE '%admin%'), but:

- No referential integrity
- Substring matching could false-positive (role "radmin" contains "admin")
- Can't easily query "all members with exactly role X"

The codebase mitigates this by always splitting and using exact match in application code.
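
A minimal sketch of that split-and-match approach; the helper names parseRoles and hasRole are assumptions, not the codebase's actual functions.

```typescript
// Split the stored string into individual role names, tolerating stray whitespace.
function parseRoles(roleField: string): string[] {
  return roleField
    .split(",")
    .map((role) => role.trim())
    .filter((role) => role.length > 0);
}

// Exact match against the parsed list, so "radmin" never counts as "admin".
function hasRole(roleField: string, role: string): boolean {
  return parseRoles(roleField).includes(role);
}
```

For example, hasRole("admin,auditor", "auditor") is true, while hasRole("radmin", "admin") is false even though a substring check would match.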

3. Session-Only Auth (No JWTs)

Benefits: No token refresh logic, no token storage vulnerabilities, server can revoke sessions instantly.

Cost: Every API call requires a database/cache lookup. Cross-subdomain access depends on cookie scoping, here SameSite: None cookies (which browsers only accept together with Secure). Third-party integrations can't authenticate with tokens, so they use API keys instead.

4. Split Prisma Schema (45 files)

Benefits: Each domain has its own schema file. Models are easier to find, and there are fewer merge conflicts.

Cost: Prisma combines them at build time. Cross-file relationships require explicit imports. Some IDE tools don't handle multi-file schemas well.

5. Service Tokens vs. OAuth for Internal Services

// Service tokens are simple env vars
X-Service-Token: ${SERVICE_TOKEN_TRIGGER}

Benefits: No OAuth dance for internal services. Scoped permissions. Timing-safe comparison.

Cost: Token rotation requires deployment. No automatic expiration. Single token per service (no per-instance scoping).

6. DSL + Code Checks (Dual System)

Benefits: DSL for simple checks (no code needed), code for complex logic (full TypeScript power).

Cost: Two systems to maintain. DSL has limited expressiveness. Check authors must choose which to use.

What's Clever

1. API Key Prefix Indexing

// Key format: comp_ + 32 hex chars
// Prefix: first 8 chars after comp_
// Lookup: SELECT * FROM api_key WHERE keyPrefix = 'abc12345'
// Then: SHA256 compare against full key hash
 
// Indexed lookup on keyPrefix (effectively O(1)); the plaintext key is never stored
// Legacy fallback: scan all keys if prefix not yet backfilled
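
The lookup steps above can be sketched as follows. The record shape, the helper names, and the injected hash function are assumptions; in a real service the hash would be SHA-256 and the prefix filter would be an indexed database query rather than an array scan.

```typescript
interface ApiKeyRecord {
  id: string;
  keyPrefix: string | null; // null for legacy rows that have not been backfilled
  keyHash: string;
}

// Injected so the sketch has no crypto dependency; production code would use SHA-256.
type HashFn = (plaintext: string) => string;

const KEY_MARKER = "comp_";
const PREFIX_LENGTH = 8;

// Extracts the 8 characters that follow "comp_", or null if the key is malformed.
function extractPrefix(apiKey: string): string | null {
  if (!apiKey.startsWith(KEY_MARKER)) return null;
  const body = apiKey.slice(KEY_MARKER.length);
  return body.length >= PREFIX_LENGTH ? body.slice(0, PREFIX_LENGTH) : null;
}

function findApiKey(apiKey: string, records: ApiKeyRecord[], hash: HashFn): ApiKeyRecord | undefined {
  const prefix = extractPrefix(apiKey);
  if (prefix === null) return undefined;
  const keyHash = hash(apiKey);

  // Stand-in for: SELECT * FROM api_key WHERE keyPrefix = ?
  const candidates = records.filter((record) => record.keyPrefix === prefix);
  const match = candidates.find((record) => record.keyHash === keyHash);
  if (match) return match;

  // Legacy fallback: compare against rows that have no prefix yet.
  return records.find((record) => record.keyPrefix === null && record.keyHash === keyHash);
}
```

The prefix only narrows the candidate set; the hash comparison is what actually authenticates the key, so leaking a prefix alone does not grant access.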

2. CheckContext Auto-Auth

// Check author writes:
const data = await ctx.get('/api/repos');
 
// CheckContext transparently:
// 1. Looks up auth strategy (OAuth? API key? Bearer?)
// 2. Refreshes token if expired
// 3. Injects correct header (Authorization: Bearer ..., X-API-Key: ..., etc.)
// 4. Handles pagination if needed

3. Two-Phase AI Remediation

Phase 1 generates a plan from the finding description alone. Phase 2 executes read-only steps to get actual cloud state, then regenerates the plan with real data. This prevents hallucinated resource names or incorrect configurations.

4. Permission-Gated AI Tools

The assistant chat builds its tool set dynamically based on the user's permissions. An employee can chat with the AI but won't see tools for accessing policies or risks they can't view directly.
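
In outline, the filtering step might look like this. The ToolPermission and AiTool shapes and the buildToolSet name are hypothetical; the source does not show how tools declare their requirements.

```typescript
interface ToolPermission {
  resource: string;
  action: string;
}

interface AiTool {
  toolName: string;
  requires: ToolPermission;
}

// Keeps only the tools whose required permission the user actually holds.
function buildToolSet(allTools: AiTool[], granted: ToolPermission[]): AiTool[] {
  const grantedKeys = new Set(granted.map((p) => `${p.resource}:${p.action}`));
  return allTools.filter((tool) =>
    grantedKeys.has(`${tool.requires.resource}:${tool.requires.action}`)
  );
}
```

Filtering before the model sees the tool list means a restricted user's assistant cannot even attempt a forbidden call, rather than attempting it and being denied.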

5. Ephemeral → Persistent Automation

Task automation chat starts with a temporary ID. Only when the user sends their first message does it create a real database record. This avoids cluttering the DB with empty automations from users who opened the chat and left.
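
A small sketch of lazy persistence under stated assumptions: the class name, the ID formats, and the in-memory Map standing in for the database are all hypothetical.

```typescript
interface AutomationRecord {
  id: string;
  taskId: string;
  messages: string[];
}

class AutomationChatSession {
  readonly temporaryId: string;
  private record: AutomationRecord | null = null;
  private readonly taskId: string;
  private readonly db: Map<string, AutomationRecord>;

  constructor(taskId: string, db: Map<string, AutomationRecord>) {
    this.taskId = taskId;
    this.db = db;
    this.temporaryId = `temp_${taskId}`; // hypothetical temporary ID format
  }

  // The temporary ID is used until a real record exists.
  get id(): string {
    return this.record?.id ?? this.temporaryId;
  }

  // The first message creates the database row; later messages reuse it.
  sendMessage(text: string): void {
    let record = this.record;
    if (record === null) {
      record = { id: `auto_${this.db.size + 1}`, taskId: this.taskId, messages: [] };
      this.db.set(record.id, record);
      this.record = record;
    }
    record.messages.push(text);
  }
}
```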

6. Task Mapping Auto-Complete

When an integration check passes (e.g., "GitHub branch protection enabled"), it automatically marks related compliance tasks as complete. The mapping is defined in the integration manifest:

check: {
  taskMapping: TASK_TEMPLATES.codeChanges,
  run: async (ctx) => {
    ctx.pass({ ... });  // This auto-completes linked tasks
  }
}

7. Advisory Locks for Vendor Assessment

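// PostgreSQL advisory locks prevent concurrent AI assessment of the same vendor
// Keyed by website domain hash
// Prevents: duplicate Firecrawl calls, race conditions on save
// Note: pg_advisory_lock is session-level, so it must be released with
// pg_advisory_unlock on the same connection once the assessment finishes
await db.$executeRaw`SELECT pg_advisory_lock(${hash})`;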

What Could Be Improved

1. API Response Inconsistency

List endpoints return { data: [...], count } while single resources return the entity directly. The client wrapper normalizes this, but it creates a hidden abstraction layer that new developers must learn.

2. No Request ID Tracing

Audit logs track mutations but there's no request-level correlation ID flowing through the system. Debugging a failed multi-step operation requires correlating by timestamp.

3. Global Validation Pipe Transform Gotcha

// main.ts: ValidationPipe with transform: true
// This mangles nested JSON objects (like TipTap content)
// Workaround: Use @Req() req for complex payloads
// This is documented but easily forgotten

4. No Circuit Breaker for LLM Calls

If an LLM provider is down, each request still attempts the call and waits for timeout. A circuit breaker could fail fast after N consecutive failures.
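
One way such a breaker could work, as a sketch rather than a proposal for a specific library: the class, its thresholds, and the injectable clock are all assumptions, and nothing like it exists in the codebase as described.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // The clock is injectable so the cooldown can be tested without real waiting.
  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs) {
      // Open: reject immediately instead of waiting for a provider timeout.
      throw new Error("Circuit open: failing fast");
    }
    // Closed, or cooldown elapsed (half-open): let one call through.
    try {
      const result = await fn();
      this.consecutiveFailures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

After the cooldown, a single successful trial call closes the circuit again; a failed trial reopens it for another cooldown period.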

5. Integration Test Coverage

The test setup uses mock overrides for guards and DB, which is standard but means integration tests don't verify the full auth → permission → audit chain. Real Postgres-backed tests would catch multi-tenancy leaks.