Architecture

This document describes the technical architecture of AIGovHub CLI.

Overview

AIGovHub CLI is designed with these principles:

Project Structure

aigovhub-cli/
├── src/
│   └── aigovhub/
│       ├── __init__.py          # Package entry, version
│       ├── __main__.py          # python -m aigovhub entry
│       ├── py.typed             # PEP 561 marker
│       │
│       ├── cli/                 # CLI Layer
│       │   ├── app.py           # Typer application
│       │   ├── output.py        # Rich formatters
│       │   └── commands/        # Command modules
│       │
│       ├── core/                # Shared Utilities
│       │   ├── config.py        # Configuration management
│       │   ├── constants.py     # ML libraries, patterns
│       │   ├── exceptions.py    # Exception hierarchy
│       │   └── types.py         # Type definitions
│       │
│       ├── scanner/             # Repository Scanning
│       │   ├── engine.py        # Scan orchestrator
│       │   ├── repository.py    # Repository abstraction
│       │   └── dependency_parser.py
│       │
│       ├── detection/           # AI Detection
│       │   ├── detector.py      # Detection orchestrator
│       │   ├── aggregator.py    # Signal aggregation
│       │   └── signals/         # Signal detectors
│       │       ├── base.py      # Signal protocol
│       │       ├── library.py   # Dependency detection
│       │       ├── model_file.py
│       │       ├── api_usage.py
│       │       └── llm.py
│       │
│       ├── sbom/                # SBOM Generation
│       │   ├── generator.py     # CycloneDX generator
│       │   ├── models.py        # AI-specific models
│       │   └── cyclonedx_adapter.py
│       │
│       ├── llm/                 # LLM Abstraction
│       │   ├── client.py        # Unified client
│       │   ├── providers/       # Provider implementations
│       │   │   ├── base.py
│       │   │   ├── anthropic.py
│       │   │   ├── openai.py
│       │   │   └── mock.py
│       │   └── prompts/
│       │
│       └── artifact/            # Artifact Management
│           ├── manager.py       # Read/write operations
│           ├── schema.py        # Pydantic models
│           └── validator.py
│
├── tests/
│   ├── conftest.py              # Shared fixtures
│   ├── unit/                    # Unit tests
│   └── integration/             # Integration tests
│
├── benchmark/
│   ├── repos.yaml               # Benchmark dataset
│   ├── cached/                  # Cloned repos (gitignored)
│   └── results/                 # Evaluation results
│
└── docs/                        # Documentation

Module Responsibilities

CLI Layer (cli/)

Purpose: User interface and command handling

┌──────────────────────────────────────────┐
│                  CLI                      │
│  ┌─────────┐  ┌─────────┐  ┌──────────┐  │
│  │  scan   │  │  init   │  │ validate │  │
│  └────┬────┘  └────┬────┘  └────┬─────┘  │
│       │            │            │        │
│       └────────────┼────────────┘        │
│                    │                     │
│              ┌─────▼─────┐               │
│              │  output   │               │
│              │ (Rich)    │               │
│              └───────────┘               │
└──────────────────────────────────────────┘

Components:

Core Layer (core/)

Purpose: Shared types, configuration, and constants

Components:

Scanner Module (scanner/)

Purpose: Repository access and dependency parsing

┌───────────────────────────────────────────┐
│              ScanEngine                    │
│  ┌─────────────────────────────────────┐  │
│  │           Repository                 │  │
│  │  ┌─────────┐  ┌─────────────────┐   │  │
│  │  │  Git    │  │  File System    │   │  │
│  │  │  Info   │  │  Operations     │   │  │
│  │  └─────────┘  └─────────────────┘   │  │
│  └─────────────────────────────────────┘  │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │        DependencyParser             │  │
│  │  requirements.txt │ pyproject.toml  │  │
│  │       setup.py    │                 │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘
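
As a rough illustration of the DependencyParser box above, a requirements.txt parser might look like the sketch below. The `Dependency` record, the `parse_requirements` name, and the simplified name normalization are assumptions made for this example and are not taken from the project. The sketch also ignores URL-based requirements and treats every `#` as the start of a comment.

```python
import re
from typing import NamedTuple


class Dependency(NamedTuple):
    """Hypothetical minimal dependency record (not the project's actual type)."""
    name: str
    specifier: str


# Package name, optional [extras], then whatever version specifier remains.
_REQ_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def parse_requirements(text: str) -> list[Dependency]:
    """Extract (name, specifier) pairs from requirements.txt content."""
    deps: list[Dependency] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        line = line.split(";", 1)[0].strip()  # drop environment markers
        match = _REQ_LINE.match(line)
        if match:
            # Simplified normalization: lowercase and treat "_" as "-".
            name = match.group(1).lower().replace("_", "-")
            deps.append(Dependency(name, match.group(3).strip()))
    return deps
```

Normalizing names before they reach the detection layer means a library list such as the one in core/constants.py only needs one spelling per package.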

Components:

Detection Module (detection/)

Purpose: AI system detection logic

┌────────────────────────────────────────────────────┐
│                    AIDetector                       │
│  ┌──────────────────────────────────────────────┐  │
│  │              Signal Detectors                 │  │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐     │  │
│  │  │ Library  │ │ Model    │ │ API      │ ... │  │
│  │  │ Signal   │ │ File     │ │ Usage    │     │  │
│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘     │  │
│  │       │            │            │           │  │
│  │       └────────────┼────────────┘           │  │
│  │                    │                        │  │
│  └──────────────────────────────────────────────┘  │
│                       │                            │
│                       ▼                            │
│  ┌──────────────────────────────────────────────┐  │
│  │            SignalAggregator                   │  │
│  │  Weighted combination → AI Systems            │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘
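
The weighted combination step could be sketched as follows. This document does not give the weights or the combination formula, so the `Signal` tuple, `SOURCE_WEIGHTS`, and the weighted-average rule are all assumptions for illustration. The 0.7 default mirrors the `confidence_threshold` shown under Settings Pattern.

```python
from collections import defaultdict
from typing import NamedTuple


class Signal(NamedTuple):
    """Hypothetical stand-in for DetectionSignal with a numeric confidence."""
    source: str        # e.g. "library", "model_file", "api_usage"
    ai_type: str       # e.g. "ml_model", "llm"
    confidence: float  # 0.0 .. 1.0


# Assumed per-source weights; the real values are not stated in this document.
SOURCE_WEIGHTS = {"library": 1.0, "model_file": 0.8, "api_usage": 0.6}


def aggregate(signals: list[Signal], threshold: float = 0.7) -> dict[str, float]:
    """Return {ai_type: score} for each type whose weighted average meets the threshold."""
    weighted_sum: dict[str, float] = defaultdict(float)
    weight_total: dict[str, float] = defaultdict(float)
    for s in signals:
        w = SOURCE_WEIGHTS.get(s.source, 0.5)  # unknown sources get a middle weight
        weighted_sum[s.ai_type] += w * s.confidence
        weight_total[s.ai_type] += w
    scores = {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}
    return {t: score for t, score in scores.items() if score >= threshold}
```

A weighted average keeps the score in the same 0 to 1 range as the inputs, which makes a single threshold meaningful across AI system types.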

Components:

LLM Module (llm/)

Purpose: LLM provider abstraction

┌────────────────────────────────────────┐
│              LLMClient                  │
│                 │                      │
│    ┌────────────┼────────────┐        │
│    │            │            │        │
│    ▼            ▼            ▼        │
│ ┌──────┐   ┌──────┐   ┌──────┐      │
│ │Anthro│   │OpenAI│   │ Mock │      │
│ │pic   │   │      │   │      │      │
│ └──────┘   └──────┘   └──────┘      │
└────────────────────────────────────────┘

Components:

SBOM Module (sbom/)

Purpose: CycloneDX AI-SBOM generation

Components:

Artifact Module (artifact/)

Purpose: aigovhub.yaml management

Components:

Data Flow

Scan Flow

                         ┌─────────┐
                         │ CLI     │
                         │ (scan)  │
                         └────┬────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │   ScanEngine    │
                    └────────┬────────┘
                             │
           ┌─────────────────┼─────────────────┐
           │                 │                 │
           ▼                 ▼                 ▼
    ┌────────────┐   ┌────────────┐   ┌────────────┐
    │ Repository │   │ Dependency │   │    Git     │
    │   Access   │   │   Parser   │   │    Info    │
    └──────┬─────┘   └──────┬─────┘   └──────┬─────┘
           │                │                │
           └────────────────┼────────────────┘
                            │
                            ▼
                   ┌─────────────────┐
                   │   AIDetector    │
                   │                 │
                   │  ┌───────────┐  │
                   │  │  Signals  │  │
                   │  └─────┬─────┘  │
                   │        │        │
                   │  ┌─────▼─────┐  │
                   │  │Aggregator │  │
                   │  └───────────┘  │
                   └────────┬────────┘
                            │
                            ▼
                    ┌───────────────┐
                    │  ScanResult   │
                    │  - ai_systems │
                    │  - signals    │
                    └───────┬───────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
              ▼             ▼             ▼
       ┌──────────┐  ┌──────────┐  ┌──────────┐
       │  YAML    │  │  JSON    │  │  Rich    │
       │  Output  │  │  Output  │  │  Output  │
       └──────────┘  └──────────┘  └──────────┘

Key Design Patterns

Protocol/Interface Pattern (Signals)

All signal detectors implement the same protocol:

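from abc import ABC, abstractmethod

# Repository, Dependency, and DetectionSignal are the project's own types.
class SignalDetector(ABC):
    """Interface that every signal detector implements."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def priority(self) -> int: ...

    @abstractmethod
    def detect(
        self,
        repository: Repository,
        dependencies: list[Dependency],
    ) -> list[DetectionSignal]: ...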
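
Because every detector exposes the same interface, an orchestrator can treat them uniformly. The sketch below assumes that a lower `priority` value runs first, which this document does not state. `KeywordSignal` and `run_detectors` are hypothetical names, and signals are plain strings here to keep the example self-contained.

```python
from abc import ABC, abstractmethod


class SignalDetector(ABC):
    """Trimmed copy of the protocol; arguments are untyped and signals are strings."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def priority(self) -> int: ...

    @abstractmethod
    def detect(self, repository, dependencies) -> list[str]: ...


class KeywordSignal(SignalDetector):
    """Hypothetical detector that flags dependencies by name."""

    def __init__(self, name: str, priority: int, keywords: set[str]) -> None:
        self._name = name
        self._priority = priority
        self._keywords = keywords

    @property
    def name(self) -> str:
        return self._name

    @property
    def priority(self) -> int:
        return self._priority

    def detect(self, repository, dependencies) -> list[str]:
        return [f"{self.name}:{dep}" for dep in dependencies if dep in self._keywords]


def run_detectors(detectors, repository, dependencies) -> list[str]:
    """Run detectors in ascending priority order (assumed) and collect their signals."""
    signals: list[str] = []
    for detector in sorted(detectors, key=lambda d: d.priority):
        signals.extend(detector.detect(repository, dependencies))
    return signals
```

The orchestrator never inspects the concrete class, so adding a detector requires no change to `run_detectors`.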

Provider Pattern (LLM)

LLM providers are interchangeable:

from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Interface shared by all LLM providers."""

    @abstractmethod
    def complete(
        self,
        prompt: str,
        *,
        system_prompt: str | None = None,  # keyword-only
    ) -> LLMResponse: ...
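
To show why interchangeable providers are useful, here is a minimal sketch of an offline provider that calling code can use in place of a real one. The `LLMResponse` shape with a single `content` field, the `EchoProvider` class, and the `classify` helper are assumptions for this example; the project's real mock.py may look different.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LLMResponse:
    """Assumed minimal response shape; the real fields are not shown in this document."""
    content: str


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, *, system_prompt: str | None = None) -> LLMResponse: ...


class EchoProvider(LLMProvider):
    """Hypothetical offline provider that records calls and echoes the prompt."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str | None]] = []

    def complete(self, prompt: str, *, system_prompt: str | None = None) -> LLMResponse:
        self.calls.append((prompt, system_prompt))
        return LLMResponse(content=f"echo: {prompt}")


def classify(provider: LLMProvider, snippet: str) -> str:
    """Caller code depends only on the abstract interface, not on a vendor SDK."""
    return provider.complete(snippet, system_prompt="Classify this code.").content
```

Tests can pass an `EchoProvider` to `classify` and assert on `calls` without network access or API keys.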

Dataclass Pattern (Domain Types)

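Core types are defined as dataclasses. Note that a plain @dataclass produces mutable instances; enforcing immutability would require @dataclass(frozen=True):

from dataclasses import dataclass, field

@dataclass
class DetectionSignal:
    source: SignalSource
    confidence: Confidence
    ai_type: AISystemType
    evidence: list[str] = field(default_factory=list)  # fresh list per instance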
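
If immutability is the goal, a frozen dataclass enforces it at runtime. The sketch below uses a hypothetical `FrozenSignal` with plain field types and a tuple for evidence; it is not the project's `DetectionSignal`.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class FrozenSignal:
    """Hypothetical frozen variant; plain types stand in for the project's enums."""
    source: str
    confidence: float
    evidence: tuple[str, ...] = field(default_factory=tuple)  # tuple: immutable too


signal = FrozenSignal(
    source="library",
    confidence=0.9,
    evidence=("torch in requirements.txt",),
)

try:
    signal.confidence = 0.1  # attribute assignment is rejected on frozen dataclasses
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# Changes are expressed by building a new instance instead.
lowered = replace(signal, confidence=0.5)
```

Using a tuple for `evidence` matters here: `frozen=True` blocks attribute reassignment but would not stop in-place mutation of a list field.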

Settings Pattern (Configuration)

Pydantic settings for type-safe configuration:

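from pydantic_settings import BaseSettings, SettingsConfigDict

class Config(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="AIGOVHUB_",
        env_file=".env",
    )

    # Overridable via the AIGOVHUB_CONFIDENCE_THRESHOLD environment variable
    confidence_threshold: float = 0.7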

Extension Points

Adding a New Signal Detector

  1. Create a new module in detection/signals/:
# detection/signals/custom.py
from aigovhub.detection.signals.base import SignalDetector
 
class CustomSignal(SignalDetector):
    @property
    def name(self) -> str:
        return "custom"
 
    @property
    def priority(self) -> int:
        return 5
 
    def detect(self, repository, dependencies):
        signals = []  # Collect DetectionSignal instances here
        # Detection logic
        return signals
  2. Register in detection/detector.py:
self.detectors: list[SignalDetector] = [
    LibrarySignal(),
    ModelFileSignal(),
    CustomSignal(),  # Add here
]

Adding a New LLM Provider

  1. Create a new module in llm/providers/:
# llm/providers/custom.py
from aigovhub.llm.providers.base import LLMProvider
 
class CustomProvider(LLMProvider):
    @property
    def name(self) -> str:
        return "custom"
 
    def complete(self, prompt, **kwargs):
        # Implementation
        return LLMResponse(...)
  2. Register in llm/client.py:
PROVIDERS: dict[str, type[LLMProvider]] = {
    "anthropic": AnthropicProvider,
    "openai": OpenAIProvider,
    "custom": CustomProvider,  # Add here
}
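
A registry like this is typically consumed by a small factory. The sketch below is an assumption about how the lookup might work: `create_provider` is a hypothetical name, and the stand-in classes replace the real providers so the example runs on its own.

```python
class LLMProvider:
    """Stand-in base class for this sketch only."""

    def complete(self, prompt: str) -> str:
        raise NotImplementedError


class MockProvider(LLMProvider):
    """Stand-in provider that returns a fixed string."""

    def complete(self, prompt: str) -> str:
        return "mock response"


# Same shape as the registry above, reduced to one entry.
PROVIDERS: dict[str, type[LLMProvider]] = {"mock": MockProvider}


def create_provider(name: str) -> LLMProvider:
    """Instantiate the provider registered under name, or fail with a clear message."""
    try:
        provider_cls = PROVIDERS[name]
    except KeyError:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"Unknown LLM provider {name!r}; expected one of: {known}") from None
    return provider_cls()
```

Listing the known names in the error helps users spot a typo in their configuration.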

Testing Strategy

Unit Tests

Integration Tests

Test Fixtures

tests/
├── fixtures/
│   └── sample_repos/
│       ├── ml_project/      # Has ML dependencies
│       ├── llm_project/     # Has LLM integration
│       └── web_project/     # No AI (negative case)

Security Considerations

  1. Symlink Protection: Repository scanning validates that symlinks don't point outside the repository boundary, preventing directory traversal attacks
  2. Path Validation: Output file paths are validated to prevent path traversal (../) and symlink-based overwrites
  3. Error Sanitization: LLM provider errors are sanitized to prevent API key exposure in error messages
  4. Input Validation: LLM responses are validated (JSON parsing, type checking, confidence clamping) before use
  5. Specific Exceptions: CLI uses specific exception types rather than broad catches to avoid masking errors
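
Three of the checks above (symlink protection, error sanitization, confidence clamping) can be sketched with the standard library. The function names and the key-matching regular expression are assumptions for illustration. Real API key formats vary by provider, and the project's actual checks may differ.

```python
import re
from pathlib import Path


def is_within_repo(path: Path, repo_root: Path) -> bool:
    """True if path, after resolving symlinks and '..', stays inside repo_root."""
    resolved = path.resolve()
    root = repo_root.resolve()
    return resolved == root or root in resolved.parents


# Assumed pattern for key-like tokens such as "sk-..."; adjust per provider.
_KEY_PATTERN = re.compile(r"\b(?:sk|key)-[A-Za-z0-9_-]{8,}")


def sanitize_error(message: str) -> str:
    """Replace key-like tokens in an error message before it is shown or logged."""
    return _KEY_PATTERN.sub("[REDACTED]", message)


def clamp_confidence(value: object) -> float:
    """Coerce an LLM-reported confidence into [0.0, 1.0]; invalid values become 0.0."""
    try:
        number = float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return 0.0
    if number != number:  # NaN is the only value not equal to itself
        return 0.0
    return min(1.0, max(0.0, number))
```

Resolving both paths before comparing is what defeats a symlink inside the repository that points outside it, since the comparison is made on the real target.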

Performance Considerations

  1. Lazy imports: Heavy dependencies loaded on demand
  2. File filtering: Exclude .git, node_modules, etc.
  3. Content caching: File contents read once
  4. Parallel potential: Signals could run in parallel (future)
  5. LLM batching: Code samples batched for LLM calls
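
File filtering and content caching can be sketched as follows. The exclusion set beyond .git and node_modules, and both function names, are assumptions for this example.

```python
import os
from functools import lru_cache
from pathlib import Path
from typing import Iterator

# .git and node_modules come from the list above; the other entries are assumed.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def iter_source_files(root: Path) -> Iterator[Path]:
    """Yield files under root, pruning excluded directories before descending."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to recurse into removed entries.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for filename in filenames:
            yield Path(dirpath) / filename


@lru_cache(maxsize=None)
def read_text_cached(path: Path) -> str:
    """Read a file once; repeated calls for the same path are served from memory."""
    return path.read_text(encoding="utf-8", errors="replace")
```

Pruning during the walk avoids ever listing the contents of large directories such as node_modules. The cache lets several detectors inspect the same file without repeated disk reads, at the cost of holding file contents in memory for the duration of the scan.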