Architecture

This document describes the technical architecture of AIGovHub CLI.

Overview

AIGovHub CLI is designed with these principles:

Modularity: Separate concerns into distinct modules
Extensibility: Easy to add new detection signals
Testability: All components are unit-testable
Type Safety: Full type hints throughout

Project Structure

aigovhub-cli/
├── src/
│   └── aigovhub/
│       ├── __init__.py          # Package entry, version
│       ├── __main__.py          # python -m aigovhub entry
│       ├── py.typed             # PEP 561 marker
│       │
│       ├── cli/                 # CLI Layer
│       │   ├── app.py           # Typer application
│       │   ├── output.py        # Rich formatters
│       │   └── commands/        # Command modules
│       │
│       ├── core/                # Shared Utilities
│       │   ├── config.py        # Configuration management
│       │   ├── constants.py     # ML libraries, patterns
│       │   ├── exceptions.py    # Exception hierarchy
│       │   └── types.py         # Type definitions
│       │
│       ├── scanner/             # Repository Scanning
│       │   ├── engine.py        # Scan orchestrator
│       │   ├── repository.py    # Repository abstraction
│       │   └── dependency_parser.py
│       │
│       ├── detection/           # AI Detection
│       │   ├── detector.py      # Detection orchestrator
│       │   ├── aggregator.py    # Signal aggregation
│       │   └── signals/         # Signal detectors
│       │       ├── base.py      # Signal protocol
│       │       ├── library.py   # Dependency detection
│       │       ├── model_file.py
│       │       ├── api_usage.py
│       │       └── llm.py
│       │
│       ├── sbom/                # SBOM Generation
│       │   ├── generator.py     # CycloneDX generator
│       │   ├── models.py        # AI-specific models
│       │   └── cyclonedx_adapter.py
│       │
│       ├── llm/                 # LLM Abstraction
│       │   ├── client.py        # Unified client
│       │   ├── providers/       # Provider implementations
│       │   │   ├── base.py
│       │   │   ├── anthropic.py
│       │   │   ├── openai.py
│       │   │   └── mock.py
│       │   └── prompts/
│       │
│       └── artifact/            # Artifact Management
│           ├── manager.py       # Read/write operations
│           ├── schema.py        # Pydantic models
│           └── validator.py
│
├── tests/
│   ├── conftest.py              # Shared fixtures
│   ├── unit/                    # Unit tests
│   └── integration/             # Integration tests
│
├── benchmark/
│   ├── repos.yaml               # Benchmark dataset
│   ├── cached/                  # Cloned repos (gitignored)
│   └── results/                 # Evaluation results
│
└── docs/                        # Documentation

Module Responsibilities

CLI Layer (`cli/`)

Purpose: User interface and command handling

┌──────────────────────────────────────────┐
│                  CLI                      │
│  ┌─────────┐  ┌─────────┐  ┌──────────┐  │
│  │  scan   │  │  init   │  │ validate │  │
│  └────┬────┘  └────┬────┘  └────┬─────┘  │
│       │            │            │        │
│       └────────────┼────────────┘        │
│                    │                     │
│              ┌─────▼─────┐               │
│              │  output   │               │
│              │ (Rich)    │               │
│              └───────────┘               │
└──────────────────────────────────────────┘

Components:

app.py: Typer application with command definitions
output.py: Rich console formatters, tables, panels
commands/: Individual command implementations (extensible)

Core Layer (`core/`)

Purpose: Shared types, configuration, and constants

Components:

types.py: Domain types (AISystem, DetectionSignal, Confidence)
exceptions.py: Exception hierarchy
config.py: Pydantic settings management
constants.py: ML libraries, file extensions, patterns

Scanner Module (`scanner/`)

Purpose: Repository access and dependency parsing

┌───────────────────────────────────────────┐
│              ScanEngine                    │
│  ┌─────────────────────────────────────┐  │
│  │           Repository                 │  │
│  │  ┌─────────┐  ┌─────────────────┐   │  │
│  │  │  Git    │  │  File System    │   │  │
│  │  │  Info   │  │  Operations     │   │  │
│  │  └─────────┘  └─────────────────┘   │  │
│  └─────────────────────────────────────┘  │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │        DependencyParser             │  │
│  │  requirements.txt │ pyproject.toml  │  │
│  │       setup.py    │                 │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘

Components:

engine.py: Main scan orchestrator
repository.py: Repository abstraction (file access, git info)
dependency_parser.py: Parse Python dependency files
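
To illustrate the kind of work dependency parsing involves, here is a minimal sketch that handles simple requirements.txt content. The `Dependency` type, the `parse_requirements` function, and the regular expression are hypothetical stand-ins, not the project's actual parser, and pyproject.toml and setup.py are not covered.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:  # hypothetical stand-in for the project's dependency type
    name: str
    specifier: str = ""


# Package name, optional extras in brackets, then the version specifier (if any).
_REQ_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(.*)$")


def parse_requirements(text: str) -> list[Dependency]:
    """Parse simple requirements.txt content, skipping comments and pip options."""
    deps: list[Dependency] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        if not line or line.startswith("-"):     # blank lines, -r/-e/--index-url
            continue
        line = line.split(";", 1)[0].strip()     # drop environment markers
        match = _REQ_LINE.match(line)
        if match:
            # Normalize names: lowercase, runs of "-", "_", "." collapse to "-".
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            deps.append(Dependency(name, match.group(2).strip()))
    return deps
```

Normalizing names up front means a later lookup against a list of known ML libraries does not have to care whether a file says `Scikit_Learn` or `scikit-learn`.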

Detection Module (`detection/`)

Purpose: AI system detection logic

┌────────────────────────────────────────────────────┐
│                    AIDetector                       │
│  ┌──────────────────────────────────────────────┐  │
│  │              Signal Detectors                 │  │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐     │  │
│  │  │ Library  │ │ Model    │ │ API      │ ... │  │
│  │  │ Signal   │ │ File     │ │ Usage    │     │  │
│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘     │  │
│  │       │            │            │           │  │
│  │       └────────────┼────────────┘           │  │
│  │                    │                        │  │
│  └──────────────────────────────────────────────┘  │
│                       │                            │
│                       ▼                            │
│  ┌──────────────────────────────────────────────┐  │
│  │            SignalAggregator                   │  │
│  │  Weighted combination → AI Systems            │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘

Components:

detector.py: Detection orchestrator
aggregator.py: Signal aggregation and AI system creation
signals/base.py: Signal detector protocol
signals/library.py: Dependency-based detection
signals/model_file.py: Model file detection
signals/api_usage.py: Code pattern detection
signals/llm.py: LLM-based analysis
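
The "weighted combination" step in the diagram could take many forms. The sketch below assumes a weighted noisy-OR per AI type, which is one plausible choice and not necessarily what aggregator.py does. The `Signal` class, the `WEIGHTS` table, and the `aggregate` function are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Signal:  # hypothetical, simplified stand-in for DetectionSignal
    source: str
    ai_type: str
    confidence: float  # expected range: 0.0 to 1.0


# Hypothetical per-source weights, chosen only for illustration.
WEIGHTS = {"library": 0.8, "model_file": 0.9, "api_usage": 0.6, "llm": 0.5}


def aggregate(signals: list[Signal], threshold: float = 0.7) -> dict[str, float]:
    """Combine signals per AI type with a weighted noisy-OR.

    Returns the combined confidence for each AI type that reaches the threshold.
    """
    by_type: dict[str, list[Signal]] = defaultdict(list)
    for signal in signals:
        by_type[signal.ai_type].append(signal)

    result: dict[str, float] = {}
    for ai_type, group in by_type.items():
        # Probability that every signal in the group is a false alarm.
        miss = prod(1.0 - WEIGHTS.get(s.source, 0.5) * s.confidence for s in group)
        combined = 1.0 - miss
        if combined >= threshold:
            result[ai_type] = round(combined, 3)
    return result
```

With noisy-OR, two independent medium-strength signals for the same AI type reinforce each other, while a single weak signal stays below the threshold.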

LLM Module (`llm/`)

Purpose: LLM provider abstraction

┌────────────────────────────────────────┐
│              LLMClient                  │
│                 │                      │
│    ┌────────────┼────────────┐        │
│    │            │            │        │
│    ▼            ▼            ▼        │
│ ┌──────┐   ┌──────┐   ┌──────┐      │
│ │Anthro│   │OpenAI│   │ Mock │      │
│ │pic   │   │      │   │      │      │
│ └──────┘   └──────┘   └──────┘      │
└────────────────────────────────────────┘

Components:

client.py: Unified LLM client
providers/base.py: Provider protocol
providers/anthropic.py: Claude implementation
providers/openai.py: GPT implementation
providers/mock.py: Testing mock
prompts/: Prompt templates

SBOM Module (`sbom/`)

Purpose: CycloneDX AI-SBOM generation

Components:

generator.py: SBOM generation logic
models.py: AI-specific component models
cyclonedx_adapter.py: CycloneDX library integration

Artifact Module (`artifact/`)

Purpose: aigovhub.yaml management

Components:

manager.py: Read/write/update operations
schema.py: Pydantic models for validation
validator.py: Schema and semantic validation

Data Flow

Scan Flow

                         ┌─────────┐
                         │ CLI     │
                         │ (scan)  │
                         └────┬────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │   ScanEngine    │
                    └────────┬────────┘
                             │
           ┌─────────────────┼─────────────────┐
           │                 │                 │
           ▼                 ▼                 ▼
    ┌────────────┐   ┌────────────┐   ┌────────────┐
    │ Repository │   │ Dependency │   │    Git     │
    │   Access   │   │   Parser   │   │    Info    │
    └──────┬─────┘   └──────┬─────┘   └──────┬─────┘
           │                │                │
           └────────────────┼────────────────┘
                            │
                            ▼
                   ┌─────────────────┐
                   │   AIDetector    │
                   │                 │
                   │  ┌───────────┐  │
                   │  │  Signals  │  │
                   │  └─────┬─────┘  │
                   │        │        │
                   │  ┌─────▼─────┐  │
                   │  │Aggregator │  │
                   │  └───────────┘  │
                   └────────┬────────┘
                            │
                            ▼
                    ┌───────────────┐
                    │  ScanResult   │
                    │  - ai_systems │
                    │  - signals    │
                    └───────┬───────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
              ▼             ▼             ▼
       ┌──────────┐  ┌──────────┐  ┌──────────┐
       │  YAML    │  │  JSON    │  │  Rich    │
       │  Output  │  │  Output  │  │  Output  │
       └──────────┘  └──────────┘  └──────────┘
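
The final fan-out from ScanResult to the output formats can be sketched as a small formatter table. The field names on `AISystem`, the `FORMATTERS` mapping, and the `render` function are assumptions for illustration. YAML is left out because it needs a third-party library, and the plain-text formatter stands in for the Rich output.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AISystem:  # hypothetical, simplified fields
    name: str
    ai_type: str
    confidence: float


@dataclass
class ScanResult:
    ai_systems: list[AISystem] = field(default_factory=list)
    signals: list[str] = field(default_factory=list)


def to_json(result: ScanResult) -> str:
    # asdict() recurses into nested dataclasses, so the result is JSON-ready.
    return json.dumps(asdict(result), indent=2)


def to_text(result: ScanResult) -> str:
    lines = [f"{s.name} ({s.ai_type}): {s.confidence:.2f}" for s in result.ai_systems]
    return "\n".join(lines) or "No AI systems detected"


FORMATTERS = {"json": to_json, "text": to_text}


def render(result: ScanResult, fmt: str = "text") -> str:
    """Pick a formatter by name; the scan result itself is format-agnostic."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError(f"unknown output format: {fmt!r}") from None
    return formatter(result)
```

Keeping the result as plain data and pushing formatting to the edge is what lets one scan feed several outputs.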

Key Design Patterns

Protocol/Interface Pattern (Signals)

All signal detectors subclass the same abstract base class, which acts as the shared protocol:

from abc import ABC, abstractmethod

class SignalDetector(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...
 
    @property
    @abstractmethod
    def priority(self) -> int: ...
 
    @abstractmethod
    def detect(
        self,
        repository: Repository,
        dependencies: list[Dependency],
    ) -> list[DetectionSignal]: ...
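
To show how an orchestrator might drive detectors through this interface, here is a self-contained sketch. It mirrors the shape of the base class but uses simplified stand-ins: `Finding`, `Detector`, the two concrete detectors, and `run_detectors` are hypothetical. Running lower priority values first is an assumption, since the source does not say how priority is interpreted.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:  # hypothetical stand-in for DetectionSignal
    detector: str
    evidence: str


class Detector(ABC):  # mirrors the shape of SignalDetector, with simpler arguments
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def priority(self) -> int: ...

    @abstractmethod
    def detect(self, files: list[str], dependencies: list[str]) -> list[Finding]: ...


class LibraryDetector(Detector):
    name = "library"  # a plain class attribute satisfies the abstract property
    priority = 1
    ML_LIBS = {"torch", "tensorflow", "scikit-learn"}  # illustrative subset

    def detect(self, files, dependencies):
        return [Finding(self.name, dep) for dep in dependencies if dep in self.ML_LIBS]


class ModelFileDetector(Detector):
    name = "model_file"
    priority = 2
    EXTENSIONS = (".pt", ".onnx", ".safetensors")  # illustrative subset

    def detect(self, files, dependencies):
        return [Finding(self.name, f) for f in files if f.endswith(self.EXTENSIONS)]


def run_detectors(detectors, files, dependencies):
    """Run every detector and collect findings in priority order."""
    findings: list[Finding] = []
    # Assumption: a lower priority value means the detector runs earlier.
    for detector in sorted(detectors, key=lambda d: d.priority):
        findings.extend(detector.detect(files, dependencies))
    return findings
```

Because the orchestrator only touches `name`, `priority`, and `detect`, a new detector needs no changes to the loop itself.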

Provider Pattern (LLM)

LLM providers are interchangeable:

from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(
        self,
        prompt: str,
        *,
        system_prompt: str | None = None,
    ) -> LLMResponse: ...
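
Interchangeability is easiest to see with a test double. The sketch below assumes an `LLMResponse` with `content` and `model` fields and a `MockProvider` that records its calls. Both are hypothetical and may differ from providers/mock.py, and `analyze` is an illustrative caller only.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMResponse:  # hypothetical fields
    content: str
    model: str


class LLMProvider(ABC):
    @abstractmethod
    def complete(
        self,
        prompt: str,
        *,
        system_prompt: str | None = None,
    ) -> LLMResponse: ...


class MockProvider(LLMProvider):
    """Returns a canned answer and records each call; performs no network access."""

    def __init__(self, canned: str = "{}") -> None:
        self.canned = canned
        self.calls: list[tuple[str, str | None]] = []

    def complete(self, prompt, *, system_prompt=None):
        self.calls.append((prompt, system_prompt))
        return LLMResponse(content=self.canned, model="mock")


def analyze(provider: LLMProvider, code: str) -> str:
    """A caller that depends only on the abstract interface."""
    response = provider.complete(
        f"Classify this code:\n{code}",
        system_prompt="You are a code auditor.",
    )
    return response.content
```

Swapping `MockProvider` for a real provider would leave `analyze` unchanged, which is the point of the pattern.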

Dataclass Pattern (Domain Types)

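Core domain types are defined as dataclasses (pass frozen=True to the decorator where immutability is required):

from dataclasses import dataclass, field

@dataclass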
class DetectionSignal:
    source: SignalSource
    confidence: Confidence
    ai_type: AISystemType
    evidence: list[str] = field(default_factory=list)
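
A plain `@dataclass` is mutable, so immutability has to be requested explicitly. The following sketch shows a frozen variant with simplified field types. Replacing the enum-like types with `str` and `float` and storing evidence as a tuple are assumptions made to keep the example self-contained.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class DetectionSignal:
    source: str                      # simplified: the original uses SignalSource
    confidence: float                # simplified: the original uses Confidence
    ai_type: str                     # simplified: the original uses AISystemType
    evidence: tuple[str, ...] = ()   # a tuple keeps the contents immutable too


signal = DetectionSignal("library", 0.9, "ml_model", ("torch in requirements.txt",))

try:
    signal.confidence = 0.1          # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    pass

# "Changing" a frozen instance means building a modified copy.
weaker = replace(signal, confidence=0.5)
```

Frozen instances are also hashable when all their fields are, so they can be placed in sets or used as dictionary keys.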

Settings Pattern (Configuration)

Pydantic settings for type-safe configuration:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Config(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="AIGOVHUB_",
        env_file=".env",
    )
 
    confidence_threshold: float = 0.7
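
With the prefix above, the field `confidence_threshold` is read from the environment variable `AIGOVHUB_CONFIDENCE_THRESHOLD`. The sketch below emulates that name mapping using only the standard library so the behavior can be tried without installing anything. It is not how pydantic-settings is implemented, and `load_config` is a hypothetical helper.

```python
import os
from dataclasses import dataclass, fields

ENV_PREFIX = "AIGOVHUB_"


@dataclass(frozen=True)
class Config:  # stdlib stand-in for the pydantic-settings class
    confidence_threshold: float = 0.7


def load_config(environ=None) -> Config:
    """Build a Config, letting prefixed environment variables override defaults."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for f in fields(Config):
        raw = environ.get(ENV_PREFIX + f.name.upper())
        if raw is not None:
            # Convert the string using the type of the field's default value.
            overrides[f.name] = type(f.default)(raw)
    return Config(**overrides)
```

Passing a mapping to `load_config` instead of relying on `os.environ` keeps the function easy to test.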

Extension Points

Adding a New Signal Detector

Create a new module in detection/signals/:

# detection/signals/custom.py
from aigovhub.detection.signals.base import SignalDetector
 
class CustomSignal(SignalDetector):
    @property
    def name(self) -> str:
        return "custom"
 
    @property
    def priority(self) -> int:
        return 5
 
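    def detect(self, repository, dependencies):
        signals = []
        # Detection logic: append a DetectionSignal for each finding
        return signals

Then register the detector in the orchestrator's detector list:

self.detectors: list[SignalDetector] = [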
    LibrarySignal(),
    ModelFileSignal(),
    CustomSignal(),  # Add here
]

Adding a New LLM Provider

Create a new module in llm/providers/:

# llm/providers/custom.py
from aigovhub.llm.providers.base import LLMProvider
 
class CustomProvider(LLMProvider):
    @property
    def name(self) -> str:
        return "custom"
 
    def complete(self, prompt, *, system_prompt=None):
        # Implementation
        return LLMResponse(...)

Then register the provider in the provider registry:

PROVIDERS: dict[str, type[LLMProvider]] = {
    "anthropic": AnthropicProvider,
    "openai": OpenAIProvider,
    "custom": CustomProvider,  # Add here
}

Testing Strategy

Unit Tests

Test each signal detector independently
Test aggregation logic with mock signals
Test dependency parsing with fixture files

Integration Tests

Test full scan workflow
Test CLI commands
Test with sample repositories

Test Fixtures

tests/
├── fixtures/
│   └── sample_repos/
│       ├── ml_project/      # Has ML dependencies
│       ├── llm_project/     # Has LLM integration
│       └── web_project/     # No AI (negative case)

Security Considerations

Symlink Protection: Repository scanning validates that symlinks don't point outside the repository boundary, preventing directory traversal attacks
Path Validation: Output file paths are validated to prevent path traversal (../) and symlink-based overwrites
Error Sanitization: LLM provider errors are sanitized to prevent API key exposure in error messages
Input Validation: LLM responses are validated (JSON parsing, type checking, confidence clamping) before use
Specific Exceptions: CLI uses specific exception types rather than broad catches to avoid masking errors
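
Three of the checks listed above can be sketched with the standard library alone. The function names, the `sk-` key pattern, and the fallback value of 0.0 are hypothetical choices for illustration and are not taken from the project's code.

```python
import re
from pathlib import Path


def is_inside_repo(path: Path, repo_root: Path) -> bool:
    """True if path, after resolving symlinks, still lies within repo_root."""
    root = repo_root.resolve()
    resolved = path.resolve()
    return resolved == root or root in resolved.parents


# Hypothetical pattern: tokens that look like "sk-" style API keys.
_SECRET = re.compile(r"\bsk-[A-Za-z0-9_-]{8,}")


def sanitize_error(message: str) -> str:
    """Redact anything resembling an API key before the message is shown."""
    return _SECRET.sub("[REDACTED]", message)


def clamp_confidence(value: object) -> float:
    """Coerce an untrusted LLM-supplied value into the range 0.0 to 1.0."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return 0.0               # unparseable input is treated as no confidence
    if number != number:         # NaN is the only float not equal to itself
        return 0.0
    return min(1.0, max(0.0, number))
```

Resolving both paths before comparing them is what defeats a symlink inside the repository that points somewhere outside it.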

Performance Considerations

Lazy imports: Heavy dependencies are loaded on demand
File filtering: Directories such as .git and node_modules are excluded from scanning
Content caching: Each file's contents are read once and reused
Parallelism (future): Signal detectors could run in parallel; this is not yet implemented
LLM batching: Code samples are batched for LLM calls
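
File filtering and content caching can be sketched as follows. The excluded directory names beyond .git and node_modules, the cache size, and both function names are assumptions for illustration, not the project's implementation.

```python
import os
from functools import lru_cache
from pathlib import Path

# .git and node_modules come from the list above; the others are illustrative.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def iter_source_files(root: Path):
    """Yield files under root, never descending into excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk to skip those subtrees.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for filename in sorted(filenames):
            yield Path(dirpath) / filename


@lru_cache(maxsize=1024)
def read_text_cached(path: Path) -> str:
    """Read a file once; later calls with the same path hit the cache."""
    return path.read_text(encoding="utf-8", errors="replace")
```

Pruning at the directory level avoids even listing the contents of large excluded trees, which usually matters more than filtering individual files afterwards.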