Architecture
This document describes the technical architecture of AIGovHub CLI.
Overview
AIGovHub CLI is designed with these principles:
- Modularity: Separate concerns into distinct modules
- Extensibility: Easy to add new detection signals
- Testability: All components are unit-testable
- Type Safety: Full type hints throughout
Project Structure
aigovhub-cli/
├── src/
│   └── aigovhub/
│       ├── __init__.py             # Package entry, version
│       ├── __main__.py             # python -m aigovhub entry
│       ├── py.typed                # PEP 561 marker
│       │
│       ├── cli/                    # CLI Layer
│       │   ├── app.py              # Typer application
│       │   ├── output.py           # Rich formatters
│       │   └── commands/           # Command modules
│       │
│       ├── core/                   # Shared Utilities
│       │   ├── config.py           # Configuration management
│       │   ├── constants.py        # ML libraries, patterns
│       │   ├── exceptions.py       # Exception hierarchy
│       │   └── types.py            # Type definitions
│       │
│       ├── scanner/                # Repository Scanning
│       │   ├── engine.py           # Scan orchestrator
│       │   ├── repository.py       # Repository abstraction
│       │   └── dependency_parser.py
│       │
│       ├── detection/              # AI Detection
│       │   ├── detector.py         # Detection orchestrator
│       │   ├── aggregator.py       # Signal aggregation
│       │   └── signals/            # Signal detectors
│       │       ├── base.py         # Signal protocol
│       │       ├── library.py      # Dependency detection
│       │       ├── model_file.py
│       │       ├── api_usage.py
│       │       └── llm.py
│       │
│       ├── sbom/                   # SBOM Generation
│       │   ├── generator.py        # CycloneDX generator
│       │   ├── models.py           # AI-specific models
│       │   └── cyclonedx_adapter.py
│       │
│       ├── llm/                    # LLM Abstraction
│       │   ├── client.py           # Unified client
│       │   ├── providers/          # Provider implementations
│       │   │   ├── base.py
│       │   │   ├── anthropic.py
│       │   │   ├── openai.py
│       │   │   └── mock.py
│       │   └── prompts/
│       │
│       └── artifact/               # Artifact Management
│           ├── manager.py          # Read/write operations
│           ├── schema.py           # Pydantic models
│           └── validator.py
│
├── tests/
│   ├── conftest.py                 # Shared fixtures
│   ├── unit/                       # Unit tests
│   └── integration/                # Integration tests
│
├── benchmark/
│   ├── repos.yaml                  # Benchmark dataset
│   ├── cached/                     # Cloned repos (gitignored)
│   └── results/                    # Evaluation results
│
└── docs/                           # Documentation
Module Responsibilities
CLI Layer (cli/)
Purpose: User interface and command handling
┌──────────────────────────────────────────┐
│                   CLI                    │
│  ┌─────────┐  ┌─────────┐  ┌──────────┐  │
│  │  scan   │  │  init   │  │ validate │  │
│  └────┬────┘  └────┬────┘  └────┬─────┘  │
│       │            │            │        │
│       └────────────┼────────────┘        │
│                    │                     │
│              ┌─────▼─────┐               │
│              │  output   │               │
│              │  (Rich)   │               │
│              └───────────┘               │
└──────────────────────────────────────────┘
Components:
- app.py: Typer application with command definitions
- output.py: Rich console formatters, tables, panels
- commands/: Individual command implementations (extensible)
Core Layer (core/)
Purpose: Shared types, configuration, and constants
Components:
- types.py: Domain types (AISystem, DetectionSignal, Confidence)
- exceptions.py: Exception hierarchy
- config.py: Pydantic settings management
- constants.py: ML libraries, file extensions, patterns
Scanner Module (scanner/)
Purpose: Repository access and dependency parsing
┌───────────────────────────────────────────┐
│                ScanEngine                 │
│  ┌─────────────────────────────────────┐  │
│  │             Repository              │  │
│  │  ┌─────────┐  ┌─────────────────┐   │  │
│  │  │   Git   │  │   File System   │   │  │
│  │  │  Info   │  │   Operations    │   │  │
│  │  └─────────┘  └─────────────────┘   │  │
│  └─────────────────────────────────────┘  │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │          DependencyParser           │  │
│  │  requirements.txt │ pyproject.toml  │  │
│  │  setup.py         │                 │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘
Components:
- engine.py: Main scan orchestrator
- repository.py: Repository abstraction (file access, git info)
- dependency_parser.py: Parse Python dependency files
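As a rough illustration of what parsing a Python dependency file involves, the sketch below pulls package names out of requirements.txt-style text. The function name `parse_requirements`, the regex, and the loose name normalization are assumptions made for this example; they are not the actual `dependency_parser.py` API.

```python
import re

# Hypothetical helper for illustration only; not the real parser.
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirements(text: str) -> list[str]:
    """Return loosely normalized package names from requirements.txt-style text."""
    names: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = _NAME_RE.match(line)  # name ends at the first specifier or extras bracket
        if match:
            names.append(match.group(0).lower().replace("_", "-"))
    return names


print(parse_requirements("torch>=2.0\n-r base.txt\nscikit_learn==1.4  # ml\n"))
```

A parser for pyproject.toml or setup.py would need a different strategy (TOML parsing and static analysis respectively); this sketch covers only the simplest format.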
Detection Module (detection/)
Purpose: AI system detection logic
┌──────────────────────────────────────────────────────┐
│                      AIDetector                      │
│  ┌────────────────────────────────────────────────┐  │
│  │                Signal Detectors                │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐      │  │
│  │  │ Library  │  │  Model   │  │   API    │  ... │  │
│  │  │  Signal  │  │   File   │  │  Usage   │      │  │
│  │  └────┬─────┘  └────┬─────┘  └────┬─────┘      │  │
│  │       │             │             │            │  │
│  │       └─────────────┼─────────────┘            │  │
│  │                     │                          │  │
│  └────────────────────────────────────────────────┘  │
│                        │                             │
│                        ▼                             │
│  ┌────────────────────────────────────────────────┐  │
│  │                SignalAggregator                │  │
│  │       Weighted combination → AI Systems        │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
Components:
- detector.py: Detection orchestrator
- aggregator.py: Signal aggregation and AI system creation
- signals/base.py: Signal detector protocol
- signals/library.py: Dependency-based detection
- signals/model_file.py: Model file detection
- signals/api_usage.py: Code pattern detection
- signals/llm.py: LLM-based analysis
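The "weighted combination" step could take several forms. One minimal sketch, assuming a weighted mean over per-source weights, is shown below. The `Signal` class, the `SOURCE_WEIGHTS` values, and the default weight of 0.5 are all hypothetical; the actual weighting in `aggregator.py` is not documented here.

```python
from dataclasses import dataclass

# Hypothetical weights per signal source; the real values may differ.
SOURCE_WEIGHTS = {"library": 1.0, "model_file": 0.8, "api_usage": 0.6}
DEFAULT_WEIGHT = 0.5  # assumed weight for sources not listed above


@dataclass(frozen=True)
class Signal:
    source: str        # e.g. "library"
    confidence: float  # expected range 0.0 .. 1.0


def combine(signals: list[Signal]) -> float:
    """Weighted mean of signal confidences; 0.0 when there are no signals."""
    total_weight = sum(SOURCE_WEIGHTS.get(s.source, DEFAULT_WEIGHT) for s in signals)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        SOURCE_WEIGHTS.get(s.source, DEFAULT_WEIGHT) * s.confidence for s in signals
    )
    return weighted / total_weight


score = combine([Signal("library", 0.9), Signal("api_usage", 0.4)])
print(round(score, 4))
```

A combined score like this would then be compared against a threshold to decide whether the evidence amounts to an AI system.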
LLM Module (llm/)
Purpose: LLM provider abstraction
┌────────────────────────────────────────┐
│               LLMClient                │
│                   │                    │
│      ┌────────────┼────────────┐       │
│      │            │            │       │
│      ▼            ▼            ▼       │
│  ┌──────┐     ┌──────┐     ┌──────┐    │
│  │Anthro│     │OpenAI│     │ Mock │    │
│  │pic   │     │      │     │      │    │
│  └──────┘     └──────┘     └──────┘    │
└────────────────────────────────────────┘
Components:
- client.py: Unified LLM client
- providers/base.py: Provider protocol
- providers/anthropic.py: Claude implementation
- providers/openai.py: GPT implementation
- providers/mock.py: Testing mock
- prompts/: Prompt templates
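To make the provider abstraction concrete, here is a small self-contained sketch of how a mock provider and a name-based registry might fit together. The `LLMResponse.text` field, the `MockProvider` behavior, and the `get_provider` helper are assumptions for illustration; only the `complete` signature mirrors the interface described in this document.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LLMResponse:
    text: str  # field name is an assumption


class LLMProvider(ABC):
    @abstractmethod
    def complete(
        self, prompt: str, *, system_prompt: str | None = None
    ) -> LLMResponse: ...


class MockProvider(LLMProvider):
    """Returns a canned response and records prompts, so tests need no network."""

    def __init__(self, canned: str = "{}") -> None:
        self.canned = canned
        self.calls: list[str] = []

    def complete(
        self, prompt: str, *, system_prompt: str | None = None
    ) -> LLMResponse:
        self.calls.append(prompt)
        return LLMResponse(text=self.canned)


# Hypothetical registry; real providers would be registered alongside the mock.
PROVIDERS: dict[str, type[LLMProvider]] = {"mock": MockProvider}


def get_provider(name: str) -> LLMProvider:
    """Instantiate a provider by name, failing loudly on unknown names."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name!r}") from None
```

Because callers depend only on `LLMProvider`, swapping the mock for a real provider requires no changes in the detection code.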
SBOM Module (sbom/)
Purpose: CycloneDX AI-SBOM generation
Components:
- generator.py: SBOM generation logic
- models.py: AI-specific component models
- cyclonedx_adapter.py: CycloneDX library integration
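For orientation, the sketch below builds a minimal CycloneDX-shaped JSON document by hand, assuming spec version 1.5 (which defines the `machine-learning-model` component type). The `build_ai_sbom` function, the input dictionary shape, and the `aigovhub:confidence` property name are hypothetical; the real generator goes through `cyclonedx_adapter.py`, and this output has not been validated against the CycloneDX schema.

```python
import json


def build_ai_sbom(systems: list[dict]) -> dict:
    """Build a minimal CycloneDX-style document from detected AI systems."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",  # assumed version
        "version": 1,
        "components": [
            {
                "type": "machine-learning-model",
                "name": system["name"],
                # Property name is an illustrative assumption.
                "properties": [
                    {"name": "aigovhub:confidence", "value": str(system["confidence"])},
                ],
            }
            for system in systems
        ],
    }


sbom = build_ai_sbom([{"name": "sentiment-model", "confidence": 0.9}])
print(json.dumps(sbom, indent=2))
```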
Artifact Module (artifact/)
Purpose: aigovhub.yaml management
Components:
- manager.py: Read/write/update operations
- schema.py: Pydantic models for validation
- validator.py: Schema and semantic validation
Data Flow
Scan Flow
                   ┌─────────┐
                   │   CLI   │
                   │ (scan)  │
                   └────┬────┘
                        │
                        ▼
               ┌─────────────────┐
               │   ScanEngine    │
               └────────┬────────┘
                        │
       ┌────────────────┼────────────────┐
       │                │                │
       ▼                ▼                ▼
┌────────────┐   ┌────────────┐   ┌────────────┐
│ Repository │   │ Dependency │   │    Git     │
│   Access   │   │   Parser   │   │    Info    │
└──────┬─────┘   └──────┬─────┘   └──────┬─────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
                        ▼
               ┌─────────────────┐
               │   AIDetector    │
               │                 │
               │  ┌───────────┐  │
               │  │  Signals  │  │
               │  └─────┬─────┘  │
               │        │        │
               │  ┌─────▼─────┐  │
               │  │Aggregator │  │
               │  └───────────┘  │
               └────────┬────────┘
                        │
                        ▼
                ┌───────────────┐
                │  ScanResult   │
                │ - ai_systems  │
                │ - signals     │
                └───────┬───────┘
                        │
          ┌─────────────┼─────────────┐
          │             │             │
          ▼             ▼             ▼
     ┌──────────┐  ┌──────────┐  ┌──────────┐
     │   YAML   │  │   JSON   │  │   Rich   │
     │  Output  │  │  Output  │  │  Output  │
     └──────────┘  └──────────┘  └──────────┘
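The flow above can be approximated in a few lines. This is a deliberately simplified sketch: `KNOWN_AI_LIBRARIES`, `library_signal`, `run_scan`, and `to_json` are hypothetical stand-ins, detectors are plain functions rather than classes, and the aggregation step is a placeholder that treats each distinct piece of evidence as one system.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical stand-in data; the real list lives in core/constants.py.
KNOWN_AI_LIBRARIES = {"torch", "tensorflow", "openai"}


@dataclass
class ScanResult:
    ai_systems: list[str] = field(default_factory=list)
    signals: list[dict] = field(default_factory=list)


def library_signal(dependencies: list[str]) -> list[dict]:
    """Emit one signal per dependency that is a known AI library."""
    return [
        {"source": "library", "evidence": name}
        for name in dependencies
        if name in KNOWN_AI_LIBRARIES
    ]


def run_scan(dependencies: list[str], detectors) -> ScanResult:
    """Run every detector, then aggregate the signals into AI systems."""
    signals = [s for detect in detectors for s in detect(dependencies)]
    # Placeholder aggregation: one "system" per distinct piece of evidence.
    ai_systems = sorted({s["evidence"] for s in signals})
    return ScanResult(ai_systems=ai_systems, signals=signals)


def to_json(result: ScanResult) -> str:
    """Render the result for the JSON output path."""
    return json.dumps(asdict(result), indent=2)


print(to_json(run_scan(["flask", "torch"], [library_signal])))
```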
Key Design Patterns
Protocol/Interface Pattern (Signals)
All signal detectors implement the same interface, defined as an abstract base class:

from abc import ABC, abstractmethod

class SignalDetector(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def priority(self) -> int: ...

    @abstractmethod
    def detect(
        self,
        repository: Repository,
        dependencies: list[Dependency],
    ) -> list[DetectionSignal]: ...

Provider Pattern (LLM)
LLM providers are interchangeable:
class LLMProvider(ABC):
    @abstractmethod
    def complete(
        self,
        prompt: str,
        *,
        system_prompt: str | None = None,
    ) -> LLMResponse: ...

Dataclass Pattern (Domain Types)
Core domain types are dataclasses (declared with frozen=True where immutability is required):

from dataclasses import dataclass, field

@dataclass
class DetectionSignal:
    source: SignalSource
    confidence: Confidence
    ai_type: AISystemType
    evidence: list[str] = field(default_factory=list)

Settings Pattern (Configuration)
Pydantic settings for type-safe configuration:
from pydantic_settings import BaseSettings, SettingsConfigDict

class Config(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="AIGOVHUB_",
        env_file=".env",
    )
    confidence_threshold: float = 0.7

Extension Points
Adding a New Signal Detector
- Create a new module in detection/signals/:

# detection/signals/custom.py
from aigovhub.detection.signals.base import SignalDetector

class CustomSignal(SignalDetector):
    @property
    def name(self) -> str:
        return "custom"

    @property
    def priority(self) -> int:
        return 5

    def detect(self, repository, dependencies):
        signals = []  # Detection logic appends DetectionSignal objects here
        return signals

- Register in detection/detector.py:

self.detectors: list[SignalDetector] = [
    LibrarySignal(),
    ModelFileSignal(),
    CustomSignal(),  # Add here
]

Adding a New LLM Provider
- Create a new module in llm/providers/:

# llm/providers/custom.py
from aigovhub.llm.providers.base import LLMProvider

class CustomProvider(LLMProvider):
    @property
    def name(self) -> str:
        return "custom"

    def complete(self, prompt, **kwargs):
        # Implementation
        return LLMResponse(...)

- Register in llm/client.py:

PROVIDERS: dict[str, type[LLMProvider]] = {
    "anthropic": AnthropicProvider,
    "openai": OpenAIProvider,
    "custom": CustomProvider,  # Add here
}

Testing Strategy
Unit Tests
- Test each signal detector independently
- Test aggregation logic with mock signals
- Test dependency parsing with fixture files
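A unit test for a single detector might look like the pytest-style sketch below. `FakeDependency` and `LibrarySignalStub` are hypothetical stand-ins written for this example, not the real `Dependency` type or `LibrarySignal` class; the point is only that a detector can be exercised with hand-built inputs and no repository on disk.

```python
from dataclasses import dataclass


@dataclass
class FakeDependency:
    name: str


class LibrarySignalStub:
    """Stand-in for a dependency-based detector; not the real LibrarySignal."""

    ML_LIBRARIES = {"torch", "tensorflow", "scikit-learn"}

    def detect(self, repository, dependencies):
        # Return the names of dependencies that are known ML libraries.
        return [d.name for d in dependencies if d.name in self.ML_LIBRARIES]


def test_detects_ml_dependency():
    deps = [FakeDependency("flask"), FakeDependency("torch")]
    assert LibrarySignalStub().detect(repository=None, dependencies=deps) == ["torch"]


def test_ignores_non_ai_project():
    deps = [FakeDependency("flask")]
    assert LibrarySignalStub().detect(repository=None, dependencies=deps) == []
```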
Integration Tests
- Test full scan workflow
- Test CLI commands
- Test with sample repositories
Test Fixtures
tests/
├── fixtures/
│   └── sample_repos/
│       ├── ml_project/     # Has ML dependencies
│       ├── llm_project/    # Has LLM integration
│       └── web_project/    # No AI (negative case)
Security Considerations
- Symlink Protection: Repository scanning validates that symlinks don't point outside the repository boundary, preventing directory traversal attacks
- Path Validation: Output file paths are validated to prevent path traversal (../) and symlink-based overwrites
- Error Sanitization: LLM provider errors are sanitized to prevent API key exposure in error messages
- Input Validation: LLM responses are validated (JSON parsing, type checking, confidence clamping) before use
- Specific Exceptions: CLI uses specific exception types rather than broad catches to avoid masking errors
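Three of the checks listed above lend themselves to short sketches: path containment, error sanitization, and confidence clamping. The helper names and the `sk-` key pattern below are assumptions for illustration; the project's actual checks may be stricter or structured differently.

```python
import re
from pathlib import Path


def is_within(root: Path, candidate: Path) -> bool:
    """True if candidate, after resolving symlinks and '..', stays inside root."""
    root_resolved = root.resolve()
    try:
        candidate.resolve().relative_to(root_resolved)
    except ValueError:
        return False
    return True


# Assumed key shape: an "sk-" prefix followed by at least 8 key characters.
_KEY_RE = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def sanitize_error(message: str) -> str:
    """Replace anything that looks like an API key before the message is shown."""
    return _KEY_RE.sub("[REDACTED]", message)


def clamp_confidence(value: float) -> float:
    """Force an LLM-reported confidence into the range 0.0 .. 1.0."""
    return max(0.0, min(1.0, float(value)))
```

Resolving both paths before comparing them is what lets one check cover `../` segments and symlinks alike, since each is expanded before `relative_to` runs.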
Performance Considerations
- Lazy imports: Heavy dependencies loaded on demand
- File filtering: Exclude .git, node_modules, etc.
- Content caching: File contents read once
- Parallel potential: Signals could run in parallel (future)
- LLM batching: Code samples batched for LLM calls
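File filtering and content caching can be sketched with the standard library alone. The `EXCLUDED_DIRS` set and both function names are assumptions for illustration; the real exclusion list and caching strategy may differ.

```python
import os
from functools import lru_cache
from pathlib import Path

# Assumed exclusion list; only .git and node_modules are named in this document.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def iter_source_files(root: Path):
    """Yield files under root, never descending into excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk skips the excluded subtrees entirely.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for filename in filenames:
            yield Path(dirpath) / filename


@lru_cache(maxsize=None)
def read_text_cached(path: Path) -> str:
    """Read a file once; repeated calls for the same path hit the cache."""
    return path.read_text(encoding="utf-8", errors="replace")
```

Pruning `dirnames` in place matters for performance: filtering the yielded paths afterwards would still walk every file under `.git` or `node_modules`.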