Hermes Agent - Security Architecture

Threat Model

Hermes is a single-tenant personal agent with one trusted operator. The security model protects the operator from unintended LLM-driven actions, not from malicious co-tenants. Multi-user isolation relies on OS/host-level separation.

Reference: SECURITY.md (85 lines)

Security Layers

                    SECURITY BOUNDARY STACK

    ┌─────────────────────────────────────────────────┐
    │ Layer 1: COMMAND APPROVAL (tools/approval.py)   │
    │ 37 dangerous patterns, user confirmation        │
    ├─────────────────────────────────────────────────┤
    │ Layer 2: PATH SECURITY (tools/path_security.py) │
    │ Symlink-aware traversal detection               │
    ├─────────────────────────────────────────────────┤
    │ Layer 3: FILE PROTECTION (tools/file_tools.py)  │
    │ Device blocking, sensitive paths, read limits   │
    ├─────────────────────────────────────────────────┤
    │ Layer 4: CREDENTIAL PROTECTION                  │
    │ (tools/credential_files.py)                     │
    │ Env filtering, credential mounting              │
    ├─────────────────────────────────────────────────┤
    │ Layer 5: URL SAFETY (tools/url_safety.py)       │
    │ SSRF prevention, private IP blocking            │
    ├─────────────────────────────────────────────────┤
    │ Layer 6: CONTENT SCANNING (tirith_security.py)  │
    │ Homograph URLs, injection, obfuscation          │
    ├─────────────────────────────────────────────────┤
    │ Layer 7: MEMORY INJECTION DEFENSE               │
    │ (tools/memory_tool.py)                          │
    │ Pattern matching for prompt injection           │
    ├─────────────────────────────────────────────────┤
    │ Layer 8: SKILL GUARD (tools/skills_guard.py)    │
    │ Security scan on skill install/create/edit      │
    ├─────────────────────────────────────────────────┤
    │ Layer 9: DELEGATION ISOLATION                   │
    │ (tools/delegate_tool.py)                        │
    │ Tool blocklists, depth limits, memory isolation │
    ├─────────────────────────────────────────────────┤
    │ Layer 10: EXECUTION SANDBOXING                  │
    │ (tools/environments/docker.py, modal.py, etc.)  │
    │ Container isolation, resource limits            │
    └─────────────────────────────────────────────────┘

Layer 1: Dangerous Command Approval (tools/approval.py)

Detection Patterns (lines 75-138)

37 dangerous patterns organized by category:

Category            Examples
------------------  ----------------------------------------------
Recursive delete    rm -r, rm --recursive, rm -rf
Filesystem ops      mkfs, dd, chmod 777, chown
SQL destructive     DROP TABLE, DELETE (without WHERE), TRUNCATE
Shell injection     Pipe to sh|bash, -c flag execution
Git destructive     reset --hard, push --force, push -f
System service      systemctl stop, systemctl restart
Package management  apt remove, pip uninstall
Network             iptables, ufw
Self-modification   hermes update, gateway run (outside systemd)
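
A minimal sketch of how category-based detection like this could work, assuming each category is a regex searched against the raw command string. The three patterns and the names DANGEROUS_PATTERNS and detect_dangerous are illustrative only; they are not the 37 entries in tools/approval.py.

```python
import re
from typing import Optional

# Hypothetical subset of patterns, written for illustration.
DANGEROUS_PATTERNS = [
    ("recursive delete", re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*|--recursive)\b")),
    ("git destructive", re.compile(r"\bgit\s+(reset\s+--hard|push\s+(--force|-f))\b")),
    ("pipe to shell", re.compile(r"\|\s*(sh|bash)\b")),
]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the category of the first matching pattern, or None if none match."""
    for category, pattern in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return category
    return None
```

A caller would request approval only when detect_dangerous returns a category, and pass the command through otherwise.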

Approval Modes

Mode    Behavior                               Use Case
------  -------------------------------------  ---------------------
"on"    Prompt user for confirmation           Default, recommended
"auto"  Auto-approve after configurable delay  Unattended automation
"off"   Disable approval entirely              Break-glass only

State Management (lines 204-256)

# Per-session approval state (thread-safe, keyed by session_key)
class ApprovalState:
    permanent_allowlist: Set[str]     # Persisted in config.yaml
    session_approvals: Set[str]       # This session only
    pending_approval: Optional[Event] # Blocking queue (gateway async)

Approval Keys

# Canonical key (human-readable):
"recursive delete"
 
# Legacy key (regex-derived, backward compat):
"rm_recursive"
 
# Multiple aliases can match the same pattern for config migration
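
One way the alias idea could be sketched: legacy keys are mapped to their canonical form before any allowlist lookup, so an old config entry still approves the same pattern. KEY_ALIASES, canonical_key, and ApprovalSketch are hypothetical names, and the alias table holds only the single pair shown above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical alias table: legacy key -> canonical key.
KEY_ALIASES: Dict[str, str] = {"rm_recursive": "recursive delete"}


def canonical_key(key: str) -> str:
    """Map a legacy key to its canonical form; unknown keys pass through."""
    return KEY_ALIASES.get(key, key)


@dataclass
class ApprovalSketch:
    permanent_allowlist: Set[str] = field(default_factory=set)
    session_approvals: Set[str] = field(default_factory=set)

    def is_approved(self, key: str) -> bool:
        # Normalize both sides so legacy and canonical spellings compare equal.
        approved = {
            canonical_key(k)
            for k in self.permanent_allowlist | self.session_approvals
        }
        return canonical_key(key) in approved
```

With this shape, a config.yaml written before the key rename keeps working without a migration step.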

Layer 2: Path Security (tools/path_security.py)

def validate_within_dir(path: str, root: str) -> bool:
    """Ensure resolved path doesn't escape root directory. Symlink-aware."""
    
def has_traversal_component(path_str: str) -> bool:
    """Detect '..' path components."""

Used by: Skills Guard, credential file registration, cronjob tools.
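
The two signatures above could be filled in roughly as follows. This is a sketch under stated assumptions, not the contents of tools/path_security.py: it assumes relative paths are interpreted against root, and it uses os.path.realpath to resolve symlinks before comparing.

```python
import os
from pathlib import PurePosixPath


def validate_within_dir(path: str, root: str) -> bool:
    """Return True if path, resolved against root, stays inside root."""
    real_root = os.path.realpath(root)
    # realpath follows symlinks, so a link pointing outside root is caught.
    real_path = os.path.realpath(os.path.join(real_root, path))
    return os.path.commonpath([real_root, real_path]) == real_root


def has_traversal_component(path_str: str) -> bool:
    """Return True if any path component is exactly '..'."""
    parts = PurePosixPath(path_str.replace("\\", "/")).parts
    return ".." in parts
```

Comparing with os.path.commonpath rather than a string prefix avoids treating /data/root-evil as being inside /data/root.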

Layer 3: File Protection (tools/file_tools.py)

Device Blocking (lines 62-90)

BLOCKED_DEVICE_PATHS = [
    "/dev/zero", "/dev/random", "/dev/urandom",
    "/dev/stdin", "/dev/stdout", "/dev/stderr",
    "/dev/tty", "/dev/null"  # prevent hangs
]
# Checks literal paths (no symlink following to defeat checks)

Sensitive Path Protection (lines 94-118)

SENSITIVE_WRITE_PATHS = [
    "/etc/", "/boot/", "/usr/lib/systemd/",
    "/var/run/docker.sock"  # Docker socket
]
# Blocks writes; reads allowed. Requires terminal approval to bypass.
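
A prefix check over this list might look like the sketch below. The function name is_sensitive_write is hypothetical, and the check only normalizes the path lexically; resolving symlinks before the comparison is left out here and would be needed in practice.

```python
import posixpath

SENSITIVE_WRITE_PATHS = [
    "/etc/", "/boot/", "/usr/lib/systemd/",
    "/var/run/docker.sock",
]


def is_sensitive_write(path: str) -> bool:
    """Return True if a write to path falls under a protected location."""
    normalized = posixpath.normpath(path)
    # POSIX normpath keeps exactly two leading slashes; collapse them.
    if normalized.startswith("//"):
        normalized = "/" + normalized.lstrip("/")
    for entry in SENSITIVE_WRITE_PATHS:
        base = entry.rstrip("/")
        # Match the entry itself or anything beneath it, but not siblings
        # that merely share a prefix (for example /etcetera).
        if normalized == base or normalized.startswith(base + "/"):
            return True
    return False
```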

Read Size Guards (lines 18-28)

External Modification Detection

Layer 4: Credential Protection (tools/credential_files.py)

Environment Variable Filtering

# tools/environments/local.py
_HERMES_PROVIDER_ENV_BLOCKLIST = [
    "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
    "GOOGLE_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN",
    # ... all provider credentials
]

API keys/tokens are stripped from subprocess environments. Only explicitly declared env vars are passed through (via skills or config).
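
The filtering described here can be sketched as a pure function over an environment mapping. This assumes one reading of the rule: blocklisted variables are dropped unless a skill or config explicitly declares them. build_child_env and its declared parameter are hypothetical names.

```python
from typing import Dict, Iterable, Mapping

# Subset of the blocklist shown above.
PROVIDER_ENV_BLOCKLIST = {
    "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
    "GOOGLE_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN",
}


def build_child_env(
    parent: Mapping[str, str], declared: Iterable[str] = ()
) -> Dict[str, str]:
    """Copy parent, dropping blocklisted names that were not declared."""
    allowed = set(declared)
    return {
        name: value
        for name, value in parent.items()
        if name not in PROVIDER_ENV_BLOCKLIST or name in allowed
    }
```

The result would be passed as the env argument of subprocess.run so the child never inherits the parent's full environment.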

Credential File Registry

# Session-scoped (ContextVar-backed) for cross-session isolation
def register_credential_file(relative_path: str):
    """
    1. Validates path (no absolute, no .., no traversal)
    2. Resolves to HERMES_HOME/relative_path
    3. Stores for remote sandbox mounting
    """

Flow:

  1. Skills declare required_credential_files (relative to HERMES_HOME)
  2. Remote backends query registry at sandbox creation + pre-command
  3. Files mounted read-only where possible
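
The registration steps could be sketched with a ContextVar as below. The HERMES_HOME value is a placeholder, registered_files is a hypothetical accessor, and the real function may store more than a path.

```python
from contextvars import ContextVar
from pathlib import Path, PurePosixPath
from typing import Tuple

HERMES_HOME = Path.home() / ".hermes"  # placeholder location, assumed

# A ContextVar gives each session (context) its own registry.
_registered: ContextVar[Tuple[Path, ...]] = ContextVar(
    "registered_credential_files", default=()
)


def register_credential_file(relative_path: str) -> Path:
    """Validate a HERMES_HOME-relative path and record it for mounting."""
    candidate = PurePosixPath(relative_path)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"unsafe credential path: {relative_path!r}")
    resolved = HERMES_HOME.joinpath(*candidate.parts)
    # Store a new tuple instead of mutating, so other contexts are unaffected.
    _registered.set(_registered.get() + (resolved,))
    return resolved


def registered_files() -> Tuple[Path, ...]:
    """Return the files registered in the current context."""
    return _registered.get()
```

Because the registry is an immutable tuple stored per context, a registration made while handling one session is not visible from another.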

Layer 5: URL Safety (tools/url_safety.py)

SSRF Prevention

BLOCKED_IP_RANGES = [
    "10.0.0.0/8",         # Private
    "172.16.0.0/12",      # Private
    "192.168.0.0/16",     # Private
    "127.0.0.0/8",        # Loopback
    "169.254.0.0/16",     # Link-local
    "100.64.0.0/10",      # CGNAT (RFC 6598)
    "224.0.0.0/4",        # Multicast
    "0.0.0.0/8",          # "This network" (includes the unspecified address 0.0.0.0)
]
 
BLOCKED_HOSTNAMES = [
    "metadata.google.internal",
    "metadata.goog",
]

Fail-closed: DNS resolution errors block the request (prevent DNS rebinding TOCTOU).
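
A fail-closed check along these lines can be sketched with the standard ipaddress and socket modules. The function names and the injectable resolve parameter are assumptions made for illustration and testing. The sketch covers only the IPv4 ranges listed above, so IPv6 addresses are not filtered here.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8",
        "169.254.0.0/16", "100.64.0.0/10", "224.0.0.0/4", "0.0.0.0/8",
    )
]
BLOCKED_HOSTNAMES = {"metadata.google.internal", "metadata.goog"}


def is_ip_blocked(ip: str) -> bool:
    """Return True if ip falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow a URL only if every resolved address is outside blocked ranges."""
    host = urlsplit(url).hostname
    if not host or host.lower() in BLOCKED_HOSTNAMES:
        return False
    try:
        infos = resolve(host, None)
    except OSError:
        # Fail closed: a resolution error blocks the request.
        return False
    if not infos:
        return False
    # Each getaddrinfo entry ends with a sockaddr whose first item is the IP.
    return not any(is_ip_blocked(info[4][0]) for info in infos)
```

Checking every returned address, rather than only the first, matters because a hostname can resolve to a mix of public and private addresses.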

Documented Limitations

Layer 6: Content Scanning (tools/tirith_security.py)

Tirith Binary

External security scanner with automatic installation:

# Auto-install from GitHub releases
# SHA-256 checksum verification
# Optional cosign provenance verification
# Disk-persistent failure markers (24-hour TTL)

Detection Categories

Configuration

security:
  tirith_enabled: true        # Default: on
  tirith_path: "tirith"       # Binary location
  tirith_timeout: 5           # Seconds
  tirith_fail_open: true      # Allow if scanner unavailable
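
The effect of tirith_enabled and tirith_fail_open can be illustrated without invoking the binary. In this sketch the scanner is a plain callable and ScannerUnavailable is a hypothetical exception standing in for a missing binary or a timeout; neither name comes from tirith_security.py.

```python
from typing import Callable


class ScannerUnavailable(Exception):
    """Hypothetical signal that the scanner could not run at all."""


def scan_allowed(
    text: str,
    scanner: Callable[[str], bool],
    enabled: bool = True,
    fail_open: bool = True,
) -> bool:
    """Return True if text may proceed under the given settings."""
    if not enabled:
        return True
    try:
        # The scanner returns True for clean content, False for flagged content.
        return scanner(text)
    except ScannerUnavailable:
        # fail_open=True allows content when the scanner cannot run;
        # fail_open=False blocks it instead.
        return fail_open
```

With the defaults shown in the configuration, an unavailable scanner lets content through, so setting tirith_fail_open to false is the stricter choice.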

Layer 7: Memory Injection Defense (tools/memory_tool.py:65-81)

Pattern Matching

_MEMORY_THREAT_PATTERNS = [
    # Prompt injection
    r"ignore previous instructions",
    r"you are now",
    r"disregard all",
    r"forget everything",
    
    # Exfiltration
    r"curl.*secret", r"wget.*password",
    r"cat ~/.ssh", r"cat ~/.env",
    
    # Invisible unicode
    r"[\u200b\u200c\u200d\u2060\ufeff]",  # zero-width chars
    r"[\u202a-\u202e]",                    # bidi override
]

Memory writes are scanned before acceptance. Blocked entries return an error, preventing persistence.
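
A scan of this kind could be sketched as follows, using a subset of the patterns above. Case-insensitive matching is an assumption, and find_memory_threat is a hypothetical name.

```python
import re
from typing import Optional

# Subset of the patterns listed above; IGNORECASE is assumed here.
_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"ignore previous instructions",
        r"you are now",
        r"curl.*secret",
        r"[\u200b\u200c\u200d\u2060\ufeff]",  # zero-width chars
        r"[\u202a-\u202e]",                    # bidi override
    )
]


def find_memory_threat(text: str) -> Optional[str]:
    """Return the first pattern that matches text, or None if it is clean."""
    for pattern in _PATTERNS:
        if pattern.search(text):
            return pattern.pattern
    return None
```

A memory write path would reject the entry whenever find_memory_threat returns a pattern, and report that pattern in the error.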

Layer 8: Skill Guard (tools/skills_guard.py)

Skills from the Skills Hub and agent-created skills are security-scanned:

Layer 9: Delegation Isolation (tools/delegate_tool.py)

DELEGATE_BLOCKED_TOOLS = [
    "delegate_task",    # No recursive delegation
    "clarify",          # No user interaction
    "memory",           # No memory access
    "send_message",     # No messaging
    "execute_code",     # No code execution sandbox
]
 
MAX_DEPTH = 2  # parent → child → rejected
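
The blocklist and depth limit could be applied as in this sketch. It assumes the top-level agent runs at depth 0 and its child at depth 1; child_toolset and can_delegate are hypothetical helper names.

```python
from typing import Iterable, List

DELEGATE_BLOCKED_TOOLS = {
    "delegate_task", "clarify", "memory", "send_message", "execute_code",
}
MAX_DEPTH = 2


def child_toolset(parent_tools: Iterable[str]) -> List[str]:
    """Return the parent's tools minus those a subagent may not use."""
    return [tool for tool in parent_tools if tool not in DELEGATE_BLOCKED_TOOLS]


def can_delegate(current_depth: int) -> bool:
    """Return True if an agent at current_depth may spawn a subagent."""
    # Assumed numbering: parent is 0, child is 1, depth 2 is rejected.
    return current_depth + 1 < MAX_DEPTH
```

The two checks overlap on purpose: a child has no delegate_task tool, and the depth check would reject the attempt even if it did.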

Each subagent gets:

Layer 10: Execution Sandboxing

Docker Backend (tools/environments/docker.py)

docker_run_args = [
    "--cap-drop=ALL",           # Drop all capabilities
    "--security-opt=no-new-privileges",
    "--pids-limit=256",         # Fork bomb protection
    "--memory=512m",            # Memory limit
    "--cpu-shares=512",         # Relative CPU weight (Docker default is 1024)
    "--network=bridge",         # Default bridge network, not the host's network namespace
    "--read-only",              # Read-only root filesystem (optional)
]
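
For context, these flags would typically be spliced into a docker run invocation. The sketch below only assembles the argument list and does not start a container; build_docker_command and the use of sh -c are assumptions, not the backend's actual entry point.

```python
from typing import List

HARDENING_ARGS = [
    "--cap-drop=ALL",
    "--security-opt=no-new-privileges",
    "--pids-limit=256",
    "--memory=512m",
    "--cpu-shares=512",
    "--network=bridge",
]


def build_docker_command(
    image: str, command: str, read_only: bool = False
) -> List[str]:
    """Assemble a docker run argv with the hardening flags applied."""
    args = list(HARDENING_ARGS)
    if read_only:
        # Optional: mount the container's root filesystem read-only.
        args.append("--read-only")
    return ["docker", "run", "--rm", *args, image, "sh", "-c", command]
```

Passing the result to subprocess.run as a list, rather than a shell string, keeps the command from being re-parsed by the host shell.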

Code Execution Sandbox (tools/code_execution_tool.py)

# API keys stripped from child environment
# Only explicitly declared env vars passed through
# Timeout: 300s (5 min)
# Max tool calls: 50
# Max stdout: 50 KB, stderr: 10 KB
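
The timeout and output caps listed above could be enforced roughly like this. The sketch runs a Python snippet in a child interpreter and truncates after capture; run_limited is a hypothetical name, and the tool-call limit is not modeled.

```python
import subprocess
import sys
from typing import Dict, Optional, Tuple

MAX_STDOUT = 50 * 1024  # 50 KB
MAX_STDERR = 10 * 1024  # 10 KB
TIMEOUT_S = 300         # 5 minutes


def run_limited(
    script: str,
    env: Optional[Dict[str, str]] = None,
    timeout: float = TIMEOUT_S,
) -> Tuple[int, str, str]:
    """Run script in a child interpreter and return (code, stdout, stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            env=env,  # pass a filtered environment here; None inherits
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return -1, "", "timed out"
    out = proc.stdout[:MAX_STDOUT].decode("utf-8", "replace")
    err = proc.stderr[:MAX_STDERR].decode("utf-8", "replace")
    return proc.returncode, out, err
```

Truncating after capture bounds what reaches the model but not the memory used while capturing; a streaming reader would be needed to bound both.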

Security Boundary Summary

Boundary            What It Protects                     Implementation
------------------  -----------------------------------  ---------------------------------------
Dangerous commands  Host system from destructive ops     approval.py (37 patterns)
Sensitive paths     System files from writes             file_tools.py (path blocklist)
Subprocess env      API keys from leaking                local.py (env blocklist)
Code execution      Agent from untrusted scripts         code_execution_tool.py (sandbox)
Subagent privilege  Parent from child escalation         delegate_tool.py (tool blocklist)
Remote credentials  Secrets in remote sandboxes          credential_files.py (mount isolation)
MCP servers         Supply chain attacks                 mcp_tool.py (OSV check + env filtering)
SSRF                Internal network from external URLs  url_safety.py (IP range blocking)
Prompt injection    Memory from poisoning                memory_tool.py (pattern matching)
Skill content       System from malicious skills         skills_guard.py (security scan)

MCP Server Security (tools/mcp_tool.py)

Package Verification

# npx/uvx packages checked against OSV database before spawning
# Supply-chain audit: GitHub Actions pinned to commit SHAs

Environment Isolation

def _build_safe_env():
    """Only safe baseline variables + declared env vars.
    Credential stripping in error messages."""

Sampling Protection

# MCP servers can request LLM completions via sampling/createMessage
# Protected by: model allowlist, max_tokens_cap, max_rpm, max_tool_rounds
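
Three of the four limits named above (model allowlist, max_tokens_cap, max_rpm) can be sketched as a small guard object. SamplingGuard and its method names are hypothetical, max_tool_rounds is not modeled, and the injectable clock exists only to make the sliding window testable.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable


class SamplingGuard:
    """Hypothetical gate for sampling requests coming from an MCP server."""

    def __init__(
        self,
        allowed_models: Iterable[str],
        max_tokens_cap: int,
        max_rpm: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._allowed = set(allowed_models)
        self._cap = max_tokens_cap
        self._max_rpm = max_rpm
        self._clock = clock
        self._stamps: Deque[float] = deque()

    def check(self, model: str, max_tokens: int) -> int:
        """Validate a request and return the token limit to actually use."""
        if model not in self._allowed:
            raise PermissionError(f"model not allowed: {model}")
        now = self._clock()
        # Drop request timestamps older than the 60-second window.
        while self._stamps and now - self._stamps[0] >= 60:
            self._stamps.popleft()
        if len(self._stamps) >= self._max_rpm:
            raise RuntimeError("sampling rate limit exceeded")
        self._stamps.append(now)
        # Clamp the requested tokens instead of rejecting the request.
        return min(max_tokens, self._cap)
```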