Hermes Agent - Security Architecture

Threat Model

Hermes is a single-tenant personal agent with one trusted operator. The security model protects the operator from unintended LLM-driven actions, not from malicious co-tenants. Multi-user isolation relies on OS/host-level separation.

Reference: SECURITY.md (85 lines)

Security Layers

                    SECURITY BOUNDARY STACK

    ┌─────────────────────────────────────────────────┐
    │ Layer 1: COMMAND APPROVAL (tools/approval.py)   │
    │ 37 dangerous patterns, user confirmation        │
    ├─────────────────────────────────────────────────┤
    │ Layer 2: PATH SECURITY (tools/path_security.py) │
    │ Symlink-aware traversal detection               │
    ├─────────────────────────────────────────────────┤
    │ Layer 3: FILE PROTECTION (tools/file_tools.py)  │
    │ Device blocking, sensitive paths, read limits   │
    ├─────────────────────────────────────────────────┤
    │ Layer 4: CREDENTIAL PROTECTION                  │
    │ (tools/credential_files.py)                     │
    │ Env filtering, credential mounting              │
    ├─────────────────────────────────────────────────┤
    │ Layer 5: URL SAFETY (tools/url_safety.py)       │
    │ SSRF prevention, private IP blocking            │
    ├─────────────────────────────────────────────────┤
    │ Layer 6: CONTENT SCANNING (tirith_security.py)  │
    │ Homograph URLs, injection, obfuscation          │
    ├─────────────────────────────────────────────────┤
    │ Layer 7: MEMORY INJECTION DEFENSE               │
    │ (tools/memory_tool.py)                          │
    │ Pattern matching for prompt injection           │
    ├─────────────────────────────────────────────────┤
    │ Layer 8: SKILL GUARD (tools/skills_guard.py)    │
    │ Security scan on skill install/create/edit      │
    ├─────────────────────────────────────────────────┤
    │ Layer 9: DELEGATION ISOLATION                   │
    │ (tools/delegate_tool.py)                        │
    │ Tool blocklists, depth limits, memory isolation │
    ├─────────────────────────────────────────────────┤
    │ Layer 10: EXECUTION SANDBOXING                  │
    │ (tools/environments/docker.py, modal.py, etc.)  │
    │ Container isolation, resource limits            │
    └─────────────────────────────────────────────────┘

Layer 1: Dangerous Command Approval (`tools/approval.py`)

Detection Patterns (lines 75-138)

37 dangerous patterns organized by category:

Category	Examples
Recursive delete	`rm -r`, `rm --recursive`, `rm -rf`
Filesystem ops	`mkfs`, `dd`, `chmod 777`, `chown`
SQL destructive	`DROP TABLE`, `DELETE` (without WHERE), `TRUNCATE`
Shell injection	Pipe to `sh` or `bash`, `-c` flag execution
Git destructive	`reset --hard`, `push --force`, `push -f`
System service	`systemctl stop`, `systemctl restart`
Package management	`apt remove`, `pip uninstall`
Network	`iptables`, `ufw`
Self-modification	`hermes update`, `gateway run` (outside systemd)
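A minimal sketch of how category-based detection like the table above could work. The five regexes and the `detect_dangerous` helper are illustrative assumptions, not the actual 37 patterns in `tools/approval.py`.

```python
import re
from typing import Optional

# Hypothetical subset of patterns; the real list is longer and may differ.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*|--recursive)\b"), "recursive delete"),
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "sql drop table"),
    # DELETE FROM with no WHERE anywhere after it
    (re.compile(r"\bDELETE\s+FROM\b(?!.*\bWHERE\b)", re.IGNORECASE | re.DOTALL),
     "sql delete without where"),
    (re.compile(r"\|\s*(sh|bash)\b"), "pipe to shell"),
    (re.compile(r"\bgit\s+push\b.*\s(--force|-f)\b"), "git force push"),
]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the approval key of the first matching pattern, or None."""
    for pattern, key in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return key
    return None
```

A command that returns a key would then go through the approval flow; a `None` result runs without a prompt.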

Approval Modes

Mode	Behavior	Use Case
`"on"`	Prompt user for confirmation	Default, recommended
`"auto"`	Auto-approve after configurable delay	Unattended automation
`"off"`	Disable approval entirely	Break-glass only

State Management (lines 204-256)

# Per-session approval state (thread-safe, keyed by session_key)
class ApprovalState:
    permanent_allowlist: Set[str]     # Persisted in config.yaml
    session_approvals: Set[str]       # This session only
    pending_approval: Optional[Event] # Blocking queue (gateway async)

Approval Keys

# Canonical key (human-readable):
"recursive delete"
 
# Legacy key (regex-derived, backward compat):
"rm_recursive"
 
# Multiple aliases can match the same pattern for config migration
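The alias idea above can be sketched as a lookup that maps legacy keys to canonical ones before checking either approval set. `KEY_ALIASES`, `canonical_key`, and `ApprovalStateSketch` are hypothetical names for illustration; the real `ApprovalState` also carries locking and pending-approval handling that is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical alias table: legacy key -> canonical key.
KEY_ALIASES: Dict[str, str] = {"rm_recursive": "recursive delete"}


def canonical_key(key: str) -> str:
    """Map a legacy key to its canonical form; unknown keys pass through."""
    return KEY_ALIASES.get(key, key)


@dataclass
class ApprovalStateSketch:
    permanent_allowlist: Set[str] = field(default_factory=set)  # from config.yaml
    session_approvals: Set[str] = field(default_factory=set)    # this session only

    def is_approved(self, key: str) -> bool:
        """True if the key, under any alias, is approved permanently or for the session."""
        wanted = canonical_key(key)
        approved = {canonical_key(k) for k in self.permanent_allowlist | self.session_approvals}
        return wanted in approved
```

With this shape, a config file that still lists `rm_recursive` keeps approving commands that now report the `recursive delete` key.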

Layer 2: Path Security (`tools/path_security.py`)

def validate_within_dir(path: str, root: str) -> bool:
    """Ensure resolved path doesn't escape root directory. Symlink-aware."""
    
def has_traversal_component(path_str: str) -> bool:
    """Detect '..' path components."""

Used by: Skills Guard, credential file registration, cronjob tools.
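One plausible way to implement the two helpers, shown as a sketch rather than the code in `tools/path_security.py`. It assumes POSIX paths and that a relative `path` is interpreted relative to `root`.

```python
import os
from pathlib import PurePosixPath


def has_traversal_component(path_str: str) -> bool:
    """True if any path component is exactly '..'."""
    return ".." in PurePosixPath(path_str).parts


def validate_within_dir(path: str, root: str) -> bool:
    """True if path, after resolving symlinks, stays inside root."""
    real_root = os.path.realpath(root)
    # realpath follows symlinks, so a link inside root that points
    # outside it resolves to the outside location and is rejected.
    real_path = os.path.realpath(os.path.join(real_root, path))
    return os.path.commonpath([real_root, real_path]) == real_root
```

Comparing with `os.path.commonpath` instead of a string prefix avoids treating `/srv/root-other` as if it were inside `/srv/root`.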

Layer 3: File Protection (`tools/file_tools.py`)

Device Blocking (lines 62-90)

BLOCKED_DEVICE_PATHS = [
    "/dev/zero", "/dev/random", "/dev/urandom",
    "/dev/stdin", "/dev/stdout", "/dev/stderr",
    "/dev/tty", "/dev/null"  # prevent hangs
]
# Compares the literal path string. Symlinks are not followed, since resolving them would defeat the check.

Sensitive Path Protection (lines 94-118)

SENSITIVE_WRITE_PATHS = [
    "/etc/", "/boot/", "/usr/lib/systemd/",
    "/var/run/docker.sock"  # Docker socket
]
# Blocks writes; reads allowed. Requires terminal approval to bypass.
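A sketch of how a write-path blocklist like this might be checked. `is_sensitive_write_path` is a hypothetical helper; it normalizes `..` segments lexically and assumes POSIX-style paths, which may differ from what `tools/file_tools.py` does.

```python
import posixpath

SENSITIVE_WRITE_PATHS = [
    "/etc/", "/boot/", "/usr/lib/systemd/",
    "/var/run/docker.sock",
]


def is_sensitive_write_path(path: str) -> bool:
    """True if a write to path should be blocked pending approval."""
    norm = posixpath.normpath(path)  # collapses "a/../b" without touching the disk
    for entry in SENSITIVE_WRITE_PATHS:
        if entry.endswith("/"):
            # Directory entry: match the directory itself or anything below it.
            if norm == entry.rstrip("/") or norm.startswith(entry):
                return True
        elif norm == entry:
            # File entry: exact match only.
            return True
    return False
```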

Read Size Guards (lines 18-28)

Default: 100,000 chars max per read (~25-35K tokens)
Configurable via config.yaml: file_read_max_chars
Encourages offset + limit for large files
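The read guard can be pictured as a clamp on the requested window. This sketch assumes `offset` and `limit` are counted in characters and works on an in-memory string; the `read_window` name and its return shape are hypothetical.

```python
from typing import Optional, Tuple

DEFAULT_MAX_CHARS = 100_000  # matches the documented default


def read_window(content: str, offset: int = 0, limit: Optional[int] = None,
                max_chars: int = DEFAULT_MAX_CHARS) -> Tuple[str, bool]:
    """Return (chunk, truncated), never returning more than max_chars."""
    if offset < 0 or (limit is not None and limit < 0):
        raise ValueError("offset and limit must be non-negative")
    requested = max_chars if limit is None else min(limit, max_chars)
    chunk = content[offset:offset + requested]
    # truncated tells the caller there is more to fetch with a larger offset.
    truncated = offset + len(chunk) < len(content)
    return chunk, truncated
```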

External Modification Detection

Thread-safe tracking of files read/written per task_id
Detects re-read loops
Warns when file changed between agent's read and write
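The change warning could be built on modification times recorded at read time. The sketch below is an assumption about the mechanism, with a hypothetical `ReadTracker` class; the `mtime_of` parameter exists only so the logic can be exercised without touching the disk.

```python
import os
import threading
from typing import Callable, Dict, Tuple


def _mtime_ns(path: str) -> int:
    return os.stat(path).st_mtime_ns


class ReadTracker:
    """Remembers, per task_id, the mtime of each file when it was last read."""

    def __init__(self, mtime_of: Callable[[str], int] = _mtime_ns) -> None:
        self._lock = threading.Lock()
        self._seen: Dict[Tuple[str, str], int] = {}
        self._mtime_of = mtime_of

    def record_read(self, task_id: str, path: str) -> None:
        mtime = self._mtime_of(path)
        with self._lock:
            self._seen[(task_id, path)] = mtime

    def changed_since_read(self, task_id: str, path: str) -> bool:
        """True if this task read the file earlier and it has changed since."""
        with self._lock:
            seen = self._seen.get((task_id, path))
        if seen is None:
            return False  # this task never read the file, nothing to compare
        return self._mtime_of(path) != seen
```

A write tool would call `changed_since_read` first and surface a warning when it returns `True`.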

Layer 4: Credential Protection (`tools/credential_files.py`)

Environment Variable Filtering

# tools/environments/local.py
_HERMES_PROVIDER_ENV_BLOCKLIST = [
    "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
    "GOOGLE_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN",
    # ... all provider credentials
]

API keys/tokens are stripped from subprocess environments. Only explicitly declared env vars are passed through (via skills or config).
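As a sketch of the filtering rule just described: blocklisted names are dropped from the child environment unless they were explicitly declared. `build_child_env` and its `declared` parameter are hypothetical, and the blocklist here contains only the six names shown above.

```python
from typing import Dict, Iterable, Mapping

PROVIDER_ENV_BLOCKLIST = frozenset({
    "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
    "GOOGLE_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN",
})


def build_child_env(parent_env: Mapping[str, str],
                    declared: Iterable[str] = ()) -> Dict[str, str]:
    """Copy parent_env, dropping blocklisted names that were not declared."""
    allowed = set(declared)
    return {
        name: value
        for name, value in parent_env.items()
        if name not in PROVIDER_ENV_BLOCKLIST or name in allowed
    }
```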

Credential File Registry

# Session-scoped (ContextVar-backed) for cross-session isolation
def register_credential_file(relative_path: str):
    """
    1. Validates path (no absolute, no .., no traversal)
    2. Resolves to HERMES_HOME/relative_path
    3. Stores for remote sandbox mounting
    """

Flow:

Skills declare required_credential_files (relative to HERMES_HOME)
Remote backends query registry at sandbox creation + pre-command
Files mounted read-only where possible
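The registration steps can be sketched with a `ContextVar` holding an immutable tuple, so each context sees only its own registrations. The `~/.hermes` fallback for `HERMES_HOME` and the `registered_credential_files` accessor are assumptions made for this example.

```python
import os
from contextvars import ContextVar
from pathlib import Path, PurePosixPath
from typing import Tuple

# Assumed default location; the real resolution of HERMES_HOME may differ.
HERMES_HOME = Path(os.environ.get("HERMES_HOME", "~/.hermes")).expanduser()

_registered: ContextVar[Tuple[Path, ...]] = ContextVar(
    "registered_credential_files", default=()
)


def register_credential_file(relative_path: str) -> Path:
    """Validate a path relative to HERMES_HOME and record it for mounting."""
    rel = PurePosixPath(relative_path)
    if not rel.parts or rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"invalid credential path: {relative_path!r}")
    resolved = HERMES_HOME / rel
    # Store a new tuple instead of mutating shared state.
    _registered.set(_registered.get() + (resolved,))
    return resolved


def registered_credential_files() -> Tuple[Path, ...]:
    """Paths a remote backend would mount for the current context."""
    return _registered.get()
```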

Layer 5: URL Safety (`tools/url_safety.py`)

SSRF Prevention

BLOCKED_IP_RANGES = [
    "10.0.0.0/8",         # Private (RFC 1918)
    "172.16.0.0/12",      # Private (RFC 1918)
    "192.168.0.0/16",     # Private (RFC 1918)
    "127.0.0.0/8",        # Loopback
    "169.254.0.0/16",     # Link-local (includes cloud metadata 169.254.169.254)
    "100.64.0.0/10",      # CGNAT (RFC 6598)
    "224.0.0.0/4",        # Multicast
    "0.0.0.0/8",          # "This network" (RFC 1122), includes 0.0.0.0
]
 
BLOCKED_HOSTNAMES = [
    "metadata.google.internal",
    "metadata.goog",
]

Fail-closed: if DNS resolution fails, the request is blocked rather than allowed through unchecked. This does not by itself prevent DNS rebinding (see Documented Limitations).
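A sketch of the check using the standard `ipaddress` and `socket` modules. `is_blocked_ip`, `is_url_allowed`, and the injectable `resolver` are hypothetical names. The sketch covers only the IPv4 ranges listed above, so IPv6 addresses such as `::1` are not caught here.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8",
    "169.254.0.0/16", "100.64.0.0/10", "224.0.0.0/4", "0.0.0.0/8",
)]
BLOCKED_HOSTNAMES = {"metadata.google.internal", "metadata.goog"}


def is_blocked_ip(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def resolve_host(host: str) -> List[str]:
    """Resolve a hostname to all of its addresses."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_url_allowed(url: str, resolver: Callable[[str], List[str]] = resolve_host) -> bool:
    host = urlsplit(url).hostname
    if not host or host.lower() in BLOCKED_HOSTNAMES:
        return False
    try:
        addresses = resolver(host)
        # Every resolved address must be public, not just the first one.
        return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
    except (OSError, ValueError):
        return False  # fail closed on DNS errors or unparsable addresses
```

Checking every resolved address matters because a hostname can return a public and a private record together.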

Documented Limitations

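DNS rebinding: attacker-controlled DNS with TTL=0 can return a public address for the safety check and a private address for the actual connection (a TOCTOU gap), bypassing the check
Redirect bypass: a permitted URL can redirect to a blocked address; mitigated by redirect validation in vision_tools and gateway adapters
Third-party web tools (Firecrawl/Tavily): redirects are followed on their servers, outside Hermes's URL checks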

Layer 6: Content Scanning (`tools/tirith_security.py`)

Tirith Binary

External security scanner with automatic installation:

# Auto-install from GitHub releases
# SHA-256 checksum verification
# Optional cosign provenance verification
# Disk-persistent failure markers (24-hour TTL)

Detection Categories

Homograph URLs (unicode lookalike characters)
Pipe-to-interpreter patterns (curl | bash)
Terminal injection attempts (ANSI escape sequences)
Command obfuscation (Unicode normalization attacks)

Configuration

security:
  tirith_enabled: true        # Default: on
  tirith_path: "tirith"       # Binary location
  tirith_timeout: 5           # Seconds
  tirith_fail_open: true      # Allow if scanner unavailable
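The effect of `tirith_fail_open` can be illustrated with a small wrapper. This is a sketch of the policy only: `run_scanner` is a hypothetical callable standing in for the actual Tirith invocation, whose command line is not shown here.

```python
import subprocess
from typing import Callable


def scan_allows(run_scanner: Callable[[], bool], fail_open: bool = True) -> bool:
    """Return the scanner's verdict, or the fail_open policy if it cannot run."""
    try:
        return run_scanner()
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Scanner binary missing or timed out: fall back to configured policy.
        return fail_open
```

With `fail_open: true`, a missing or slow scanner does not block work, at the cost of skipping the check. Setting it to `false` trades availability for a stricter guarantee.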

Layer 7: Memory Injection Defense (`tools/memory_tool.py:65-81`)

Pattern Matching

_MEMORY_THREAT_PATTERNS = [
    # Prompt injection
    r"ignore previous instructions",
    r"you are now",
    r"disregard all",
    r"forget everything",
    
    # Exfiltration
    r"curl.*secret", r"wget.*password",
    r"cat ~/.ssh", r"cat ~/.env",
    
    # Invisible unicode
    r"[\u200b\u200c\u200d\u2060\ufeff]",  # zero-width chars
    r"[\u202a-\u202e]",                    # bidi override
]

Memory writes are scanned before acceptance. Blocked entries return an error, preventing persistence.
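A sketch of scan-before-write using the patterns listed above. Case-insensitive matching, the `find_memory_threat` helper, and the list-backed `write_memory` are assumptions for illustration.

```python
import re
from typing import List, Optional

_MEMORY_THREAT_PATTERNS = [
    r"ignore previous instructions", r"you are now",
    r"disregard all", r"forget everything",
    r"curl.*secret", r"wget.*password",
    r"cat ~/.ssh", r"cat ~/.env",
    r"[\u200b\u200c\u200d\u2060\ufeff]",  # zero-width chars
    r"[\u202a-\u202e]",                    # bidi override
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in _MEMORY_THREAT_PATTERNS]


def find_memory_threat(content: str) -> Optional[str]:
    """Return the first pattern that matches content, or None if clean."""
    for raw, compiled in zip(_MEMORY_THREAT_PATTERNS, _COMPILED):
        if compiled.search(content):
            return raw
    return None


def write_memory(store: List[str], content: str) -> None:
    """Append content to store only if the scan finds nothing."""
    hit = find_memory_threat(content)
    if hit is not None:
        raise ValueError(f"memory entry blocked by pattern {hit!r}")
    store.append(content)
```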

Layer 8: Skill Guard (`tools/skills_guard.py`)

Skills from the Skills Hub and agent-created skills are security-scanned:

Content scanning for injection/exfiltration patterns
Trusted repos list (built-in + configured)
Quarantine directory for suspicious skills pending review
Rollback on security scan failure during create/edit

Layer 9: Delegation Isolation (`tools/delegate_tool.py`)

DELEGATE_BLOCKED_TOOLS = [
    "delegate_task",    # No recursive delegation
    "clarify",          # No user interaction
    "memory",           # No memory access
    "send_message",     # No messaging
    "execute_code",     # No code execution sandbox
]
 
MAX_DEPTH = 2  # parent → child; a further delegation from the child is rejected

Each subagent gets:

Fresh conversation (no parent history)
Own task_id (isolated terminal session, file ops cache)
skip_memory=True (no memory reads or writes)
Shared iteration budget (prevents runaway delegation)
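The two limits can be sketched as a tool filter plus a depth check. The depth numbering (top-level agent at depth 0) and the `child_tools` and `child_depth` helpers are assumptions for this example.

```python
from typing import Iterable, List

DELEGATE_BLOCKED_TOOLS = frozenset({
    "delegate_task", "clarify", "memory", "send_message", "execute_code",
})
MAX_DEPTH = 2


def child_tools(parent_tools: Iterable[str]) -> List[str]:
    """Tools a subagent may use: the parent's tools minus the blocklist."""
    return [t for t in parent_tools if t not in DELEGATE_BLOCKED_TOOLS]


def child_depth(parent_depth: int) -> int:
    """Depth for a new subagent, assuming the top-level agent is depth 0."""
    depth = parent_depth + 1
    if depth >= MAX_DEPTH:
        raise RuntimeError("delegation depth limit reached")
    return depth
```

Blocking `delegate_task` in the child and enforcing the depth limit overlap on purpose: either check alone would stop a grandchild from being spawned.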

Layer 10: Execution Sandboxing

Docker Backend (`tools/environments/docker.py`)

docker_run_args = [
    "--cap-drop=ALL",           # Drop all Linux capabilities
    "--security-opt=no-new-privileges",  # Block privilege escalation via setuid binaries
    "--pids-limit=256",         # Fork bomb protection
    "--memory=512m",            # Hard memory limit
    "--cpu-shares=512",         # Relative CPU weight (applies under contention, not a hard cap)
    "--network=bridge",         # Bridge network, separate from the host network namespace
    "--read-only",              # Read-only root filesystem (optional)
]
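For illustration, the flags above could be assembled into a full `docker run` argument list like this. `build_docker_command` and its `read_only` switch are hypothetical; the image name and command are supplied by the caller.

```python
from typing import List, Sequence


def build_docker_command(image: str, command: Sequence[str],
                         read_only: bool = False) -> List[str]:
    """Build an argv list for `docker run` with the hardening flags applied."""
    args = [
        "docker", "run",
        "--cap-drop=ALL",
        "--security-opt=no-new-privileges",
        "--pids-limit=256",
        "--memory=512m",
        "--cpu-shares=512",
        "--network=bridge",
    ]
    if read_only:
        args.append("--read-only")  # optional, per the list above
    # Options must come before the image; everything after it is the command.
    return args + [image, *command]
```

Passing the result to `subprocess.run` as a list, rather than joining it into a shell string, avoids an extra layer of shell quoting.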

Code Execution Sandbox (`tools/code_execution_tool.py`)

# API keys stripped from child environment
# Only explicitly declared env vars passed through
# Timeout: 300s (5 min)
# Max tool calls: 50
# Max stdout: 50 KB, stderr: 10 KB
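A sketch of how the timeout and output caps might be enforced around a child Python process. `run_script` and `cap_output` are hypothetical helpers; environment stripping and the tool-call limit are left out, and the capture here buffers the full output before truncating it.

```python
import subprocess
import sys
from typing import Tuple

TIMEOUT_SECONDS = 300
MAX_STDOUT_BYTES = 50 * 1024
MAX_STDERR_BYTES = 10 * 1024


def cap_output(data: bytes, limit: int) -> str:
    """Keep at most `limit` bytes, decoding leniently."""
    return data[:limit].decode("utf-8", errors="replace")


def run_script(code: str, timeout: float = TIMEOUT_SECONDS) -> Tuple[str, str, bool]:
    """Run code in a child interpreter; return (stdout, stderr, timed_out)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "", "", True
    return (
        cap_output(proc.stdout, MAX_STDOUT_BYTES),
        cap_output(proc.stderr, MAX_STDERR_BYTES),
        False,
    )
```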

Security Boundary Summary

Boundary	What It Protects	Implementation
Dangerous commands	Host system from destructive ops	`approval.py` (37 patterns)
Sensitive paths	System files from writes	`file_tools.py` (path blocklist)
Subprocess env	API keys from leaking	`local.py` (env blocklist)
Code execution	Agent from untrusted scripts	`code_execution_tool.py` (sandbox)
Subagent privilege	Parent from child escalation	`delegate_tool.py` (tool blocklist)
Remote credentials	Secrets in remote sandboxes	`credential_files.py` (mount isolation)
MCP servers	Supply chain attacks	`mcp_tool.py` (OSV check + env filtering)
SSRF	Internal network from external URLs	`url_safety.py` (IP range blocking)
Prompt injection	Memory from poisoning	`memory_tool.py` (pattern matching)
Skill content	System from malicious skills	`skills_guard.py` (security scan)

MCP Server Security (`tools/mcp_tool.py`)

Package Verification

# npx/uvx packages checked against OSV database before spawning
# Supply-chain audit: GitHub Actions pinned to commit SHAs

Environment Isolation

def _build_safe_env():
    """Only safe baseline variables + declared env vars.
    Credential stripping in error messages."""

Sampling Protection

# MCP servers can request LLM completions via sampling/createMessage
# Protected by: model allowlist, max_tokens_cap, max_rpm, max_tool_rounds
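Three of the four sampling limits can be sketched as a small policy object: a model allowlist, a token cap, and a sliding one-minute request window. `SamplingPolicy` and its `check` method are hypothetical, `max_tool_rounds` is omitted, and the clock is injectable so the rate limit can be exercised without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable


class SamplingPolicy:
    def __init__(self, allowed_models: Iterable[str], max_tokens_cap: int,
                 max_rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.allowed_models = frozenset(allowed_models)
        self.max_tokens_cap = max_tokens_cap
        self.max_rpm = max_rpm
        self._clock = clock
        self._recent: Deque[float] = deque()  # timestamps of accepted requests

    def check(self, model: str, max_tokens: int) -> int:
        """Return the clamped max_tokens, or raise PermissionError."""
        if model not in self.allowed_models:
            raise PermissionError(f"model not allowed: {model}")
        now = self._clock()
        # Drop timestamps older than the 60-second window.
        while self._recent and now - self._recent[0] >= 60.0:
            self._recent.popleft()
        if len(self._recent) >= self.max_rpm:
            raise PermissionError("sampling rate limit exceeded")
        self._recent.append(now)
        return min(max_tokens, self.max_tokens_cap)
```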