Hermes Agent - Security Architecture
Threat Model
Hermes is a single-tenant personal agent with one trusted operator. The security model protects the operator from unintended LLM-driven actions, not from malicious co-tenants. Multi-user isolation relies on OS/host-level separation.
Reference: SECURITY.md (85 lines)
Security Layers
SECURITY BOUNDARY STACK
┌─────────────────────────────────────────────────┐
│ Layer 1: COMMAND APPROVAL (tools/approval.py) │
│ 37 dangerous patterns, user confirmation │
├─────────────────────────────────────────────────┤
│ Layer 2: PATH SECURITY (tools/path_security.py) │
│ Symlink-aware traversal detection │
├─────────────────────────────────────────────────┤
│ Layer 3: FILE PROTECTION (tools/file_tools.py) │
│ Device blocking, sensitive paths, read limits │
├─────────────────────────────────────────────────┤
│ Layer 4: CREDENTIAL PROTECTION │
│ (tools/credential_files.py) │
│ Env filtering, credential mounting │
├─────────────────────────────────────────────────┤
│ Layer 5: URL SAFETY (tools/url_safety.py) │
│ SSRF prevention, private IP blocking │
├─────────────────────────────────────────────────┤
│ Layer 6: CONTENT SCANNING (tirith_security.py) │
│ Homograph URLs, injection, obfuscation │
├─────────────────────────────────────────────────┤
│ Layer 7: MEMORY INJECTION DEFENSE │
│ (tools/memory_tool.py) │
│ Pattern matching for prompt injection │
├─────────────────────────────────────────────────┤
│ Layer 8: SKILL GUARD (tools/skills_guard.py) │
│ Security scan on skill install/create/edit │
├─────────────────────────────────────────────────┤
│ Layer 9: DELEGATION ISOLATION │
│ (tools/delegate_tool.py) │
│ Tool blocklists, depth limits, memory isolation │
├─────────────────────────────────────────────────┤
│ Layer 10: EXECUTION SANDBOXING │
│ (tools/environments/docker.py, modal.py, etc.) │
│ Container isolation, resource limits │
└─────────────────────────────────────────────────┘
Layer 1: Dangerous Command Approval (tools/approval.py)
Detection Patterns (lines 75-138)
37 dangerous patterns organized by category:
| Category | Examples |
|---|---|
| Recursive delete | rm -r, rm --recursive, rm -rf |
| Filesystem ops | mkfs, dd, chmod 777, chown |
| SQL destructive | DROP TABLE, DELETE (without WHERE), TRUNCATE |
| Shell injection | Pipe to sh or bash, -c flag execution |
| Git destructive | reset --hard, push --force, push -f |
| System service | systemctl stop, systemctl restart |
| Package management | apt remove, pip uninstall |
| Network | iptables, ufw |
| Self-modification | hermes update, gateway run (outside systemd) |
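As a minimal sketch of how category-based detection like the table above could work, the following matches a command against a few regular expressions and returns an approval key. The pattern list, the `detect_dangerous` name, and every key string other than "recursive delete" are illustrative assumptions, not the contents of tools/approval.py.

```python
import re
from typing import Optional

# Hypothetical subset of patterns; the real module is described as having 37.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*|--recursive)\b"), "recursive delete"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "git reset hard"),
    (re.compile(r"\bgit\s+push\s+.*(--force|-f)\b"), "git force push"),
    (re.compile(r"\|\s*(sh|bash)\b"), "pipe to shell"),
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "sql drop table"),
]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the key of the first matching pattern, or None if nothing matches."""
    for pattern, key in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return key
    return None
```

A caller would prompt for confirmation whenever the function returns a key and run the command directly when it returns None.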
Approval Modes
| Mode | Behavior | Use Case |
|---|---|---|
| "on" | Prompt user for confirmation | Default, recommended |
| "auto" | Auto-approve after configurable delay | Unattended automation |
| "off" | Disable approval entirely | Break-glass only |
State Management (lines 204-256)
# Per-session approval state (thread-safe, keyed by session_key)
class ApprovalState:
    permanent_allowlist: Set[str]      # Persisted in config.yaml
    session_approvals: Set[str]        # This session only
    pending_approval: Optional[Event]  # Blocking queue (gateway async)
Approval Keys
# Canonical key (human-readable):
"recursive delete"
# Legacy key (regex-derived, backward compat):
"rm_recursive"
# Multiple aliases can match the same pattern for config migration
Layer 2: Path Security (tools/path_security.py)
def validate_within_dir(path: str, root: str) -> bool:
    """Ensure resolved path doesn't escape root directory. Symlink-aware."""

def has_traversal_component(path_str: str) -> bool:
    """Detect '..' path components."""
Used by: Skills Guard, credential file registration, cronjob tools.
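One way the two helpers could be implemented is sketched below with `pathlib`. The bodies are assumptions based only on the signatures and docstrings above; in particular, treating a relative `path` as relative to `root` is a choice made for this sketch.

```python
from pathlib import Path, PurePath


def validate_within_dir(path: str, root: str) -> bool:
    """Return True if path, after resolving symlinks and '..', stays inside root."""
    root_resolved = Path(root).resolve()
    target = Path(path)
    if not target.is_absolute():
        # Assumption: relative paths are interpreted relative to root.
        target = root_resolved / target
    resolved = target.resolve()  # follows symlinks, collapses '..'
    return resolved == root_resolved or root_resolved in resolved.parents


def has_traversal_component(path_str: str) -> bool:
    """Return True if any path component is exactly '..'."""
    return ".." in PurePath(path_str).parts
```

Resolving both sides before comparing is what makes the check symlink-aware: a link inside `root` that points elsewhere resolves to its real target and fails the containment test.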
Layer 3: File Protection (tools/file_tools.py)
Device Blocking (lines 62-90)
BLOCKED_DEVICE_PATHS = [
"/dev/zero", "/dev/random", "/dev/urandom",
"/dev/stdin", "/dev/stdout", "/dev/stderr",
"/dev/tty", "/dev/null" # prevent hangs
]
# Checks literal paths (no symlink following to defeat checks)
Sensitive Path Protection (lines 94-118)
SENSITIVE_WRITE_PATHS = [
"/etc/", "/boot/", "/usr/lib/systemd/",
"/var/run/docker.sock" # Docker socket
]
# Blocks writes; reads allowed. Requires terminal approval to bypass.
Read Size Guards (lines 18-28)
- Default: 100,000 chars max per read (~25-35K tokens)
- Configurable via config.yaml: file_read_max_chars
- Encourages offset+limit for large files
External Modification Detection
- Thread-safe tracking of files read/written per task_id
- Detects re-read loops
- Warns when a file changed between the agent's read and write
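The stale-write warning could be sketched as follows: record a fingerprint of the file when it is read, then compare it just before writing. The `FileReadTracker` class and the use of modification time plus size as the fingerprint are assumptions made for illustration; the source does not say how changes are detected.

```python
import os
import threading
from typing import Dict, Tuple


class FileReadTracker:
    """Hypothetical per-task record of what a file looked like when last read."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seen: Dict[Tuple[str, str], Tuple[int, int]] = {}

    @staticmethod
    def _fingerprint(path: str) -> Tuple[int, int]:
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def record_read(self, task_id: str, path: str) -> None:
        with self._lock:
            self._seen[(task_id, path)] = self._fingerprint(path)

    def changed_since_read(self, task_id: str, path: str) -> bool:
        """True if this task read the file earlier and it differs now."""
        with self._lock:
            before = self._seen.get((task_id, path))
        if before is None:
            return False  # never read by this task, nothing to compare
        return self._fingerprint(path) != before
```

Keying the record by `(task_id, path)` keeps one task's reads from affecting another task's warnings.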
Layer 4: Credential Protection (tools/credential_files.py)
Environment Variable Filtering
# tools/environments/local.py
_HERMES_PROVIDER_ENV_BLOCKLIST = [
"OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
"GOOGLE_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN",
# ... all provider credentials
]
API keys/tokens are stripped from subprocess environments. Only explicitly declared env vars are passed through (via skills or config).
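A minimal sketch of this filtering, assuming a blocked variable is passed through only when a skill or config explicitly declares it. The function name `build_subprocess_env` and its parameters are hypothetical, and the blocklist here contains only the names shown above.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Only the names listed in the document; the real blocklist is longer.
_BLOCKLIST = frozenset({
    "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
    "GOOGLE_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN",
})


def build_subprocess_env(
    base: Optional[Mapping[str, str]] = None,
    passthrough: Iterable[str] = (),
) -> Dict[str, str]:
    """Copy the environment, dropping blocklisted names unless declared."""
    source = os.environ if base is None else base
    declared = set(passthrough)
    return {
        name: value
        for name, value in source.items()
        if name not in _BLOCKLIST or name in declared
    }
```

The result can be handed to `subprocess.run(..., env=build_subprocess_env())` so the child process never sees undeclared provider credentials.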
Credential File Registry
# Session-scoped (ContextVar-backed) for cross-session isolation
def register_credential_file(relative_path: str):
    """
    1. Validates path (no absolute, no .., no traversal)
    2. Resolves to HERMES_HOME/relative_path
    3. Stores for remote sandbox mounting
    """
Flow:
- Skills declare required_credential_files (relative to HERMES_HOME)
- Remote backends query registry at sandbox creation + pre-command
- Files mounted read-only where possible
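The three registration steps could look roughly like the sketch below. It takes the home directory as an explicit `home` argument rather than guessing how HERMES_HOME is located, and the `registered_files` helper is a hypothetical accessor added for illustration.

```python
from contextvars import ContextVar
from pathlib import Path, PurePosixPath
from typing import FrozenSet

# ContextVar-backed registry: each context (session) sees its own set.
_registered = ContextVar("registered_credential_files", default=frozenset())


def register_credential_file(relative_path: str, home: Path) -> Path:
    """Validate, resolve under home, and record a credential file path."""
    rel = PurePosixPath(relative_path)
    # Step 1: reject absolute paths and '..' components up front.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe credential path: {relative_path!r}")
    # Step 2: resolve under home and confirm the result stays inside it.
    home_resolved = home.resolve()
    resolved = (home_resolved / rel).resolve()
    if home_resolved not in resolved.parents:
        raise ValueError(f"path escapes home: {relative_path!r}")
    # Step 3: store for later mounting by a remote backend.
    _registered.set(_registered.get() | {resolved})
    return resolved


def registered_files() -> FrozenSet[Path]:
    """Return the paths registered in the current context."""
    return _registered.get()
```

Storing an immutable `frozenset` and replacing it on each registration avoids sharing a mutable default between contexts.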
Layer 5: URL Safety (tools/url_safety.py)
SSRF Prevention
BLOCKED_IP_RANGES = [
"10.0.0.0/8", # Private
"172.16.0.0/12", # Private
"192.168.0.0/16", # Private
"127.0.0.0/8", # Loopback
"169.254.0.0/16", # Link-local
"100.64.0.0/10", # CGNAT (RFC 6598)
"224.0.0.0/4", # Multicast
"0.0.0.0/8", # Unspecified
]
BLOCKED_HOSTNAMES = [
"metadata.google.internal",
"metadata.goog",
]
Fail-closed: DNS resolution errors block the request (prevent DNS rebinding TOCTOU).
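A hedged sketch of such a check using the standard `ipaddress` and `socket` modules follows. The function names and the injectable `resolve` parameter are assumptions for illustration, and the sketch covers only the IPv4 ranges listed above (plus IPv4-mapped IPv6 addresses).

```python
import ipaddress
import socket
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8",
        "169.254.0.0/16", "100.64.0.0/10", "224.0.0.0/4", "0.0.0.0/8",
    )
]
BLOCKED_HOSTNAMES = {"metadata.google.internal", "metadata.goog"}


def is_blocked_ip(ip_text: str) -> bool:
    """True if the address is in a blocked range or cannot be parsed."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True  # fail closed on anything unparseable
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # treat ::ffff:a.b.c.d as the IPv4 address
    return any(ip in net for net in BLOCKED_NETWORKS)


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Resolve the URL's host and allow it only if every address is public."""
    host = urlsplit(url).hostname
    if not host or host.lower().rstrip(".") in BLOCKED_HOSTNAMES:
        return False
    try:
        infos = resolve(host, None)
    except OSError:
        return False  # fail closed: DNS errors block the request
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
```

Because the check and the later HTTP request resolve the name separately, a sketch like this remains subject to the rebinding limitation described in the next section.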
Documented Limitations
- DNS rebinding: attacker-controlled DNS with TTL=0 can bypass TOCTOU check
- Redirect bypass: mitigated by redirect validation in vision_tools and gateway adapters
- Third-party web tools (Firecrawl/Tavily): redirect handling on their servers
Layer 6: Content Scanning (tools/tirith_security.py)
Tirith Binary
External security scanner with automatic installation:
# Auto-install from GitHub releases
# SHA-256 checksum verification
# Optional cosign provenance verification
# Disk-persistent failure markers (24-hour TTL)
Detection Categories
- Homograph URLs (unicode lookalike characters)
- Pipe-to-interpreter patterns (curl | bash)
- Terminal injection attempts (ANSI escape sequences)
- Command obfuscation (Unicode normalization attacks)
Configuration
security:
  tirith_enabled: true     # Default: on
  tirith_path: "tirith"    # Binary location
  tirith_timeout: 5        # Seconds
  tirith_fail_open: true   # Allow if scanner unavailable
Layer 7: Memory Injection Defense (tools/memory_tool.py:65-81)
Pattern Matching
_MEMORY_THREAT_PATTERNS = [
# Prompt injection
r"ignore previous instructions",
r"you are now",
r"disregard all",
r"forget everything",
# Exfiltration
r"curl.*secret", r"wget.*password",
r"cat ~/.ssh", r"cat ~/.env",
# Invisible unicode
r"[\u200b\u200c\u200d\u2060\ufeff]", # zero-width chars
r"[\u202a-\u202e]", # bidi override
]
Memory writes are scanned before acceptance. Blocked entries return an error, preventing persistence.
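The scan-before-accept flow might be sketched as below, reusing the patterns listed above. Case-insensitive matching, the `find_memory_threat` and `write_memory` names, and the list-backed store are assumptions for illustration only.

```python
import re
from typing import List, Optional

_THREAT_PATTERNS = [
    re.compile(p, re.IGNORECASE)  # case-insensitivity is an assumption
    for p in (
        r"ignore previous instructions", r"you are now",
        r"disregard all", r"forget everything",
        r"curl.*secret", r"wget.*password",
        r"cat ~/.ssh", r"cat ~/.env",
        r"[\u200b\u200c\u200d\u2060\ufeff]",  # zero-width chars
        r"[\u202a-\u202e]",                   # bidi override
    )
]


def find_memory_threat(text: str) -> Optional[str]:
    """Return the source of the first matching pattern, or None if clean."""
    for pattern in _THREAT_PATTERNS:
        if pattern.search(text):
            return pattern.pattern
    return None


def write_memory(store: List[str], text: str) -> None:
    """Append text to the store only if it passes the scan."""
    threat = find_memory_threat(text)
    if threat is not None:
        raise ValueError(f"memory write blocked by pattern: {threat}")
    store.append(text)
```

Raising before the append is what guarantees a blocked entry is never persisted.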
Layer 8: Skill Guard (tools/skills_guard.py)
Skills from the Skills Hub and agent-created skills are security-scanned:
- Content scanning for injection/exfiltration patterns
- Trusted repos list (built-in + configured)
- Quarantine directory for suspicious skills pending review
- Rollback on security scan failure during create/edit
Layer 9: Delegation Isolation (tools/delegate_tool.py)
DELEGATE_BLOCKED_TOOLS = [
"delegate_task", # No recursive delegation
"clarify", # No user interaction
"memory", # No memory access
"send_message", # No messaging
"execute_code", # No code execution sandbox
]
MAX_DEPTH = 2 # parent → child → rejected
Each subagent gets:
- Fresh conversation (no parent history)
- Own task_id (isolated terminal session, file ops cache)
- skip_memory=True (no memory reads or writes)
- Shared iteration budget (prevents runaway delegation)
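As a rough illustration of the blocklist and depth limit, the sketch below filters a parent's tool list and rejects delegation past the limit. The helper names, the example tool names in the usage note, and the convention that the top-level agent is at depth 0 are assumptions.

```python
from typing import Iterable, List

DELEGATE_BLOCKED_TOOLS = frozenset({
    "delegate_task", "clarify", "memory", "send_message", "execute_code",
})
MAX_DEPTH = 2  # parent (depth 0) -> child (depth 1) -> rejected


def tools_for_subagent(parent_tools: Iterable[str]) -> List[str]:
    """Return the parent's tools minus everything on the blocklist."""
    return [t for t in parent_tools if t not in DELEGATE_BLOCKED_TOOLS]


def next_depth(parent_depth: int) -> int:
    """Return the depth for a new subagent, or raise if it would exceed the limit."""
    child_depth = parent_depth + 1
    if child_depth >= MAX_DEPTH:
        raise ValueError(f"delegation depth {child_depth} is not allowed")
    return child_depth
```

For example, `tools_for_subagent(["terminal", "memory", "read_file"])` keeps only the two tools that are not blocked, and `next_depth(1)` raises because a child may not delegate further.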
Layer 10: Execution Sandboxing
Docker Backend (tools/environments/docker.py)
docker_run_args = [
"--cap-drop=ALL", # Drop all capabilities
"--security-opt=no-new-privileges",
"--pids-limit=256", # Fork bomb protection
"--memory=512m", # Memory limit
"--cpu-shares=512", # CPU throttling
"--network=bridge", # Network isolation
"--read-only", # Read-only root filesystem (optional)
]
Code Execution Sandbox (tools/code_execution_tool.py)
# API keys stripped from child environment
# Only explicitly declared env vars passed through
# Timeout: 300s (5 min)
# Max tool calls: 50
# Max stdout: 50 KB, stderr: 10 KB
Security Boundary Summary
| Boundary | What It Protects | Implementation |
|---|---|---|
| Dangerous commands | Host system from destructive ops | approval.py (37 patterns) |
| Sensitive paths | System files from writes | file_tools.py (path blocklist) |
| Subprocess env | API keys from leaking | local.py (env blocklist) |
| Code execution | Agent from untrusted scripts | code_execution_tool.py (sandbox) |
| Subagent privilege | Parent from child escalation | delegate_tool.py (tool blocklist) |
| Remote credentials | Secrets in remote sandboxes | credential_files.py (mount isolation) |
| MCP servers | Supply chain attacks | mcp_tool.py (OSV check + env filtering) |
| SSRF | Internal network from external URLs | url_safety.py (IP range blocking) |
| Prompt injection | Memory from poisoning | memory_tool.py (pattern matching) |
| Skill content | System from malicious skills | skills_guard.py (security scan) |
MCP Server Security (tools/mcp_tool.py)
Package Verification
# npx/uvx packages checked against OSV database before spawning
# Supply-chain audit: GitHub Actions pinned to commit SHAs
Environment Isolation
def _build_safe_env():
    """Only safe baseline variables + declared env vars.
    Credential stripping in error messages."""
Sampling Protection
# MCP servers can request LLM completions via sampling/createMessage
# Protected by: model allowlist, max_tokens_cap, max_rpm, max_tool_rounds
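A sketch of how three of these limits might be enforced per sampling request is shown below. The `SamplingGuard` class is hypothetical, clamping `max_tokens` rather than rejecting the request is an assumption, and `max_tool_rounds` is left out for brevity.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, FrozenSet, Optional


@dataclass
class SamplingGuard:
    """Hypothetical gate applied before serving a sampling/createMessage call."""
    allowed_models: FrozenSet[str]
    max_tokens_cap: int
    max_rpm: int
    _stamps: Deque[float] = field(default_factory=deque)

    def admit(self, model: str, max_tokens: int, now: Optional[float] = None) -> int:
        """Check the request and return the token limit to actually use."""
        now = time.monotonic() if now is None else now
        if model not in self.allowed_models:
            raise PermissionError(f"model not allowed: {model}")
        # Drop timestamps older than the 60-second window.
        while self._stamps and now - self._stamps[0] >= 60.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_rpm:
            raise RuntimeError("sampling rate limit exceeded")
        self._stamps.append(now)
        return min(max_tokens, self.max_tokens_cap)
```

The `now` parameter exists so the sliding window can be exercised deterministically; in normal use it is omitted and the monotonic clock is used.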