# Hermes Agent - Skills, Memory & The Learning Loop
## Overview
Hermes Agent's distinguishing feature is its closed learning loop: the agent creates reusable skills from experience, improves them during use, maintains persistent memory across sessions, and searches its own past conversations. This document traces how each component works and how they connect.
## Skills System
### Skill Structure
Skills live in ~/.hermes/skills/ (user-created) and skills/ (built-in, 28 categories):
```
skills/
├── category/
│   └── skill-name/
│       ├── SKILL.md      # Required: main instructions (YAML frontmatter + markdown)
│       ├── references/   # Supporting documentation
│       ├── templates/    # Output templates
│       ├── scripts/      # Executable scripts
│       └── assets/       # Supplementary files
```
### SKILL.md Format
```markdown
---
name: deploy-to-production
description: "Deploy the current branch to production via CI/CD pipeline"
version: "1.2.0"
license: MIT
platforms: [linux, macos]
prerequisites:
  env_vars: [DEPLOY_TOKEN, AWS_REGION]
  commands: [docker, aws]  # Advisory (not enforced)
required_environment_variables:
  - name: DEPLOY_TOKEN
    description: "CI/CD deployment token"
    prompt: "Enter your deployment token"
setup:
  collect_secrets: true
metadata:
  hermes:
    tags: [devops, deployment]
    related_skills: [docker-build, aws-ecs]
---

## Instructions

1. Check current branch is clean...
2. Run deployment pipeline...
```

Constraints (tools/skill_manager_tool.py:111-201):
- Names: lowercase letters, numbers, hyphens, dots, underscores (max 64 chars)
- Description: max 1,024 chars
- SKILL.md content: max 100,000 chars
- Supporting files: max 1 MiB each
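A sketch of how these constraints might be checked (the function, regex, and error messages are illustrative; the real validation lives in tools/skill_manager_tool.py:111-201):

```python
import re

# Illustrative encoding of the documented limits, not the actual implementation.
NAME_RE = re.compile(r"^[a-z0-9._-]{1,64}$")
MAX_DESCRIPTION = 1_024             # chars
MAX_SKILL_MD = 100_000              # chars
MAX_SUPPORT_FILE = 1 * 1024 * 1024  # 1 MiB per supporting file

def validate_skill(name: str, description: str, skill_md: str) -> list:
    """Return a list of violated constraints (empty list means valid)."""
    errors = []
    if not NAME_RE.match(name):
        errors.append("invalid name (lowercase letters, digits, .-_ only, max 64)")
    if len(description) > MAX_DESCRIPTION:
        errors.append("description exceeds 1,024 chars")
    if len(skill_md) > MAX_SKILL_MD:
        errors.append("SKILL.md exceeds 100,000 chars")
    return errors
```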
### Progressive Disclosure Architecture (tools/skills_tool.py:10)
Skills load in three tiers to minimize token consumption:
```
Tier 1: skills_list()           → name + description only (minimal tokens)
          ↓
Tier 2: skill_view(name)        → full SKILL.md content
          ↓
Tier 3: skill_view(name, file)  → linked reference/template/script files
```
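The three tiers could be sketched as follows, assuming the directory layout shown above (the crude frontmatter parse and function signatures are illustrative; the real tool is tools/skills_tool.py):

```python
from pathlib import Path

def skills_list(root: Path) -> list:
    """Tier 1: name + description only, parsed naively from YAML frontmatter."""
    index = []
    for skill_md in sorted(root.glob("*/*/SKILL.md")):
        meta = {}
        for line in skill_md.read_text(encoding="utf-8").splitlines():
            for key in ("name", "description"):
                if line.startswith(key + ":"):
                    meta[key] = line.split(":", 1)[1].strip().strip('"')
        index.append(meta)
    return index

def skill_view(root: Path, name: str, file: str = "SKILL.md") -> str:
    """Tier 2: full SKILL.md; Tier 3: pass a linked supporting file name."""
    for skill_dir in root.glob(f"*/{name}"):
        return (skill_dir / file).read_text(encoding="utf-8")
    raise FileNotFoundError(name)
```

Only Tier 1 output is ever injected wholesale; Tiers 2 and 3 load on demand.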
### System Prompt Integration (agent/prompt_builder.py:583-808)

`build_skills_system_prompt()` creates a compact skill index:
```
[AVAILABLE SKILLS]
-- devops --
deploy-to-production: Deploy the current branch to production
docker-build: Build and push Docker images
-- data-science --
jupyter-analysis: Run Jupyter notebook analysis pipeline
...
```
Caching: Two-layer cache (in-process LRU + disk snapshot .skills_prompt_snapshot.json). Falls back to full filesystem scan on cache miss.
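The two-layer lookup could be sketched like this (function names and the fingerprint scheme are assumptions; only the snapshot filename comes from the doc):

```python
import functools
import json
from pathlib import Path

SNAPSHOT = Path(".skills_prompt_snapshot.json")  # disk layer

def rebuild_from_filesystem() -> str:
    # Placeholder for the full skills scan (Tier 1 metadata for every skill).
    return "[AVAILABLE SKILLS]\n-- devops --\n..."

@functools.lru_cache(maxsize=1)                  # in-process layer
def skills_prompt(fingerprint: str) -> str:
    if SNAPSHOT.exists():
        snap = json.loads(SNAPSHOT.read_text())
        if snap.get("fingerprint") == fingerprint:
            return snap["prompt"]                # disk hit
    prompt = rebuild_from_filesystem()           # miss: full filesystem scan
    SNAPSHOT.write_text(json.dumps({"fingerprint": fingerprint, "prompt": prompt}))
    return prompt
```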
### Conditional Activation (agent/skill_utils.py:241-255)
Skills can declare conditions for when they should be available:
```yaml
# Activate only when browser tools are available
requires_toolsets: [browser]

# Activate as fallback when terminal is unavailable
fallback_for_toolsets: [terminal]

# Activate only when specific tools exist
requires_tools: [docker]

# Activate when specific tools are missing
fallback_for_tools: [kubectl]
```

### Skill Creation & Self-Improvement
Creation trigger (from system prompt, prompt_builder.py:164-171):

> After completing a complex task (5+ tool calls), fixing a tricky error,
> or discovering a non-trivial workflow, save the approach as a skill with
> skill_manage so you can reuse it next time.
Skill Manager actions (tools/skill_manager_tool.py):

| Action | Purpose |
|---|---|
| `create` | Create new skill in `~/.hermes/skills/` with validation |
| `edit` | Replace entire SKILL.md content (full rewrite) |
| `patch` | Targeted find-and-replace within files (fuzzy matching) |
| `write_file` | Add/overwrite supporting files |
| `remove_file` | Delete supporting files |
| `delete` | Remove entire skill directory |
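The fuzzy matching behind `patch` could be sketched with difflib (the sliding window and the 0.8 cutoff are assumptions, not the actual tools/skill_manager_tool.py logic):

```python
import difflib

def fuzzy_patch(text: str, find: str, replace: str, cutoff: float = 0.8) -> str:
    """Replace the block of lines most similar to `find`; error below cutoff."""
    lines = text.splitlines()
    window = len(find.splitlines())
    best_i, best_score = -1, 0.0
    for i in range(len(lines) - window + 1):
        candidate = "\n".join(lines[i:i + window])
        score = difflib.SequenceMatcher(None, candidate, find).ratio()
        if score > best_score:
            best_i, best_score = i, score
    if best_score < cutoff:
        raise ValueError("no sufficiently close match found")
    lines[best_i:best_i + window] = replace.splitlines()
    return "\n".join(lines)
```

Tolerating near-matches lets the agent patch a skill even when its quoted snippet has drifted slightly from the file on disk.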
Security on creation/edit:

- Validate name, frontmatter, and content size
- Run security scan via skills_guard.py
- Roll back if the scan blocks the content
### Skills Hub (tools/skills_hub.py)
Community skill marketplace:
```python
# Default taps (GitHub repos)
DEFAULT_TAPS = [
    "openai/skills",
    "anthropics/skills",
    "VoltAgent/awesome-agent-skills",
]
```

Trust levels: `builtin` > `trusted` > `community`
Installation flow:

1. Search hub index (1-hour TTL cache)
2. Download skill from GitHub Contents API
3. Security scan via skills_guard.py
4. Write to ~/.hermes/skills/
5. Record provenance in lock file (~/.hermes/skills/.hub/lock.json)
6. Quarantine suspicious skills for manual review
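A hypothetical shape for one provenance entry in the lock file (field names and layout are assumptions; the real schema lives in tools/skills_hub.py):

```python
import json
import time
from pathlib import Path

def record_provenance(lock_path: Path, skill: str, tap: str, sha: str) -> None:
    """Append/overwrite a provenance record for an installed hub skill."""
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    lock[skill] = {"tap": tap, "commit": sha, "installed_at": time.time()}
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    lock_path.write_text(json.dumps(lock, indent=2))
```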
## Memory System
### Architecture (tools/memory_tool.py)
Two persistent markdown files:
| Store | Path | Purpose | Limit |
|---|---|---|---|
| `MEMORY.md` | `~/.hermes/memories/MEMORY.md` | Agent observations: environment facts, project conventions, tool quirks, discovered solutions | 2,200 chars (~800 tokens) |
| `USER.md` | `~/.hermes/memories/USER.md` | User profile: preferences, communication style, workflow habits | 1,375 chars (~500 tokens) |
### Entry Format
Entries are delimited by § (section sign):

```
§ User prefers concise responses without emoji
§ Project uses Poetry for dependency management, not pip
§ Terminal backend is Docker with custom image python:3.11-slim
§ PostgreSQL runs on port 5433 (non-standard) in dev environment
```

### Actions
```python
memory_tool(action="add", target="memory", content="...")           # Append entry
memory_tool(action="replace", target="user", old="...", new="...")  # Update entry
memory_tool(action="remove", target="memory", content="...")        # Delete entry
memory_tool(action="read", target="memory")                         # Read all entries
```

### The Frozen Snapshot Pattern
This is a critical design decision:
```
Session Start
│
├── Load MEMORY.md and USER.md from disk
├── Create frozen snapshot (_system_prompt_snapshot)
├── Inject snapshot into system prompt
│
│   During Session:
│   ├── memory_tool(action="add") → writes to DISK immediately
│   ├── System prompt snapshot: UNCHANGED
│   ├── Anthropic prefix cache: STABLE ✓
│   └── All writes durable even if session crashes
│
Session End / Next Session Start
│
├── Load updated MEMORY.md (includes all mid-session writes)
├── Create NEW frozen snapshot
└── Agent now sees updated memories
```
Why this matters: Anthropic's prompt caching gives a significant discount when the system prompt prefix is identical across API calls. If memory were injected live, every memory_tool(action="add") would invalidate the cache, increasing costs by 4-10x.
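The pattern reduces to a few lines; this sketch mirrors the description above (class shape and paths are illustrative, not the actual tools/memory_tool.py implementation):

```python
from pathlib import Path

class MemoryStore:
    def __init__(self, path: Path):
        self.path = path
        # Read once at session start and never refreshed mid-session,
        # keeping the system prompt prefix byte-identical (cache-stable).
        self._system_prompt_snapshot = self._read()

    def _read(self) -> str:
        return self.path.read_text(encoding="utf-8") if self.path.exists() else ""

    def system_prompt_section(self) -> str:
        return self._system_prompt_snapshot  # frozen view

    def add(self, entry: str) -> None:
        # Durable immediately; visible only to the NEXT session's snapshot.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(f"§ {entry}\n")
```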
### Memory Nudging
The system prompt includes periodic reminders to save durable facts:
```
# prompt_builder.py:144-156
"Save durable facts using the memory tool: user preferences,
environment details, tool quirks, and stable conventions."

# Nudge frequency: configurable via config.yaml
nudge_interval: 10   # every 10 user turns
flush_min_turns: 6   # minimum turns before auto-persistence on exit
```

### Injection Defense (memory_tool.py:65-81)
Memory writes are scanned for prompt injection and exfiltration patterns:
```python
_MEMORY_THREAT_PATTERNS = [
    r"ignore previous instructions",
    r"you are now",
    r"curl.*secret",
    r"cat ~/.ssh",
    r"[\u200b\u200c\u200d]",  # zero-width characters
]
```

Blocked entries return an error, preventing persistence.
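An illustrative gate over such a pattern list (case-insensitive matching here is an assumption; the real check is in memory_tool.py:65-81):

```python
import re

# Subset of the documented pattern list, for illustration only.
_MEMORY_THREAT_PATTERNS = [
    r"ignore previous instructions",
    r"you are now",
    r"[\u200b\u200c\u200d]",  # zero-width characters
]

def scan_memory_entry(entry: str):
    """Return the first matched threat pattern, or None if the entry is clean."""
    for pat in _MEMORY_THREAT_PATTERNS:
        if re.search(pat, entry, flags=re.IGNORECASE):
            return pat
    return None
```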
### File-Level Locking
```python
# Unix: fcntl
# Windows: msvcrt
# Atomic file replacement via os.replace()
```

Ensures safe concurrent access when multiple sessions share the same profile.
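The atomic-replacement half of the pattern can be sketched as follows (the fcntl/msvcrt lock acquisition is omitted here because it is platform-specific):

```python
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, content: str) -> None:
    """Write to a sibling temp file, then swap it in with os.replace()."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())     # ensure bytes hit disk before the swap
        os.replace(tmp, path)        # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)               # never leave a half-written temp file
        raise
```

Readers therefore always see either the old file or the new one, never a partially written state.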
### External Memory Providers (plugins/memory/)
Pluggable adapters for external memory services run alongside the built-in MemoryStore:
| Provider | Path | Integration |
|---|---|---|
| Honcho | `plugins/memory/honcho/` | Dialectic user modeling |
| Mem0 | `plugins/memory/mem0/` | Mem0 memory service |
| Supermemory | `plugins/memory/supermemory/` | Supermemory integration |
| OpenViking | `plugins/memory/openviking/` | Vector embeddings |
| Holographic | `plugins/memory/holographic/` | Vector store retrieval |
| RetainDB | `plugins/memory/retaindb/` | RetainDB integration |
| ByteRover | `plugins/memory/byterover/` | ByteRover service |
| Hindsight | `plugins/memory/hindsight/` | Hindsight memory |
Initialization (run_agent.py:1266-1304):

- Selected via memory.provider in config.yaml
- Loaded dynamically at runtime
- Scoped by session ID, user ID, gateway session key, profile name
- Read-only adapters provide context but don't replace the built-in store
## Session Search (Cross-Session Recall)
### How It Works (tools/session_search_tool.py)
```
User asks about something from a past conversation
        │
        ▼
session_search(query="deployment script for staging")
        │
        ├── Step 1: FTS5 full-text search in SQLite
        │           └── Finds matching messages across all sessions
        │
        ├── Step 2: Group by session, take top N (default: 3)
        │
        ├── Step 3: Load each session's transcript
        │           └── Truncate to ~100K chars around match regions
        │
        ├── Step 4: Send to auxiliary model (Gemini Flash)
        │           └── With summarization prompt
        │
        └── Step 5: Return per-session summaries
                    └── With metadata: date, source, message count, preview
```
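Step 1 can be sketched with Python's built-in sqlite3, assuming an FTS5-enabled SQLite build (the table name and schema are illustrative, not the session DB's actual schema):

```python
import sqlite3

# Toy messages table; the real session DB schema will differ.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, content)")
con.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("s1", "wrote the deployment script for staging"),
        ("s2", "discussed the auth bug last Tuesday"),
    ],
)

# Multiple query terms combine with implicit AND; ORDER BY rank puts the
# best match first (rank is FTS5's built-in bm25-based relevance column).
rows = con.execute(
    "SELECT session_id FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("deployment staging",),
).fetchall()
```

Grouping hits by session and taking the top N (Step 2) would follow on these rows.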
### Filtering
- Source filtering: Excludes "tool" source sessions (third-party integrations)
- Lineage filtering: Skips current session and parent/delegation sessions
- Rich metadata: session_id, title, source, started_at, last_active, message_count
### System Prompt Guidance (prompt_builder.py:158-162)

> When the user references something from a past conversation or you suspect
> relevant cross-session context exists, use session_search to recall it before
> asking them to repeat themselves.
## Todo Tool (In-Session Task Management)
### Purpose (tools/todo_tool.py)
In-memory task list for complex multi-step work within a single session:
```python
# Read current tasks
todo()

# Write/update tasks
todo(todos=[
    {"task": "Fix authentication bug", "status": "completed"},
    {"task": "Update tests", "status": "in_progress"},
    {"task": "Deploy to staging", "status": "pending"},
], merge=True)
```

### Status Values

- `pending` - Not yet started
- `in_progress` - Currently working on
- `completed` - Done
- `cancelled` - Skipped
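The four statuses map naturally onto checklist markers; a minimal sketch of such a renderer (the mapping and function are illustrative, mirroring this document's examples; the real logic is in todo_tool.py):

```python
# Illustrative status-to-marker mapping.
_MARKERS = {"pending": " ", "in_progress": ">", "completed": "x", "cancelled": "~"}

def render_active_tasks(todos: list) -> str:
    """Render the task list as a markdown checklist."""
    lines = ["## Active Tasks"]
    for t in todos:
        suffix = " (cancelled)" if t["status"] == "cancelled" else ""
        lines.append(f"- [{_MARKERS[t['status']]}] {t['task']}{suffix}")
    return "\n".join(lines)
```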
### Context Compression Integration (todo_tool.py:90-122)
After context compression, active tasks are re-injected to prevent losing track:
```
## Active Tasks
- [x] Fix authentication bug
- [>] Update tests
- [ ] Deploy to staging
- [~] Refactor logging (cancelled)
```

## The Closed Learning Loop
### Complete Workflow
```
┌─────────────────────────────────────────────────────────────────┐
│ SESSION START                                                   │
│                                                                 │
│ 1. Load MEMORY.md & USER.md (frozen snapshots)                  │
│ 2. Build skills index from ~/.hermes/skills/ (cached)           │
│ 3. Assemble system prompt (stable prefix for cache)             │
│ 4. Initialize todo store (empty)                                │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│ CONVERSATION LOOP                                               │
│                                                                 │
│ ┌─ User sends message                                           │
│ │                                                               │
│ ├─ Agent scans skills index for relevant skills                 │
│ │   └─ skill_view() loads full SKILL.md if match found          │
│ │                                                               │
│ ├─ Agent executes task with tools                               │
│ │   └─ Creates todo items for complex multi-step work           │
│ │                                                               │
│ ├─ Memory nudge triggers? (every ~10 turns)                     │
│ │   └─ memory_tool(action='add') → saves to disk immediately    │
│ │      (system prompt NOT updated — cache stays stable)         │
│ │                                                               │
│ ├─ Complex task completed? (5+ tool calls)                      │
│ │   └─ skill_manager_tool(action='create') → new skill          │
│ │      Security scan → write to ~/.hermes/skills/               │
│ │                                                               │
│ ├─ Existing skill outdated?                                     │
│ │   └─ skill_manager_tool(action='patch') → improve skill       │
│ │      Fuzzy matching → validate → security scan                │
│ │                                                               │
│ └─ Need past context?                                           │
│     └─ session_search() → FTS5 + LLM summarization              │
│                                                                 │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│ SESSION PERSISTS                                                │
│                                                                 │
│ - Updated MEMORY.md on disk (durable)                           │
│ - Updated USER.md on disk (durable)                             │
│ - New/updated skills in ~/.hermes/skills/ (durable)             │
│ - Session transcript in SQLite (FTS5-indexed)                   │
│ - Token counts and costs in SessionDB                           │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│ NEXT SESSION                                                    │
│                                                                 │
│ 1. Reload MEMORY.md & USER.md (fresh snapshots)                 │
│ 2. Rebuild skills index (includes new/updated skills)           │
│ 3. Agent benefits from ALL previous learnings                   │
│ 4. session_search() available for cross-session recall          │
└─────────────────────────────────────────────────────────────────┘
```
## Key Design Principles
- Progressive disclosure: skills load in tiers (metadata → content → linked files). This prevents a large skill library from consuming the entire context window.
- Frozen snapshots: memory snapshots taken at session start preserve prefix-cache stability. Writes are durable but invisible to the current session's system prompt.
- Immediate persistence: all writes hit disk immediately, not just at session end. Crash-safe.
- Security scanning: all user-created content (skills, memory) is scanned before acceptance, via Skills Guard and the memory threat patterns.
- Auxiliary model offloading: session-search summarization uses a cheap auxiliary model (Gemini Flash), not the primary model, to avoid wasting expensive tokens.
- Platform-aware skills: skills can declare platform restrictions (`linux`, `macos`) and conditional activation based on available toolsets.
- Community sharing: skills can be published to and installed from the Skills Hub, with trust levels and quarantine for suspicious content.
## What Makes This Different
Most AI agents are stateless: each conversation starts from zero. Hermes Agent maintains three types of persistence:
| Type | Mechanism | Granularity | Example |
|---|---|---|---|
| Procedural memory | Skills (markdown files) | Task-level | "How to deploy this project" |
| Declarative memory | MEMORY.md, USER.md | Fact-level | "User prefers TypeScript" |
| Episodic memory | SessionDB + FTS5 search | Conversation-level | "We discussed the auth bug last Tuesday" |
This mirrors human memory systems: skills (procedural), facts (declarative/semantic), and experiences (episodic).