Hermes Agent - Skills, Memory & The Learning Loop

Overview

Hermes Agent's distinguishing feature is its closed learning loop: the agent creates reusable skills from experience, improves them during use, maintains persistent memory across sessions, and searches its own past conversations. This document traces how each component works and how they connect.

Skills System

Skill Structure

Skills live in ~/.hermes/skills/ (user-created) and skills/ (built-in, 28 categories):

skills/
├── category/
│   └── skill-name/
│       ├── SKILL.md           # Required: main instructions (YAML frontmatter + markdown)
│       ├── references/        # Supporting documentation
│       ├── templates/         # Output templates
│       ├── scripts/           # Executable scripts
│       └── assets/            # Supplementary files

SKILL.md Format

---
name: deploy-to-production
description: "Deploy the current branch to production via CI/CD pipeline"
version: "1.2.0"
license: MIT
platforms: [linux, macos]
prerequisites:
  env_vars: [DEPLOY_TOKEN, AWS_REGION]
  commands: [docker, aws]        # Advisory (not enforced)
required_environment_variables:
  - name: DEPLOY_TOKEN
    description: "CI/CD deployment token"
    prompt: "Enter your deployment token"
setup:
  collect_secrets: true
metadata:
  hermes:
    tags: [devops, deployment]
    related_skills: [docker-build, aws-ecs]
---
 
## Instructions
 
1. Check current branch is clean...
2. Run deployment pipeline...

Constraints (tools/skill_manager_tool.py:111-201):

Progressive Disclosure Architecture (tools/skills_tool.py:10)

Skills load in three tiers to minimize token consumption:

Tier 1: skills_list()          → name + description only (minimal tokens)
         ↓
Tier 2: skill_view(name)       → full SKILL.md content
         ↓  
Tier 3: skill_view(name, file) → linked reference/template/script files

System Prompt Integration (agent/prompt_builder.py:583-808)

build_skills_system_prompt() creates a compact skill index:

[AVAILABLE SKILLS]
-- devops --
  deploy-to-production: Deploy the current branch to production
  docker-build: Build and push Docker images
-- data-science --
  jupyter-analysis: Run Jupyter notebook analysis pipeline
  ...

Caching: Two-layer cache (in-process LRU + disk snapshot .skills_prompt_snapshot.json). Falls back to full filesystem scan on cache miss.
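
The two layers can be sketched like this (function names, snapshot format, and the index contents are illustrative assumptions, not the real prompt_builder code):

```python
import json
from functools import lru_cache
from pathlib import Path

def _scan(root: Path) -> dict[str, str]:
    """Full filesystem scan: the fallback when both cache layers miss."""
    return {p.parent.name: p.read_text(encoding="utf-8").split("\n", 1)[0]
            for p in sorted(root.glob("*/*/SKILL.md"))}

@lru_cache(maxsize=4)  # layer 1: in-process LRU, keyed by the two paths
def skill_index(root: Path, snapshot: Path) -> str:
    if snapshot.exists():              # layer 2: disk snapshot survives restarts
        return snapshot.read_text(encoding="utf-8")
    index = json.dumps(_scan(root), sort_keys=True)
    snapshot.write_text(index, encoding="utf-8")
    return index
```

On a warm process the LRU answers immediately; on a cold start the disk snapshot avoids rescanning the skills tree; only when both miss does the full scan run.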

Conditional Activation (agent/skill_utils.py:241-255)

Skills can declare conditions for when they should be available:

# Activate only when browser tools are available
requires_toolsets: [browser]
 
# Activate as fallback when terminal is unavailable
fallback_for_toolsets: [terminal]
 
# Activate only when specific tools exist
requires_tools: [docker]
 
# Activate when specific tools are missing
fallback_for_tools: [kubectl]
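
One plausible reading of these four conditions as a single predicate (the function name and fallback semantics here are assumptions; the real logic lives in agent/skill_utils.py):

```python
def skill_is_active(skill: dict, toolsets: set[str], tools: set[str]) -> bool:
    """Evaluate the four activation conditions against what is currently available."""
    if not set(skill.get("requires_toolsets", [])) <= toolsets:
        return False   # requires a toolset that is absent
    if any(t in toolsets for t in skill.get("fallback_for_toolsets", [])):
        return False   # fallback skill, but the toolset is actually available
    if not set(skill.get("requires_tools", [])) <= tools:
        return False   # requires a tool that is absent
    if any(t in tools for t in skill.get("fallback_for_tools", [])):
        return False   # fallback skill, but the tool is actually available
    return True
```

Under this reading, a fallback skill activates only when every toolset or tool it substitutes for is missing.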

Skill Creation & Self-Improvement

Creation trigger (from system prompt, prompt_builder.py:164-171):

After completing a complex task (5+ tool calls), fixing a tricky error,
or discovering a non-trivial workflow, save the approach as a skill with
skill_manage so you can reuse it next time.

Skill Manager actions (tools/skill_manager_tool.py):

| Action | Purpose |
|--------|---------|
| create | Create new skill in ~/.hermes/skills/ with validation |
| edit | Replace entire SKILL.md content (full rewrite) |
| patch | Targeted find-and-replace within files (fuzzy matching) |
| write_file | Add/overwrite supporting files |
| remove_file | Delete supporting files |
| delete | Remove entire skill directory |

Security on creation/edit:

  1. Validate name, frontmatter, content size
  2. Run security scan via skills_guard.py
  3. Rollback if scan blocks the content
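
The write-then-rollback step can be sketched as follows. The `scan` callable stands in for skills_guard.py, and the function name and rollback details are assumptions:

```python
import shutil
from pathlib import Path

def save_skill(skill_dir: Path, new_content: str, scan) -> bool:
    """Write SKILL.md, then roll back to the previous version if the scan blocks it."""
    target = skill_dir / "SKILL.md"
    backup = target.read_text(encoding="utf-8") if target.exists() else None
    target.write_text(new_content, encoding="utf-8")
    if scan(new_content):                  # scan returns True when content is blocked
        if backup is None:
            shutil.rmtree(skill_dir)       # fresh create: remove the whole directory
        else:
            target.write_text(backup, encoding="utf-8")  # edit: restore old content
        return False
    return True
```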

Skills Hub (tools/skills_hub.py)

Community skill marketplace:

# Default taps (GitHub repos)
DEFAULT_TAPS = [
    "openai/skills",
    "anthropics/skills",
    "VoltAgent/awesome-agent-skills",
]

Trust levels: builtin > trusted > community

Installation flow:

  1. Search hub index (1-hour TTL cache)
  2. Download skill from GitHub Contents API
  3. Security scan via skills_guard.py
  4. Write to ~/.hermes/skills/
  5. Record provenance in lock file (~/.hermes/skills/.hub/lock.json)
  6. Quarantine suspicious skills for manual review

Memory System

Architecture (tools/memory_tool.py)

Two persistent markdown files:

| Store | Path | Purpose | Limit |
|-------|------|---------|-------|
| MEMORY.md | ~/.hermes/memories/MEMORY.md | Agent observations: environment facts, project conventions, tool quirks, discovered solutions | 2,200 chars (~800 tokens) |
| USER.md | ~/.hermes/memories/USER.md | User profile: preferences, communication style, workflow habits | 1,375 chars (~500 tokens) |

Entry Format

Entries are delimited by § (section sign):

§ User prefers concise responses without emoji
§ Project uses Poetry for dependency management, not pip
§ Terminal backend is Docker with custom image python:3.11-slim
§ PostgreSQL runs on port 5433 (non-standard) in dev environment
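
A minimal sketch of how a §-delimited store with a hard character cap could behave (the function names and the rejection behavior are assumptions; the limit comes from the table above):

```python
MEMORY_LIMIT = 2_200  # chars (~800 tokens) for MEMORY.md, per the table above

def add_entry(store: str, entry: str, limit: int = MEMORY_LIMIT) -> str:
    """Append a §-delimited entry, rejecting writes that would exceed the store's cap."""
    updated = (store.rstrip() + "\n" if store.strip() else "") + f"\u00a7 {entry.strip()}"
    if len(updated) > limit:
        raise ValueError("memory store full: remove or consolidate entries first")
    return updated

def entries(store: str) -> list[str]:
    """Split the store back into individual entries."""
    return [e.strip() for e in store.split("\u00a7") if e.strip()]
```

The tight caps force the agent to keep only durable, high-value facts rather than accumulating a transcript.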

Actions

memory_tool(action="add", target="memory", content="...")    # Append entry
memory_tool(action="replace", target="user", old="...", new="...")  # Update entry
memory_tool(action="remove", target="memory", content="...") # Delete entry
memory_tool(action="read", target="memory")                   # Read all entries

The Frozen Snapshot Pattern

This is a critical design decision:

Session Start
    │
    ├── Load MEMORY.md and USER.md from disk
    ├── Create frozen snapshot (_system_prompt_snapshot)
    ├── Inject snapshot into system prompt
    │
    │   During Session:
    │   ├── memory_tool(action="add") → writes to DISK immediately
    │   ├── System prompt snapshot: UNCHANGED
    │   ├── Anthropic prefix cache: STABLE ✓
    │   └── All writes durable even if session crashes
    │
Session End / Next Session Start
    │
    ├── Load updated MEMORY.md (includes all mid-session writes)
    ├── Create NEW frozen snapshot
    └── Agent now sees updated memories

Why this matters: Anthropic's prompt caching gives a significant discount when the system prompt prefix is identical across API calls. If memory were injected live, every memory_tool(action="add") would invalidate the cache, increasing costs by 4-10x.
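
The pattern reduces to a small amount of code. This sketch keeps the `_system_prompt_snapshot` name from the diagram but simplifies everything else:

```python
from pathlib import Path

class MemoryStore:
    """Frozen-snapshot pattern: reads serve a session-start copy; writes hit disk."""

    def __init__(self, path: Path):
        self.path = path
        # Snapshot taken once; the system prompt is built from this and never refreshed.
        self._system_prompt_snapshot = path.read_text(encoding="utf-8") if path.exists() else ""

    def system_prompt_block(self) -> str:
        return self._system_prompt_snapshot   # stable for the whole session

    def add(self, entry: str) -> None:
        current = self.path.read_text(encoding="utf-8") if self.path.exists() else ""
        self.path.write_text(current + f"\u00a7 {entry}\n", encoding="utf-8")
        # The snapshot is deliberately NOT updated here: the prefix cache stays valid.
```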

Memory Nudging

The system prompt includes periodic reminders to save durable facts:

# prompt_builder.py:144-156
"Save durable facts using the memory tool: user preferences,
 environment details, tool quirks, and stable conventions."
 
# Nudge frequency: configurable via config.yaml
nudge_interval: 10    # every 10 user turns
flush_min_turns: 6    # minimum turns before auto-persistence on exit

Injection Defense (memory_tool.py:65-81)

Memory writes are scanned for prompt injection and exfiltration patterns:

_MEMORY_THREAT_PATTERNS = [
    r"ignore previous instructions",
    r"you are now",
    r"curl.*secret",
    r"cat ~/.ssh",
    r"[\u200b\u200c\u200d]",  # zero-width characters
]

Blocked entries return an error, preventing persistence.
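
Applied to a write, the gate reduces to something like this sketch, which reuses the pattern list above (the real check in memory_tool.py may normalize input differently):

```python
import re

# Patterns repeated from the list above.
_MEMORY_THREAT_PATTERNS = [
    r"ignore previous instructions",
    r"you are now",
    r"curl.*secret",
    r"cat ~/.ssh",
    r"[\u200b\u200c\u200d]",  # zero-width characters
]

def scan_memory_entry(content: str) -> bool:
    """Return True when the entry matches a threat pattern and must be rejected."""
    lowered = content.lower()
    return any(re.search(p, lowered) for p in _MEMORY_THREAT_PATTERNS)
```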

File-Level Locking

# Unix: fcntl
# Windows: msvcrt
# Atomic file replacement via os.replace()

Ensures safe concurrent access when multiple sessions share the same profile.
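
A Unix-only sketch of the two mechanisms together (function names are illustrative; the Windows path would use msvcrt.locking instead of fcntl):

```python
import fcntl      # Unix advisory locking; Windows would use msvcrt.locking
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, content: str) -> None:
    """Write to a temp file in the same directory, then swap in with os.replace()."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.replace(tmp, path)   # atomic rename: readers never see a partial file
    except BaseException:
        os.unlink(tmp)
        raise

def locked_append(path: Path, entry: str) -> None:
    """Hold an exclusive lock so concurrent sessions cannot interleave writes."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.write(entry + "\n")
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```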

External Memory Providers (plugins/memory/)

Pluggable adapters for external memory services run alongside the built-in MemoryStore:

| Provider | Path | Integration |
|----------|------|-------------|
| Honcho | plugins/memory/honcho/ | Dialectic user modeling |
| Mem0 | plugins/memory/mem0/ | Mem0 memory service |
| Supermemory | plugins/memory/supermemory/ | Supermemory integration |
| OpenViking | plugins/memory/openviking/ | Vector embeddings |
| Holographic | plugins/memory/holographic/ | Vector store retrieval |
| RetainDB | plugins/memory/retaindb/ | RetainDB integration |
| ByteRover | plugins/memory/byterover/ | ByteRover service |
| Hindsight | plugins/memory/hindsight/ | Hindsight memory |

Initialization (run_agent.py:1266-1304):

Session Search (Cross-Session Recall)

How It Works (tools/session_search_tool.py)

User asks about something from a past conversation
    │
    ▼
session_search(query="deployment script for staging")
    │
    ├── Step 1: FTS5 full-text search in SQLite
    │   └── Finds matching messages across all sessions
    │
    ├── Step 2: Group by session, take top N (default: 3)
    │
    ├── Step 3: Load each session's transcript
    │   └── Truncate to ~100K chars around match regions
    │
    ├── Step 4: Send to auxiliary model (Gemini Flash)
    │   └── With summarization prompt
    │
    └── Step 5: Return per-session summaries
        └── With metadata: date, source, message count, preview
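
Steps 1-2 amount to an FTS5 query grouped by session. A sketch, assuming a hypothetical `messages` FTS5 table with `session_id` and `content` columns (the real schema in the SessionDB may differ):

```python
import sqlite3

def search_sessions(db: sqlite3.Connection, query: str, top_n: int = 3):
    """Steps 1-2: full-text match across all messages, top N sessions by hit count."""
    return db.execute(
        """SELECT session_id, COUNT(*) AS hits
           FROM messages
           WHERE messages MATCH ?
           GROUP BY session_id
           ORDER BY hits DESC
           LIMIT ?""",
        (query, top_n),
    ).fetchall()
```

Steps 3-5 then load each matching session's transcript and hand it to the auxiliary model for summarization, keeping the expensive primary model out of the loop.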

Filtering

System Prompt Guidance (prompt_builder.py:158-162)

When the user references something from a past conversation or you suspect
relevant cross-session context exists, use session_search to recall it before
asking them to repeat themselves.

Todo Tool (In-Session Task Management)

Purpose (tools/todo_tool.py)

In-memory task list for complex multi-step work within a single session:

# Read current tasks
todo()
 
# Write/update tasks
todo(todos=[
    {"task": "Fix authentication bug", "status": "completed"},
    {"task": "Update tests", "status": "in_progress"},
    {"task": "Deploy to staging", "status": "pending"},
], merge=True)

Status Values

Each task carries one of four statuses: pending, in_progress, completed, or cancelled.

Context Compression Integration (todo_tool.py:90-122)

After context compression, the list of active tasks is re-injected into the conversation so the agent does not lose track of multi-step work:

## Active Tasks
- [x] Fix authentication bug
- [>] Update tests  
- [ ] Deploy to staging
- [~] Refactor logging (cancelled)
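
The checklist rendering above maps each status to a marker, roughly like this sketch (the function name is an assumption):

```python
_MARKERS = {"completed": "x", "in_progress": ">", "pending": " ", "cancelled": "~"}

def render_active_tasks(todos: list[dict]) -> str:
    """Build the task checklist that gets re-injected after context compression."""
    lines = ["## Active Tasks"]
    for t in todos:
        lines.append(f"- [{_MARKERS[t['status']]}] {t['task']}")
    return "\n".join(lines)
```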

The Closed Learning Loop

Complete Workflow

┌─────────────────────────────────────────────────────────────────┐
│                     SESSION START                                │
│                                                                  │
│  1. Load MEMORY.md & USER.md (frozen snapshots)                 │
│  2. Build skills index from ~/.hermes/skills/ (cached)          │
│  3. Assemble system prompt (stable prefix for cache)            │
│  4. Initialize todo store (empty)                               │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                  CONVERSATION LOOP                               │
│                                                                  │
│  ┌─ User sends message                                          │
│  │                                                               │
│  ├─ Agent scans skills index for relevant skills                │
│  │   └─ skill_view() loads full SKILL.md if match found         │
│  │                                                               │
│  ├─ Agent executes task with tools                              │
│  │   └─ Creates todo items for complex multi-step work          │
│  │                                                               │
│  ├─ Memory nudge triggers? (every ~10 turns)                    │
│  │   └─ memory_tool(action='add') → saves to disk immediately  │
│  │      (system prompt NOT updated — cache stays stable)        │
│  │                                                               │
│  ├─ Complex task completed? (5+ tool calls)                     │
│  │   └─ skill_manager_tool(action='create') → new skill         │
│  │      Security scan → write to ~/.hermes/skills/              │
│  │                                                               │
│  ├─ Existing skill outdated?                                    │
│  │   └─ skill_manager_tool(action='patch') → improve skill      │
│  │      Fuzzy matching → validate → security scan               │
│  │                                                               │
│  └─ Need past context?                                          │
│      └─ session_search() → FTS5 + LLM summarization            │
│                                                                  │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                   SESSION PERSISTS                                │
│                                                                  │
│  - Updated MEMORY.md on disk (durable)                          │
│  - Updated USER.md on disk (durable)                            │
│  - New/updated skills in ~/.hermes/skills/ (durable)            │
│  - Session transcript in SQLite (FTS5-indexed)                  │
│  - Token counts and costs in SessionDB                          │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                    NEXT SESSION                                   │
│                                                                  │
│  1. Reload MEMORY.md & USER.md (fresh snapshots)                │
│  2. Rebuild skills index (includes new/updated skills)          │
│  3. Agent benefits from ALL previous learnings                  │
│  4. session_search() available for cross-session recall         │
└─────────────────────────────────────────────────────────────────┘

Key Design Principles

  1. Progressive disclosure: Skills load in tiers (metadata → content → linked files). This prevents a large skill library from consuming the entire context window.

  2. Frozen snapshots: Memory snapshots at session start preserve prefix cache stability. Writes are durable but invisible to the current session's system prompt.

  3. Immediate persistence: All writes to disk happen immediately, not just on session end. Crash-safe.

  4. Security scanning: All user-created content (skills, memory) is scanned before acceptance. Skills Guard + memory threat patterns.

  5. Auxiliary model offloading: Session search summarization uses a cheap auxiliary model (Gemini Flash), not the primary model, to avoid wasting expensive tokens.

  6. Platform-aware skills: Skills can declare platform restrictions (linux, macos) and conditional activation based on available toolsets.

  7. Community sharing: Skills can be published to and installed from the Skills Hub, with trust levels and quarantine for suspicious content.

What Makes This Different

Most AI agents are stateless: each conversation starts from zero. Hermes Agent maintains three types of persistence:

| Type | Mechanism | Granularity | Example |
|------|-----------|-------------|---------|
| Procedural memory | Skills (markdown files) | Task-level | "How to deploy this project" |
| Declarative memory | MEMORY.md, USER.md | Fact-level | "User prefers TypeScript" |
| Episodic memory | SessionDB + FTS5 search | Conversation-level | "We discussed the auth bug last Tuesday" |

This mirrors human memory systems: skills (procedural), facts (declarative/semantic), and experiences (episodic).