Hermes Agent - Skills, Memory & The Learning Loop

Overview

Hermes Agent's distinguishing feature is its closed learning loop: the agent creates reusable skills from experience, improves them during use, maintains persistent memory across sessions, and searches its own past conversations. This document traces how each component works and how they connect.

Skills System

Skill Structure

Skills live in ~/.hermes/skills/ (user-created) and skills/ (built-in, 28 categories):

skills/
├── category/
│   └── skill-name/
│       ├── SKILL.md           # Required: main instructions (YAML frontmatter + markdown)
│       ├── references/        # Supporting documentation
│       ├── templates/         # Output templates
│       ├── scripts/           # Executable scripts
│       └── assets/            # Supplementary files

SKILL.md Format

---
name: deploy-to-production
description: "Deploy the current branch to production via CI/CD pipeline"
version: "1.2.0"
license: MIT
platforms: [linux, macos]
prerequisites:
  env_vars: [DEPLOY_TOKEN, AWS_REGION]
  commands: [docker, aws]        # Advisory (not enforced)
required_environment_variables:
  - name: DEPLOY_TOKEN
    description: "CI/CD deployment token"
    prompt: "Enter your deployment token"
setup:
  collect_secrets: true
metadata:
  hermes:
    tags: [devops, deployment]
    related_skills: [docker-build, aws-ecs]
---
 
## Instructions
 
1. Check current branch is clean...
2. Run deployment pipeline...

Constraints (tools/skill_manager_tool.py:111-201):

Progressive Disclosure Architecture (tools/skills_tool.py:10)

Skills load in three tiers to minimize token consumption:

Tier 1: skills_list()          → name + description only (minimal tokens)
         ↓
Tier 2: skill_view(name)       → full SKILL.md content
         ↓  
Tier 3: skill_view(name, file) → linked reference/template/script files

System Prompt Integration (agent/prompt_builder.py:583-808)

build_skills_system_prompt() creates a compact skill index:

[AVAILABLE SKILLS]
-- devops --
  deploy-to-production: Deploy the current branch to production
  docker-build: Build and push Docker images
-- data-science --
  jupyter-analysis: Run Jupyter notebook analysis pipeline
  ...

Caching: Two-layer cache (in-process LRU + disk snapshot .skills_prompt_snapshot.json). Falls back to full filesystem scan on cache miss.
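
The two layers can be sketched like this (function names, snapshot format, and the index contents are illustrative assumptions, not the real prompt_builder code):

```python
import json
from functools import lru_cache
from pathlib import Path

def _scan(root: Path) -> dict[str, str]:
    """Full filesystem scan: the fallback when both cache layers miss."""
    return {p.parent.name: p.read_text(encoding="utf-8").split("\n", 1)[0]
            for p in sorted(root.glob("*/*/SKILL.md"))}

@lru_cache(maxsize=4)  # layer 1: in-process LRU, keyed by the two paths
def skill_index(root: Path, snapshot: Path) -> str:
    if snapshot.exists():              # layer 2: disk snapshot survives restarts
        return snapshot.read_text(encoding="utf-8")
    index = json.dumps(_scan(root), sort_keys=True)
    snapshot.write_text(index, encoding="utf-8")
    return index
```

On a warm process the LRU answers immediately; on a cold start the disk snapshot avoids rescanning the skills tree; only when both miss does the full scan run.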

Conditional Activation (agent/skill_utils.py:241-255)

Skills can declare conditions for when they should be available:

# Activate only when browser tools are available
requires_toolsets: [browser]
 
# Activate as fallback when terminal is unavailable
fallback_for_toolsets: [terminal]
 
# Activate only when specific tools exist
requires_tools: [docker]
 
# Activate when specific tools are missing
fallback_for_tools: [kubectl]
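
One plausible reading of these four conditions as a single predicate (the function name and fallback semantics here are assumptions; the real logic lives in agent/skill_utils.py):

```python
def skill_is_active(skill: dict, toolsets: set[str], tools: set[str]) -> bool:
    """Evaluate the four activation conditions against what is currently available."""
    if not set(skill.get("requires_toolsets", [])) <= toolsets:
        return False   # requires a toolset that is absent
    if any(t in toolsets for t in skill.get("fallback_for_toolsets", [])):
        return False   # fallback skill, but the toolset is actually available
    if not set(skill.get("requires_tools", [])) <= tools:
        return False   # requires a tool that is absent
    if any(t in tools for t in skill.get("fallback_for_tools", [])):
        return False   # fallback skill, but the tool is actually available
    return True
```

Under this reading, a fallback skill activates only when every toolset or tool it substitutes for is missing.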

Skill Creation & Self-Improvement

Creation trigger (from system prompt, prompt_builder.py:164-171):

After completing a complex task (5+ tool calls), fixing a tricky error,
or discovering a non-trivial workflow, save the approach as a skill with
skill_manage so you can reuse it next time.

Skill Manager actions (tools/skill_manager_tool.py):

| Action | Purpose |
|--------|---------|
| create | Create new skill in ~/.hermes/skills/ with validation |
| edit | Replace entire SKILL.md content (full rewrite) |
| patch | Targeted find-and-replace within files (fuzzy matching) |
| write_file | Add/overwrite supporting files |
| remove_file | Delete supporting files |
| delete | Remove entire skill directory |

Security on creation/edit:

  1. Validate name, frontmatter, content size
  2. Run security scan via skills_guard.py
  3. Rollback if scan blocks the content
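
The write-then-rollback step can be sketched as follows. The `scan` callable stands in for skills_guard.py, and the function name and rollback details are assumptions:

```python
import shutil
from pathlib import Path

def save_skill(skill_dir: Path, new_content: str, scan) -> bool:
    """Write SKILL.md, then roll back to the previous version if the scan blocks it."""
    target = skill_dir / "SKILL.md"
    backup = target.read_text(encoding="utf-8") if target.exists() else None
    target.write_text(new_content, encoding="utf-8")
    if scan(new_content):                  # scan returns True when content is blocked
        if backup is None:
            shutil.rmtree(skill_dir)       # fresh create: remove the whole directory
        else:
            target.write_text(backup, encoding="utf-8")  # edit: restore old content
        return False
    return True
```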

Skills Hub (tools/skills_hub.py)

Community skill marketplace:

# Default taps (GitHub repos)
DEFAULT_TAPS = [
    "openai/skills",
    "anthropics/skills",
    "VoltAgent/awesome-agent-skills",
]

Trust levels: builtin > trusted > community

Installation flow:

  1. Search hub index (1-hour TTL cache)
  2. Download skill from GitHub Contents API
  3. Security scan via skills_guard.py
  4. Write to ~/.hermes/skills/
  5. Record provenance in lock file (~/.hermes/skills/.hub/lock.json)
  6. Quarantine suspicious skills for manual review

Memory System

Architecture (tools/memory_tool.py)

Two persistent markdown files:

| Store | Path | Purpose | Limit |
|-------|------|---------|-------|
| MEMORY.md | ~/.hermes/memories/MEMORY.md | Agent observations: environment facts, project conventions, tool quirks, discovered solutions | 2,200 chars (~800 tokens) |
| USER.md | ~/.hermes/memories/USER.md | User profile: preferences, communication style, workflow habits | 1,375 chars (~500 tokens) |

Entry Format

Entries are delimited by § (section sign):

§ User prefers concise responses without emoji
§ Project uses Poetry for dependency management, not pip
§ Terminal backend is Docker with custom image python:3.11-slim
§ PostgreSQL runs on port 5433 (non-standard) in dev environment
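
A minimal sketch of how a §-delimited store with a hard character cap could behave (the function names and the rejection behavior are assumptions; the limit comes from the table above):

```python
MEMORY_LIMIT = 2_200  # chars (~800 tokens) for MEMORY.md, per the table above

def add_entry(store: str, entry: str, limit: int = MEMORY_LIMIT) -> str:
    """Append a §-delimited entry, rejecting writes that would exceed the store's cap."""
    updated = (store.rstrip() + "\n" if store.strip() else "") + f"\u00a7 {entry.strip()}"
    if len(updated) > limit:
        raise ValueError("memory store full: remove or consolidate entries first")
    return updated

def entries(store: str) -> list[str]:
    """Split the store back into individual entries."""
    return [e.strip() for e in store.split("\u00a7") if e.strip()]
```

The tight caps force the agent to keep only durable, high-value facts rather than accumulating a transcript.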

Actions

memory_tool(action="add", target="memory", content="...")    # Append entry
memory_tool(action="replace", target="user", old="...", new="...")  # Update entry
memory_tool(action="remove", target="memory", content="...") # Delete entry
memory_tool(action="read", target="memory")                   # Read all entries

The Frozen Snapshot Pattern

This is a critical design decision:

Session Start
    │
    ├── Load MEMORY.md and USER.md from disk
    ├── Create frozen snapshot (_system_prompt_snapshot)
    ├── Inject snapshot into system prompt
    │
    │   During Session:
    │   ├── memory_tool(action="add") → writes to DISK immediately
    │   ├── System prompt snapshot: UNCHANGED
    │   ├── Anthropic prefix cache: STABLE ✓
    │   └── All writes durable even if session crashes
    │
Session End / Next Session Start
    │
    ├── Load updated MEMORY.md (includes all mid-session writes)
    ├── Create NEW frozen snapshot
    └── Agent now sees updated memories

Why this matters: Anthropic's prompt caching gives a significant discount when the system prompt prefix is identical across API calls. If memory were injected live, every memory_tool(action="add") would invalidate the cache, increasing costs by 4-10x.
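
The pattern reduces to a small amount of code. This sketch keeps the `_system_prompt_snapshot` name from the diagram but simplifies everything else:

```python
from pathlib import Path

class MemoryStore:
    """Frozen-snapshot pattern: reads serve a session-start copy; writes hit disk."""

    def __init__(self, path: Path):
        self.path = path
        # Snapshot taken once; the system prompt is built from this and never refreshed.
        self._system_prompt_snapshot = path.read_text(encoding="utf-8") if path.exists() else ""

    def system_prompt_block(self) -> str:
        return self._system_prompt_snapshot   # stable for the whole session

    def add(self, entry: str) -> None:
        current = self.path.read_text(encoding="utf-8") if self.path.exists() else ""
        self.path.write_text(current + f"\u00a7 {entry}\n", encoding="utf-8")
        # The snapshot is deliberately NOT updated here: the prefix cache stays valid.
```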

Memory Nudging

The system prompt includes periodic reminders to save durable facts:

# prompt_builder.py:144-156
"Save durable facts using the memory tool: user preferences,
 environment details, tool quirks, and stable conventions."
 
# Nudge frequency: configurable via config.yaml
nudge_interval: 10    # every 10 user turns
flush_min_turns: 6    # minimum turns before auto-persistence on exit

Injection Defense (memory_tool.py:65-81)

Memory writes are scanned for prompt injection and exfiltration patterns:

_MEMORY_THREAT_PATTERNS = [
    r"ignore previous instructions",
    r"you are now",
    r"curl.*secret",
    r"cat ~/.ssh",
    r"[\u200b\u200c\u200d]",  # zero-width characters
]

Blocked entries return an error, preventing persistence.
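
Applied to a write, the gate reduces to something like this sketch, which reuses the pattern list above (the real check in memory_tool.py may normalize input differently):

```python
import re

# Patterns repeated from the list above.
_MEMORY_THREAT_PATTERNS = [
    r"ignore previous instructions",
    r"you are now",
    r"curl.*secret",
    r"cat ~/.ssh",
    r"[\u200b\u200c\u200d]",  # zero-width characters
]

def scan_memory_entry(content: str) -> bool:
    """Return True when the entry matches a threat pattern and must be rejected."""
    lowered = content.lower()
    return any(re.search(p, lowered) for p in _MEMORY_THREAT_PATTERNS)
```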

File-Level Locking

# Unix: fcntl
# Windows: msvcrt
# Atomic file replacement via os.replace()

Ensures safe concurrent access when multiple sessions share the same profile.
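
A Unix-only sketch of the two mechanisms together (function names are illustrative; the Windows path would use msvcrt.locking instead of fcntl):

```python
import fcntl      # Unix advisory locking; Windows would use msvcrt.locking
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, content: str) -> None:
    """Write to a temp file in the same directory, then swap in with os.replace()."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.replace(tmp, path)   # atomic rename: readers never see a partial file
    except BaseException:
        os.unlink(tmp)
        raise

def locked_append(path: Path, entry: str) -> None:
    """Hold an exclusive lock so concurrent sessions cannot interleave writes."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.write(entry + "\n")
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```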

External Memory Providers (plugins/memory/)

Pluggable adapters for external memory services run alongside the built-in MemoryStore:

| Provider | Path | Integration |
|----------|------|-------------|
| Honcho | plugins/memory/honcho/ | Dialectic user modeling |
| Mem0 | plugins/memory/mem0/ | Mem0 memory service |
| Supermemory | plugins/memory/supermemory/ | Supermemory integration |
| OpenViking | plugins/memory/openviking/ | Vector embeddings |
| Holographic | plugins/memory/holographic/ | Vector store retrieval |
| RetainDB | plugins/memory/retaindb/ | RetainDB integration |
| ByteRover | plugins/memory/byterover/ | ByteRover service |
| Hindsight | plugins/memory/hindsight/ | Hindsight memory |

Initialization (run_agent.py:1266-1304):

Session Search (Cross-Session Recall)

How It Works (tools/session_search_tool.py)

User asks about something from a past conversation
    │
    ▼
session_search(query="deployment script for staging")
    │
    ├── Step 1: FTS5 full-text search in SQLite
    │   └── Finds matching messages across all sessions
    │
    ├── Step 2: Group by session, take top N (default: 3)
    │
    ├── Step 3: Load each session's transcript
    │   └── Truncate to ~100K chars around match regions
    │
    ├── Step 4: Send to auxiliary model (Gemini Flash)
    │   └── With summarization prompt
    │
    └── Step 5: Return per-session summaries
        └── With metadata: date, source, message count, preview
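
Steps 1-2 amount to an FTS5 query grouped by session. A sketch, assuming a hypothetical `messages` FTS5 table with `session_id` and `content` columns (the real schema in the SessionDB may differ):

```python
import sqlite3

def search_sessions(db: sqlite3.Connection, query: str, top_n: int = 3):
    """Steps 1-2: full-text match across all messages, top N sessions by hit count."""
    return db.execute(
        """SELECT session_id, COUNT(*) AS hits
           FROM messages
           WHERE messages MATCH ?
           GROUP BY session_id
           ORDER BY hits DESC
           LIMIT ?""",
        (query, top_n),
    ).fetchall()
```

Steps 3-5 then load each matching session's transcript and hand it to the auxiliary model for summarization, keeping the expensive primary model out of the loop.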

Filtering

System Prompt Guidance (prompt_builder.py:158-162)

When the user references something from a past conversation or you suspect
relevant cross-session context exists, use session_search to recall it before
asking them to repeat themselves.

Todo Tool (In-Session Task Management)

Purpose (tools/todo_tool.py)

In-memory task list for complex multi-step work within a single session:

# Read current tasks
todo()
 
# Write/update tasks
todo(todos=[
    {"task": "Fix authentication bug", "status": "completed"},
    {"task": "Update tests", "status": "in_progress"},
    {"task": "Deploy to staging", "status": "pending"},
], merge=True)

Status Values

Each task carries one of four statuses: pending, in_progress, completed, or cancelled.

Context Compression Integration (todo_tool.py:90-122)

After context compression, the list of active tasks is re-injected into the conversation so the agent does not lose track of multi-step work:

## Active Tasks
- [x] Fix authentication bug
- [>] Update tests  
- [ ] Deploy to staging
- [~] Refactor logging (cancelled)
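
The checklist rendering above maps each status to a marker, roughly like this sketch (the function name is an assumption):

```python
_MARKERS = {"completed": "x", "in_progress": ">", "pending": " ", "cancelled": "~"}

def render_active_tasks(todos: list[dict]) -> str:
    """Build the task checklist that gets re-injected after context compression."""
    lines = ["## Active Tasks"]
    for t in todos:
        lines.append(f"- [{_MARKERS[t['status']]}] {t['task']}")
    return "\n".join(lines)
```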

The Closed Learning Loop

Complete Workflow

┌─────────────────────────────────────────────────────────────────┐
│                     SESSION START                                │
│                                                                  │
│  1. Load MEMORY.md & USER.md (frozen snapshots)                 │
│  2. Build skills index from ~/.hermes/skills/ (cached)          │
│  3. Assemble system prompt (stable prefix for cache)            │
│  4. Initialize todo store (empty)                               │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                  CONVERSATION LOOP                               │
│                                                                  │
│  ┌─ User sends message                                          │
│  │                                                               │
│  ├─ Agent scans skills index for relevant skills                │
│  │   └─ skill_view() loads full SKILL.md if match found         │
│  │                                                               │
│  ├─ Agent executes task with tools                              │
│  │   └─ Creates todo items for complex multi-step work          │
│  │                                                               │
│  ├─ Memory nudge triggers? (every ~10 turns)                    │
│  │   └─ memory_tool(action='add') → saves to disk immediately  │
│  │      (system prompt NOT updated — cache stays stable)        │
│  │                                                               │
│  ├─ Complex task completed? (5+ tool calls)                     │
│  │   └─ skill_manager_tool(action='create') → new skill         │
│  │      Security scan → write to ~/.hermes/skills/              │
│  │                                                               │
│  ├─ Existing skill outdated?                                    │
│  │   └─ skill_manager_tool(action='patch') → improve skill      │
│  │      Fuzzy matching → validate → security scan               │
│  │                                                               │
│  └─ Need past context?                                          │
│      └─ session_search() → FTS5 + LLM summarization            │
│                                                                  │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                   SESSION PERSISTS                                │
│                                                                  │
│  - Updated MEMORY.md on disk (durable)                          │
│  - Updated USER.md on disk (durable)                            │
│  - New/updated skills in ~/.hermes/skills/ (durable)            │
│  - Session transcript in SQLite (FTS5-indexed)                  │
│  - Token counts and costs in SessionDB                          │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                    NEXT SESSION                                   │
│                                                                  │
│  1. Reload MEMORY.md & USER.md (fresh snapshots)                │
│  2. Rebuild skills index (includes new/updated skills)          │
│  3. Agent benefits from ALL previous learnings                  │
│  4. session_search() available for cross-session recall         │
└─────────────────────────────────────────────────────────────────┘

Key Design Principles

  1. Progressive disclosure: Skills load in tiers (metadata → content → linked files). This prevents a large skill library from consuming the entire context window.

  2. Frozen snapshots: Memory snapshots at session start preserve prefix cache stability. Writes are durable but invisible to the current session's system prompt.

  3. Immediate persistence: All writes to disk happen immediately, not just on session end. Crash-safe.

  4. Security scanning: All user-created content (skills, memory) is scanned before acceptance. Skills Guard + memory threat patterns.

  5. Auxiliary model offloading: Session search summarization uses a cheap auxiliary model (Gemini Flash), not the primary model, to avoid wasting expensive tokens.

  6. Platform-aware skills: Skills can declare platform restrictions (linux, macos) and conditional activation based on available toolsets.

  7. Community sharing: Skills can be published to and installed from the Skills Hub, with trust levels and quarantine for suspicious content.

What Makes This Different

Most AI agents are stateless: each conversation starts from zero. Hermes Agent maintains three types of persistence:

| Type | Mechanism | Granularity | Example |
|------|-----------|-------------|---------|
| Procedural memory | Skills (markdown files) | Task-level | "How to deploy this project" |
| Declarative memory | MEMORY.md, USER.md | Fact-level | "User prefers TypeScript" |
| Episodic memory | SessionDB + FTS5 search | Conversation-level | "We discussed the auth bug last Tuesday" |

This mirrors human memory systems: skills (procedural), facts (declarative/semantic), and experiences (episodic).