15. Memory System — Deep Dive
This doc goes beyond the taxonomy and prompt shape of the memory system (covered in docs 6 and 8) to the exact mechanics: extraction prompts, trigger thresholds, fork agent shape, recall scoring, team scope, cache integration.
15.1 The two extraction prompts — /workspaces/src/services/extractMemories/prompts.ts
buildExtractAutoOnlyPrompt() — prompts.ts:50-94
Used when only private memory is active. Verbatim:
You are now acting as the memory extraction subagent. Analyze the most recent ~${newMessageCount}
messages above and use them to update your persistent memory systems.
Available tools: File Read, Grep, Glob, read-only Bash (ls/find/cat/stat/wc/head/tail and
similar), and File Edit/File Write for paths inside the memory directory only. Bash rm is not
permitted. All other tools — MCP, Agent, write-capable Bash, etc — will be denied.
You have a limited turn budget. File Edit requires a prior File Read of the same file, so the
efficient strategy is: turn 1 — issue all File Read calls in parallel for every file you might
update; turn 2 — issue all File Write/File Edit calls in parallel. Do not interleave reads and
writes across multiple turns.
You MUST only use content from the last ~${newMessageCount} messages to update your persistent
memories. Do not waste any turns attempting to investigate or verify that content further —
no grepping source files, no reading code to confirm a pattern exists, no git commands.
Followed by:
- The four-type taxonomy (TYPES_SECTION_INDIVIDUAL from memoryTypes.ts:113).
- "What NOT to save" rules.
- "How to save memories": a two-step instruction. First write the topic file with frontmatter, then add a one-line entry to MEMORY.md. Index lines must be under ~150 chars and have no frontmatter.
buildExtractCombinedPrompt() — prompts.ts:101-154
Used when team memory is enabled. Similar skeleton but with:
- The TYPES_SECTION_COMBINED variant: each type has a <scope> tag declaring private/team preference.
- A note that both directories have their own separate MEMORY.md indexes.
- A warning against saving sensitive data (API keys, credentials) in team memories.
15.2 When extraction fires — /workspaces/src/services/extractMemories/extractMemories.ts:296-616
Extraction is a stop hook — it runs after the model produces a final response with no tool calls.
The gates
- Feature flag: tengu_passport_quail must be true (extractMemories.ts:536-542). If false, extraction never runs.
- Auto-memory enabled: isAutoMemoryEnabled() checks the CLAUDE_CODE_DISABLE_AUTO_MEMORY env var plus user settings (line 545).
- Local mode only: not run in remote sessions (line 550).
- Throttle: the feature flag tengu_bramble_lintel (default 1) sets the every-N-turns cadence (lines 377-385).
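The four gates can be read as one short-circuiting predicate. The sketch below is a minimal illustration, assuming a hypothetical GateInputs shape and a turnsSinceLastExtraction counter; how the real code reads flags and counts turns is not shown in this doc.

```typescript
// Hypothetical input shape; field names are illustrative, not the real API.
interface GateInputs {
  passportQuail: boolean;          // tengu_passport_quail feature flag
  autoMemoryEnabled: boolean;      // result of isAutoMemoryEnabled()
  isRemoteSession: boolean;        // extraction is local-only
  brambleLintel?: number;          // tengu_bramble_lintel cadence (default 1)
  turnsSinceLastExtraction: number;
}

// True only when every gate passes; each early return mirrors one gate above.
function shouldRunExtraction(g: GateInputs): boolean {
  if (!g.passportQuail) return false;      // flag off: extraction never runs
  if (!g.autoMemoryEnabled) return false;  // disabled by env var or settings
  if (g.isRemoteSession) return false;     // local mode only
  const everyN = Math.max(1, g.brambleLintel ?? 1);
  return g.turnsSinceLastExtraction >= everyN; // every-N-turns throttle
}
```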
The mutual-exclusion check
hasMemoryWritesSince(messages, cursor) (lines 121-148) detects whether the main agent already wrote to memory files in the current turn. If it did, extraction skips — no point running a forked writer over the same content. The cursor still advances so the next extraction sees fresh delta.
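A minimal sketch of the mutual-exclusion scan, assuming simplified hypothetical message shapes in which assistant messages carry a toolUses array with a file_path input. The function name wroteMemorySince is illustrative; the real hasMemoryWritesSince takes only (messages, cursor) and its internals may differ.

```typescript
// Hypothetical, simplified message shapes.
type ToolUse = { name: string; input: { file_path?: string } };
type ChatMessage =
  | { uuid: string; type: "user"; text: string }
  | { uuid: string; type: "assistant"; toolUses: ToolUse[] };

const MEMORY_WRITE_TOOLS = new Set(["FileWrite", "FileEdit"]);

// True if any assistant message after `cursor` wrote a file inside memoryDir.
// A missing or unknown cursor means "scan the whole conversation".
function wroteMemorySince(
  messages: ChatMessage[],
  cursor: string | undefined,
  memoryDir: string,
): boolean {
  const start = cursor === undefined ? 0 : messages.findIndex((m) => m.uuid === cursor) + 1;
  const prefix = memoryDir.endsWith("/") ? memoryDir : memoryDir + "/";
  for (const m of messages.slice(start)) {
    if (m.type !== "assistant") continue;
    for (const t of m.toolUses) {
      const p = t.input.file_path;
      if (MEMORY_WRITE_TOOLS.has(t.name) && p !== undefined && p.startsWith(prefix)) return true;
    }
  }
  return false;
}
```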
Cursor-based dedup
The cursor is lastMemoryMessageUuid, tracked in AppState. countModelVisibleMessagesSince(cursor) counts only user/assistant messages after the cursor. After a successful extraction, the cursor advances to the last processed message (lines 354, 434). Tool-call counts aren't a direct trigger; token delta and turn cadence are the primary levers.
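The cursor arithmetic is small enough to sketch. This assumes a hypothetical message type in which anything other than user or assistant (here system and progress) is invisible to the model; countVisibleSince and advanceCursor are illustrative names, not the real helpers.

```typescript
// Hypothetical shape: only user and assistant messages are model-visible.
type LoggedMessage = { uuid: string; type: "user" | "assistant" | "system" | "progress" };

// Counts model-visible messages strictly after the cursor (all of them if no cursor).
function countVisibleSince(messages: LoggedMessage[], cursor?: string): number {
  const start = cursor === undefined ? 0 : messages.findIndex((m) => m.uuid === cursor) + 1;
  return messages.slice(start).filter((m) => m.type === "user" || m.type === "assistant").length;
}

// After a successful extraction, the new cursor is the last processed message.
function advanceCursor(processed: LoggedMessage[]): string | undefined {
  const last = processed[processed.length - 1];
  return last !== undefined ? last.uuid : undefined;
}
```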
Trailing runs
If a new extraction call arrives while one is running, the new context is stashed in pendingContext (lines 557-563). After the current run finishes, a trailing extraction fires for the delta — its newMessageCount is computed relative to the already-advanced cursor, not the original window.
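A sketch of the stash-and-trail pattern under stated assumptions: contexts are plain lists of message uuids, the hypothetical ExtractionRunner keeps a single pendingContext slot (a newer stash overwrites an older one), and the trailing run measures its delta from the already-advanced cursor.

```typescript
// Hypothetical context: the message uuids visible when the stop hook fired.
type ExtractionContext = { messageUuids: string[] };

class ExtractionRunner {
  private running = false;
  private inFlight: Promise<void> | undefined = undefined;
  private pendingContext: ExtractionContext | undefined = undefined;
  private cursor: string | undefined = undefined;
  private readonly extract: (delta: string[]) => Promise<void>;

  constructor(extract: (delta: string[]) => Promise<void>) {
    this.extract = extract;
  }

  // Stop-hook entry point: start a run, or stash the newest context if one is active.
  request(ctx: ExtractionContext): void {
    if (this.running) {
      this.pendingContext = ctx; // a newer stash replaces an older one
      return;
    }
    this.running = true;
    this.inFlight = this.loop(ctx);
  }

  // Resolves once the current run and any trailing runs have finished.
  drain(): Promise<void> {
    return this.inFlight ?? Promise.resolve();
  }

  private async loop(first: ExtractionContext): Promise<void> {
    let ctx: ExtractionContext | undefined = first;
    try {
      while (ctx !== undefined) {
        // Delta is measured from the (possibly advanced) cursor, not the original window.
        const start = this.cursor === undefined ? 0 : ctx.messageUuids.indexOf(this.cursor) + 1;
        const delta = ctx.messageUuids.slice(start);
        if (delta.length > 0) {
          await this.extract(delta);
          this.cursor = delta[delta.length - 1]; // advance only after success
        }
        ctx = this.pendingContext; // pick up a stash that arrived mid-run
        this.pendingContext = undefined;
      }
    } finally {
      this.running = false;
    }
  }
}
```

Because the stash holds only the newest context, several stop-hook calls during one run collapse into a single trailing extraction.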
15.3 The extraction fork — /workspaces/src/utils/forkedAgent.ts
Extraction runs via runForkedAgent(), the same primitive used by /btw and fork subagents. The fork inherits:
- System prompt: identical to the parent's. No extraction-specific system prompt — the extraction prompt is delivered as a user message.
- Message history: the parent's full conversation (via cacheSafeParams).
- Prompt cache: byte-identical prefix, so the extraction is incrementally cheap.
Restricted tool policy — createAutoMemCanUseTool() at extractMemories.ts:171-222
- Allowed: FileRead, Grep, Glob, read-only Bash (ls, find, cat, stat, wc, head, tail), and FileEdit/FileWrite only for paths inside the memory directory.
- Denied: rm, write-capable Bash, Agent, MCP, Skill, and anything else.
The prompt pre-announces this policy so the model doesn't try denied tools.
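A rough sketch of an allowlist in the spirit of createAutoMemCanUseTool(). The ToolCall shape, the shell-metacharacter rejection, and the find flag blocklist are assumptions added for illustration. A naive program-name allowlist would let `find -delete` or `cat a > b` through, so the sketch rejects those explicitly.

```typescript
// Hypothetical, flattened tool-call shape.
interface ToolCall {
  tool: string;
  command?: string;   // for Bash
  filePath?: string;  // for FileEdit / FileWrite
}

const READ_ONLY_TOOLS = new Set(["FileRead", "Grep", "Glob"]);
const READ_ONLY_BASH = new Set(["ls", "find", "cat", "stat", "wc", "head", "tail"]);
// find can delete or write files through these actions; block them explicitly.
const FIND_WRITE_FLAGS = /\s-(delete|exec|execdir|ok|okdir|fprint0?|fprintf|fls)\b/;

function isInsideDir(dir: string, p: string): boolean {
  const root = dir.endsWith("/") ? dir : dir + "/";
  return p.startsWith(root) && p.split("/").indexOf("..") < 0;
}

function canUseToolInExtraction(call: ToolCall, memoryDir: string): boolean {
  if (READ_ONLY_TOOLS.has(call.tool)) return true;
  if (call.tool === "Bash" && call.command !== undefined) {
    // Reject chaining, pipes, redirection, and substitution outright.
    if (/[;&|<>`$\n]/.test(call.command)) return false;
    const program = call.command.trim().split(/\s+/)[0];
    if (program === "find" && FIND_WRITE_FLAGS.test(call.command)) return false;
    return READ_ONLY_BASH.has(program);
  }
  if ((call.tool === "FileEdit" || call.tool === "FileWrite") && call.filePath !== undefined) {
    return isInsideDir(memoryDir, call.filePath);
  }
  return false; // Agent, MCP, Skill, and anything else
}
```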
Budgets
- maxTurns: 5 (line 426): prevents the extraction from becoming a verification rabbit hole.
- skipTranscript: true (line 423): don't write the fork's own output into the main transcript (avoids a race with the main thread).
- skipCacheWrite: true: don't push post-boundary content into the parent's cache.
Lifecycle
Fire-and-forget from the stop hook (line 598). The shutdown path awaits via drainPendingExtraction() (lines 611-615) so in-flight extractions complete before the process exits.
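The fire-and-forget plus drain pattern, sketched with a hypothetical module-level registry of in-flight promises. fireAndForget and drainInFlight are illustrative names; the real drainPendingExtraction() may track its state differently.

```typescript
// Hypothetical registry of background work that shutdown must wait for.
const inFlight = new Set<Promise<void>>();

// Start work without awaiting it; failures are logged, never thrown to the caller.
function fireAndForget(work: () => Promise<unknown>): void {
  const p: Promise<void> = work().then(
    () => undefined,
    (err: unknown) => {
      console.error("background extraction failed:", err);
    },
  );
  inFlight.add(p);
  void p.then(() => {
    inFlight.delete(p); // deregister once settled
  });
}

// Shutdown path: keep waiting until nothing is left in flight.
async function drainInFlight(): Promise<void> {
  while (inFlight.size > 0) {
    await Promise.all(Array.from(inFlight));
  }
}
```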
15.4 Memory loading (recall) — /workspaces/src/memdir/findRelevantMemories.ts
The scan
scanMemoryFiles() reads the memory directory, parses the first 30 lines of each file (enough for frontmatter), and returns file headers sorted newest-first (by mtime), capped at 200 files max. MEMORY.md itself is excluded (already loaded separately in the prompt).
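An in-memory sketch of the scan, assuming a hypothetical MemoryFile record in place of real filesystem reads and assuming only .md files are considered. It keeps the rules from the text: skip MEMORY.md, sort newest-first by mtime, cap at 200 files, and keep the first 30 lines as the header.

```typescript
// Hypothetical stand-in for a directory entry; real code reads the filesystem.
interface MemoryFile {
  filename: string;
  mtimeMs: number;
  content: string;
}

interface MemoryHeader {
  filename: string;
  mtimeMs: number;
  headerLines: string[]; // enough lines to cover the frontmatter
}

const HEADER_LINES = 30;
const MAX_FILES = 200;

function scanMemoryFilesSketch(files: MemoryFile[]): MemoryHeader[] {
  return files
    .filter((f) => f.filename.endsWith(".md") && f.filename !== "MEMORY.md") // index is loaded separately
    .sort((a, b) => b.mtimeMs - a.mtimeMs) // newest first
    .slice(0, MAX_FILES)
    .map((f) => ({
      filename: f.filename,
      mtimeMs: f.mtimeMs,
      headerLines: f.content.split("\n").slice(0, HEADER_LINES),
    }));
}
```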
The selection call
A Sonnet sideQuery (not Haiku — precision matters here) is invoked with:
- System prompt: "select up to 5 memories that will clearly be useful; only include if certain; skip tool reference docs when that tool is recently used."
- User message:
Query: ${currentUserMessage}
Available memories:
- [user] user_role.md (2026-03-14): deep Go expertise, new to React side of this repo
- [feedback] feedback_testing.md (2026-02-01): integration tests must hit a real database
- [project] project_merge_freeze.md (2026-03-05): merge freeze begins 2026-03-05 for mobile cut
- ...
Tools recently used: FileRead, Grep
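A sketch of how such a user message could be assembled from parsed headers. The RecallEntry shape and formatRecallQuery are hypothetical; the line format simply mirrors the example above, with dates rendered in UTC.

```typescript
// Hypothetical parsed header; type is absent for legacy files.
interface RecallEntry {
  type?: string;
  filename: string;
  mtime: Date;
  description: string;
}

function formatRecallQuery(query: string, entries: RecallEntry[], recentTools: string[]): string {
  const lines = entries.map((e) => {
    const tag = e.type !== undefined ? `[${e.type}] ` : "";
    const day = e.mtime.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    return `- ${tag}${e.filename} (${day}): ${e.description}`;
  });
  const parts = [`Query: ${query}`, "Available memories:"].concat(lines);
  if (recentTools.length > 0) parts.push(`Tools recently used: ${recentTools.join(", ")}`);
  return parts.join("\n");
}
```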
Filtering
- Excludes anything in alreadySurfaced (memories already loaded this turn).
- If recentTools is provided, deprioritizes memories whose descriptions reference those tools. This reduces noise: if we just used Grep, we don't need a Grep-reference memory.
Budgets
- max_tokens: 256 on the Sonnet response (the JSON is short).
- Up to 5 memories selected.
- Each pick is cross-checked against the filename set to reject hallucinated matches.
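The post-processing steps can be sketched as a pure function over the selector's picks. This is an illustration under assumptions: picks arrive as a list of filenames, tool references are detected by a crude substring match on the description, and deprioritized memories are moved to the back of the list before the cap of 5 is applied. postProcessPicks is a hypothetical name.

```typescript
// Hypothetical manifest entry as offered to the selector model.
interface Candidate {
  filename: string;
  description: string;
}

const MAX_SELECTED = 5;

function postProcessPicks(
  picks: string[],               // filenames parsed from the selector's JSON
  candidates: Candidate[],       // the manifest actually offered
  alreadySurfaced: Set<string>,  // loaded earlier this turn
  recentTools: string[],
): string[] {
  const byName = new Map<string, Candidate>();
  for (const c of candidates) byName.set(c.filename, c);
  const preferred: string[] = [];
  const deferred: string[] = [];
  for (const name of picks) {
    const c = byName.get(name);
    if (c === undefined) continue;                // hallucinated filename: reject
    if (alreadySurfaced.has(name)) continue;      // already in context
    if (preferred.indexOf(name) >= 0 || deferred.indexOf(name) >= 0) continue; // duplicate pick
    // Crude stand-in for "references a recently used tool".
    const mentionsRecentTool = recentTools.some((t) => c.description.indexOf(t) >= 0);
    (mentionsRecentTool ? deferred : preferred).push(name);
  }
  return preferred.concat(deferred).slice(0, MAX_SELECTED);
}
```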
Composition
Selected memory files are read in full and injected into the system prompt alongside MEMORY.md. This is the "dynamic memory block" that sits post-SYSTEM_PROMPT_DYNAMIC_BOUNDARY.
15.5 MEMORY.md — format and discipline
Schema
Pure markdown, no frontmatter. One line per entry:
- [Title](filename.md) — one-line hook
- [Another](topic.md) — hook
The parser just iterates lines; it doesn't validate structure strictly. The prompt (quoted in doc 7) tells the model to keep lines under ~150 chars and move detail to topic files.
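A lenient line-by-line parser in the same spirit, as a sketch. The regex and the IndexEntry shape are assumptions, and the separator may be an em dash, en dash, or hyphen (written as Unicode escapes in the regex). Lines that don't match are skipped rather than treated as errors.

```typescript
interface IndexEntry {
  title: string;
  file: string;
  hook: string;
}

// "- [Title](file.md) <dash> hook", where <dash> is U+2014, U+2013, or "-".
const INDEX_LINE = /^\s*-\s*\[([^\]]+)\]\(([^)]+)\)\s*(?:[\u2014\u2013-]\s*(.*))?$/;

function parseMemoryIndex(text: string): IndexEntry[] {
  const entries: IndexEntry[] = [];
  for (const line of text.split("\n")) {
    const m = INDEX_LINE.exec(line);
    if (m === null) continue; // lenient: non-entry lines are skipped, not rejected
    entries.push({ title: m[1], file: m[2], hook: (m[3] ?? "").trim() });
  }
  return entries;
}
```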
Truncation — /workspaces/src/memdir/memdir.ts:57-103
- 200-line cap (natural boundary).
- 25 KB byte cap (catches long-line indexes that slip past line cap — observed p100 was 197 KB under 200 lines).
- Truncation is applied line-first; the result is then byte-truncated at the last newline before the cap.
- Warning appended when truncated:
> WARNING: MEMORY.md is [N lines/X bytes]. Only part of it was loaded. Keep index entries to one line under ~200 chars.
The warning is visible to the model so it can clean up the index on the next extraction.
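The two-stage cap can be sketched as follows. Assumptions: 25 KB is taken as 25 × 1024 bytes, bytes are counted as UTF-8, the warning wording is paraphrased from the text above, and a single line longer than the cap falls back to a raw byte cut. truncateIndex is a hypothetical name.

```typescript
const MAX_LINES = 200;
const MAX_BYTES = 25 * 1024; // assumption: "25 KB" read as 25,600 bytes

function truncateIndex(text: string): { text: string; warning?: string } {
  const encoder = new TextEncoder();
  const totalLines = text.split("\n").length;
  const totalBytes = encoder.encode(text).length;

  // Stage 1: line cap.
  let out = text.split("\n").slice(0, MAX_LINES).join("\n");

  // Stage 2: byte cap, cut back to the last newline inside the limit.
  const encoded = encoder.encode(out);
  if (encoded.length > MAX_BYTES) {
    const head = encoded.slice(0, MAX_BYTES);
    const lastNewline = head.lastIndexOf(0x0a);
    // No newline at all: fall back to a raw cut (may split a multibyte char).
    out = new TextDecoder().decode(lastNewline >= 0 ? head.slice(0, lastNewline) : head);
  }

  if (out === text) return { text };
  return {
    text: out,
    warning: `WARNING: MEMORY.md is ${totalLines} lines/${totalBytes} bytes. Only part of it was loaded. Keep index entries to one line under ~200 chars.`,
  };
}
```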
15.6 Individual memory file — schema
---
name: memory name
description: one-line hook used for relevance selection
type: user | feedback | project | reference
---
Body text. For `feedback` / `project`:
Lead with the rule.
**Why:** the reason given.
**How to apply:** when / where this kicks in.
Validation — /workspaces/src/memdir/memoryScan.ts:46-64
- parseFrontmatter() extracts name, description, and type from YAML.
- parseMemoryType() (memoryTypes.ts:28-31) rejects unknown types; a missing type: is allowed for legacy files.
- description is used as-is for the manifest passed to Sonnet recall.
- name is advisory: the model's own organizational metadata.
Body structure is not enforced — it's guidance in the prompt. The Why: / How to apply: format is consistently nudged but legacy files pre-dating the convention still work.
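A minimal sketch of these validation rules, assuming simple `key: value` frontmatter; a real YAML parser handles far more. parseFrontmatterLite and checkMemoryType are hypothetical names, not the functions in memoryScan.ts or memoryTypes.ts.

```typescript
const MEMORY_TYPES = ["user", "feedback", "project", "reference"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

interface Frontmatter {
  name?: string;
  description?: string;
  type?: string;
}

// Minimal "key: value" reader between the two --- fences (not a YAML parser).
function parseFrontmatterLite(text: string): Frontmatter | undefined {
  const lines = text.split("\n");
  if (lines[0].trim() !== "---") return undefined;
  const end = lines.indexOf("---", 1);
  if (end < 0) return undefined;
  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon > 0) fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { name: fields["name"], description: fields["description"], type: fields["type"] };
}

type TypeCheck = { ok: true; type: MemoryType | undefined } | { ok: false; raw: string };

// Unknown types are rejected; a missing type is accepted for legacy files.
function checkMemoryType(raw: string | undefined): TypeCheck {
  if (raw === undefined) return { ok: true, type: undefined };
  const t = raw.trim();
  return (MEMORY_TYPES as readonly string[]).indexOf(t) >= 0
    ? { ok: true, type: t as MemoryType }
    : { ok: false, raw };
}
```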
15.7 Team memory — /workspaces/src/memdir/teamMemPaths.ts + /workspaces/src/services/teamMemorySync/
Location
~/.claude/projects/<slug>/memory/team/ — a sibling of the private memory directory (teamMemPaths.ts:85). Each has its own MEMORY.md.
Gate
isTeamMemoryEnabled() (teamMemPaths.ts:73-78) — requires both auto-memory enabled and the tengu_herring_clock feature flag.
Prompt mode
buildCombinedMemoryPrompt() (teamMemPrompts.ts:22-100):
- A ## Memory scope section explains private vs. team directories.
- Each type block gets a <scope> tag: user → always private; feedback → defaults private, team-OK for project-wide conventions; project → strongly biases team; reference → usually team.
- Save step is still two-part (write topic, add index line) but the model picks the right directory.
Security — path validation
validateTeamMemKey() and validateTeamMemWritePath() (teamMemPaths.ts:265-292) check for:
- symlink escapes (PSR M22186 — this is a named fix),
- null bytes,
- URL-encoded traversal sequences,
- Unicode normalization attacks.
These matter because the team directory is synced to a shared location; a path-traversal write would affect other users.
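A pure-string sketch of the non-symlink checks, using a hypothetical checkTeamMemKey that returns a rejection reason or null. Assumptions: percent-decoding is repeated until stable (to catch double encoding), and keys that change under NFKC normalization are rejected outright. Symlink escapes need a filesystem realpath() comparison, which this sketch does not attempt.

```typescript
// Returns a rejection reason, or null if the key passes these string-level checks.
function checkTeamMemKey(key: string): string | null {
  if (key.indexOf("\0") >= 0) return "null byte";
  // Fullwidth dots and similar lookalikes change under NFKC; reject rather than guess.
  if (key.normalize("NFKC") !== key) return "non-normalized Unicode";
  // Decode repeatedly so %252e%252e (double-encoded "..") is caught too.
  let decoded = key;
  for (let i = 0; i < 3; i++) {
    let next = decoded;
    try {
      next = decodeURIComponent(decoded);
    } catch {
      return "malformed percent-encoding";
    }
    if (next === decoded) break;
    decoded = next;
  }
  for (const form of [key, decoded]) {
    if (form.indexOf("\0") >= 0) return "encoded null byte";
    if (form.startsWith("/") || form.indexOf("\\") >= 0) return "absolute or backslash path";
    if (form.split("/").some((seg) => seg === ".." || seg === ".")) return "traversal segment";
  }
  return null;
}
```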
Sync
/workspaces/src/services/teamMemorySync/ — not live during a session. Team memory changes are synced at session start/end via a watcher. Private memory is always local.
15.8 Session memory — the parallel track — /workspaces/src/services/SessionMemory/
Distinct from auto-memory. Same filesystem area (~/.claude/session-memory/<session-id>/…) but different purpose.
Goal
A running summary of this session that can survive compaction — so when compaction fires, the model has a durable record of what was done.
Structure
- Single file per session, free-form markdown.
- Sections: Session Title, Current State, Task specification, Files and Functions, Workflow, Errors & Corrections, Codebase and System Documentation, Learnings, Key Results, Worklog.
- Section size caps: ~2000 tokens per section, ~12000 tokens total (enforced via prompt reminders when exceeded).
- No frontmatter — pure markdown structure.
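The section caps can be checked with a sketch like this one. Assumptions: sections start at `# ` headings, and tokens are estimated at roughly 4 characters each rather than counted with a real tokenizer. findOverBudget is a hypothetical helper whose output would drive the prompt reminders.

```typescript
// Assumption: ~4 characters per token; a real tokenizer will differ.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

const SECTION_CAP = 2000;  // ~tokens per section
const TOTAL_CAP = 12000;   // ~tokens for the whole file

// Assumes sections start with "# " headings; returns titles of oversized sections.
function findOverBudget(doc: string): { sections: string[]; totalOver: boolean } {
  const sections: string[] = [];
  let title = "";
  let body: string[] = [];
  const flush = (): void => {
    if (title !== "" && estimateTokens(body.join("\n")) > SECTION_CAP) sections.push(title);
  };
  for (const line of doc.split("\n")) {
    if (line.startsWith("# ")) {
      flush(); // close the previous section
      title = line.slice(2).trim();
      body = [];
    } else {
      body.push(line);
    }
  }
  flush(); // close the last section
  return { sections, totalOver: estimateTokens(doc) > TOTAL_CAP };
}
```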
Triggers — sessionMemory.ts:134-181
const shouldExtract =
  (hasMetTokenThreshold && hasMetToolCallThreshold) ||
  (hasMetTokenThreshold && !hasToolCallsInLastTurn);
- minimumMessageTokensToInit: token count needed before the first session memory is written.
- minimumTokensBetweenUpdate: token delta required before re-update.
- toolCallsBetweenUpdates: tool-call threshold (only in conjunction with the token threshold).
- Safe extraction: the last assistant turn must have no tool calls; otherwise there'd be orphaned tool_result messages in the forked transcript.
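The trigger can be made concrete with hypothetical state and config shapes. Folding the init threshold and the update threshold into one hasMetTokenThreshold is an assumption, as are any concrete numbers plugged in; the real defaults are not given here.

```typescript
// Hypothetical config; field names follow the thresholds listed above.
interface SessionMemoryConfig {
  minimumMessageTokensToInit: number;
  minimumTokensBetweenUpdate: number;
  toolCallsBetweenUpdates: number;
}

// Hypothetical per-session counters.
interface SessionState {
  initialized: boolean;            // has a session memory file been written yet?
  totalTokens: number;
  tokensSinceLastUpdate: number;
  toolCallsSinceLastUpdate: number;
  hasToolCallsInLastTurn: boolean;
}

function shouldExtractSessionMemory(s: SessionState, c: SessionMemoryConfig): boolean {
  // Assumption: the init threshold applies before the first write, the delta threshold after.
  const hasMetTokenThreshold = s.initialized
    ? s.tokensSinceLastUpdate >= c.minimumTokensBetweenUpdate
    : s.totalTokens >= c.minimumMessageTokensToInit;
  const hasMetToolCallThreshold = s.toolCallsSinceLastUpdate >= c.toolCallsBetweenUpdates;
  return (
    (hasMetTokenThreshold && hasMetToolCallThreshold) ||
    (hasMetTokenThreshold && !s.hasToolCallsInLastTurn)
  );
}
```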
Prompt template
sessionMemory/prompts.ts:43-247 — template-based, customizable via ~/.claude/session-memory/config/prompt.md. The model is told to preserve structure (the sections above) and respect the token budgets.
Relationship to auto-memory
- Auto-memory: durable, multi-session, topic-keyed, four-type taxonomy, written via multi-turn fork with FileWrite/FileEdit.
- Session memory: session-scoped, single-file, free-form structured, written via a background fork that overwrites each time.
Both are extraction-driven; both use runForkedAgent(). The difference is persistence scope and file discipline.
15.9 Cache implications
Memory sits after the SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker (constants/prompts.ts:495-573). That placement is deliberate:
- If memory were before the boundary, every memory write would bust the cross-org global prompt cache (scope 'global'): tens of thousands of cache-creation tokens per user per write.
- After the boundary, a memory change only invalidates the session-specific suffix. The expensive global prefix is still a hit.
loadMemoryPrompt() is called from a systemPromptSection('memory', …) registration (constants/prompts.ts:495) that is memoized for the session until /clear or /compact. A mid-session memory write does not immediately re-render the system prompt — the next turn uses the cached value. The new memory takes effect on the next /clear or /compact, or on a new session boot, unless the extraction explicitly invalidates the section cache (which it does not — extractions are always delayed-effect).
This is a subtle choice: immediate-effect memory writes would give faster behavior change but worse cache economy. The current design prefers cache economy because most memory writes are not urgent (they're summaries of just-completed work).
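A toy model of a memoized section shows why writes are delayed-effect. PromptSections is hypothetical and deliberately much simpler than systemPromptSection(): a value is computed once, served from cache thereafter, and recomputed only after an explicit invalidation such as the one /clear or /compact performs.

```typescript
// Hypothetical memoized section registry (not the real systemPromptSection API).
class PromptSections {
  private cache = new Map<string, string>();
  private loaders = new Map<string, () => string>();

  register(sectionName: string, load: () => string): void {
    this.loaders.set(sectionName, load);
  }

  // First call runs the loader; later calls return the memoized value.
  get(sectionName: string): string {
    const hit = this.cache.get(sectionName);
    if (hit !== undefined) return hit;
    const load = this.loaders.get(sectionName);
    if (load === undefined) throw new Error(`unknown section: ${sectionName}`);
    const value = load();
    this.cache.set(sectionName, value);
    return value;
  }

  // Models what /clear or /compact does here: drop memoized values.
  invalidateAll(): void {
    this.cache.clear();
  }
}
```

In this model a mid-session write to disk stays invisible until invalidateAll() runs, which is the cache-economy trade described above.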
15.10 /clear and /compact effects
/clear
- Clears the in-memory conversation. Memory files are untouched.
- The memoized systemPromptSection('memory', ...) is invalidated, so the next turn reloads memory from disk. Any recently-written memory files now appear.
/compact
- The session transcript is summarized and replaced.
- Session memory may be invoked via trySessionMemoryCompaction() for a cleaner compaction summary.
- Auto-memory files are not modified. The loadMemoryPrompt() cache is invalidated → next turn reloads.
Neither command erases or mutates persistent memories. Memory is strictly additive (barring manual /memory management).
15.11 Telemetry and debugging
- tengu_memory_extract event fired per extraction with: tokens, turns used, private files written, team files written (if combined), and the skip reason (if skipped).
- tengu_memory_recall event fired per recall with: memories scanned, memories selected, Sonnet tokens.
- hasMemoryWritesSince → skip_reason: main_already_wrote is the canonical "not needed this turn" signal.
15.12 The mental model
- Memory is extracted, not maintained. You don't tell the agent "remember this." The stop-hook extraction reads what just happened and writes what's worth keeping.
- Memory is a claim, not a truth. Recall-side guidance tells the model to verify file paths / function names against current state before acting.
- Index vs. topic. MEMORY.md is a one-line-per-item catalog, and each topic file's frontmatter description lets the recall scorer decide without loading every file body; topic files carry the detail.
- Fork-based extraction means writing memory is cheap: it shares the parent's cache, runs in the background, and costs only the delta.
- Cache-safe placement of memory in the prompt means writes don't blow up prompt costs.
- Scope discipline (private vs. team, auto vs. session) keeps different kinds of knowledge in their right homes.
Key files:
| Concern | File |
|---|---|
| Extraction prompts | services/extractMemories/prompts.ts:50-154 |
| Extraction orchestration | services/extractMemories/extractMemories.ts:296-616 |
| Fork agent | utils/forkedAgent.ts |
| Recall (relevance selection) | memdir/findRelevantMemories.ts:39-75 |
| Directory and truncation | memdir/memdir.ts:57-103 |
| Type taxonomy | memdir/memoryTypes.ts:14, 113-178 |
| Team memory scope | memdir/teamMemPaths.ts:73-292 |
| Session memory | services/SessionMemory/sessionMemory.ts:134-181 + prompts.ts:43-247 |
| Cache boundary | constants/prompts.ts:114-115, 495, 573 |