Memory compression strategies — a deep dive

For engineers building long-running agents who need state to outlive a context window.

60 minutes · 7 stops

Long sessions break agents. The model loses focus around 60–80 turns even with a 200K context. This tour walks through how production systems push that ceiling without bankrupting their token budget.

Pacing

  • Concept · memory-compression (15 min)
  • Insight · preservation rules are the strategy (5 min)
  • OpenHands v1 condenser drill-down (15 min)
  • Strix memory_compressor drill-down (10 min)
  • Concept · prompt-caching (10 min)
  • Insight · latched sticky flags (5 min)

Output

Three concrete artifacts you should produce while reading:

  1. A strategy choice for your agent (sliding window / LLM-summarize / event-source / hybrid) defended with one sentence on the use case.
  2. A preservation list for your domain — five bullets, what the summarizer must keep verbatim. If you can’t write it confidently, you don’t yet have enough domain clarity to ship.
  3. A compaction-trigger plan that doesn’t trash your cache hit rate. Concrete: “compact at turn 50, 100, 150” or “compact when input exceeds 80K tokens.”
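A trigger plan like the one in item 3 can be sketched in a few lines. This is an illustrative sketch, not any framework's API; the checkpoint turns and token budget are the example numbers from above.

```python
# Sketch of a compaction trigger that fires at fixed checkpoints rather than
# every turn, so the prompt prefix (and therefore the cache) stays stable
# between checkpoints. Names and thresholds are illustrative.

CHECKPOINT_TURNS = {50, 100, 150}   # "compact at turn 50, 100, 150"
TOKEN_BUDGET = 80_000               # "compact when input exceeds 80K tokens"

def should_compact(turn: int, input_tokens: int) -> bool:
    """Compact only at checkpoints or when the token budget is blown.

    Compacting every turn rewrites the prompt prefix every turn and kills
    the cache hit rate; a fixed schedule keeps long cache-hot stretches.
    """
    return turn in CHECKPOINT_TURNS or input_tokens > TOKEN_BUDGET
```

Between checkpoints every turn is a pure append, which is exactly what prefix caching rewards.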

Common mistakes you’ll avoid

  • Generic “summarize this conversation” prompts.
  • Compaction every turn (cache death).
  • Forgetting to preserve task IDs / file paths / errors.
  • Treating the summary as opaque instead of structured.
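The last mistake, treating the summary as opaque, is easiest to see by contrast with a structured one. A minimal sketch, assuming illustrative field names; the point is only that task IDs, file paths, and error text survive compaction verbatim rather than dissolving into prose:

```python
from dataclasses import dataclass, field

@dataclass
class CompactionSummary:
    """A structured summary the agent can read back, instead of an opaque
    blob of prose. Field names here are illustrative."""
    narrative: str                                  # free-form recap, lossy by design
    task_ids: list[str] = field(default_factory=list)
    file_paths: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Rendered into the prompt as labeled sections, so the model can
        # tell exact identifiers apart from paraphrase.
        parts = [self.narrative]
        for label, items in (("Task IDs", self.task_ids),
                             ("Files touched", self.file_paths),
                             ("Errors seen", self.errors)):
            if items:
                parts.append(label + ":\n" + "\n".join(f"- {i}" for i in items))
        return "\n\n".join(parts)
```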

For your specific projects

For Swisscheese, the preservation list is reviewer verdicts, diff hashes, file paths, and error text. Reviewer prose is the noise; structured verdicts are the signal.

For AI Act compliance, treat the audit log and the operating memory as separate concerns. Compress the operating memory; never compress the audit log.
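The separation of concerns can be sketched as two stores with different rules. This is an illustrative structure, not a real compliance implementation; the invariant is that compaction touches only the operating memory, never the append-only log.

```python
class AgentState:
    """Sketch: audit log and operating memory as separate concerns.
    The audit log is append-only and never compressed; only the operating
    memory (what goes into the prompt) is eligible for compaction."""

    def __init__(self) -> None:
        self._audit_log: list[dict] = []   # append-only, full fidelity
        self.memory: list[dict] = []       # prompt-facing, compressible

    def record(self, event: dict) -> None:
        self._audit_log.append(event)      # every event lands here, always
        self.memory.append(event)          # and in working memory, for now

    def compact(self, summarize) -> None:
        # Compaction touches ONLY the operating memory.
        self.memory = [{"role": "summary", "content": summarize(self.memory)}]

    @property
    def audit_log(self) -> tuple:
        return tuple(self._audit_log)      # read-only view
```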

Itinerary

  1. Memory compression · concept

    The four strategies. Read first; the algorithm is the easy part.

  2. Memory compression preserves credentials, payloads, task IDs explicitly · insight

    The non-obvious lesson: preservation rules matter more than algorithm choice. Generic 'summarize this' produces useless mush.
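What "not generic" looks like in practice: a summarizer prompt that enumerates its preservation list. A sketch in the spirit of an enumerating condenser prompt; the wording and the `PRESERVE` entries are illustrative, not any project's actual prompt.

```python
# Illustrative preservation list; swap in your own domain's five bullets.
PRESERVE = [
    "task IDs, exactly as written",
    "file paths and line numbers",
    "error messages and stack traces, verbatim",
    "credentials referenced (by placeholder name, never the value)",
    "explicit user decisions and constraints",
]

def build_summarizer_prompt(transcript: str) -> str:
    rules = "\n".join(f"- {r}" for r in PRESERVE)
    return (
        "Summarize the conversation below for an agent that will continue it.\n"
        "You MUST preserve the following verbatim:\n"
        f"{rules}\n\n"
        f"Conversation:\n{transcript}"
    )
```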

  3. OpenHands (v1) · project

    The most rigorous condenser — strategies are pluggable, the summarizer prompt enumerates what to preserve.

  4. Strix · project

    Domain-specific preservation in the security-agent context. Read the compressor file.

  5. Prompt caching · concept

    Compression invalidates cache. You have to understand the two together; get one wrong and your bill explodes.

  6. Latched sticky flags for cache coherence · insight

    How to keep cache hot when the prompt would otherwise change.
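    The latch idea can be sketched as flags that turn on but never back off, so the rendered prompt prefix changes only at the moment of latching instead of flapping every turn. Illustrative names, not any specific framework's API:

```python
class StickyFlags:
    """Sketch of latched flags for cache coherence: once a flag is observed
    True it stays True, so the rendered prefix is byte-stable between latches."""

    def __init__(self, names: list[str]) -> None:
        self._flags = {n: False for n in names}

    def observe(self, name: str, value: bool) -> None:
        # Latch: later False observations do not un-set the flag.
        if value:
            self._flags[name] = True

    def render(self) -> str:
        # Stable ordering -> identical prompt text while nothing latches.
        on = [n for n, v in sorted(self._flags.items()) if v]
        return "Active modes: " + (", ".join(on) if on else "none")
```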

  7. Hermes Agent · project

    Hybrid strategy: first-and-last verbatim, middle summarized. A pragmatic middle.
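    The hybrid shape is small enough to sketch directly. A minimal illustration, assuming a `summarize` callable as a stand-in for the LLM call; the default window sizes are examples, not Hermes Agent's actual values:

```python
def hybrid_compact(messages: list[str], head: int = 2, tail: int = 6,
                   summarize=lambda ms: f"[summary of {len(ms)} messages]"):
    """Keep the first `head` and last `tail` messages verbatim and
    replace the middle with a single summary message."""
    if len(messages) <= head + tail:
        return list(messages)            # nothing to compress yet
    middle = messages[head:len(messages) - tail]
    return messages[:head] + [summarize(middle)] + messages[-tail:]
```

The verbatim head keeps the system/task framing cache-stable; the verbatim tail keeps recent tool results exact; only the middle pays the lossy-summary tax.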