
Latched sticky flags for cache coherence

User-toggleable flags that live in the system prompt would bust the 50-70K-token cache on every toggle. Latching trades UX flexibility for a 10× cost cut.

Claude Code · difficulty 3/3 · cache · ux-tradeoff · scaling · prompt-caching

Some flags are baked into Claude Code’s system prompt: fast-mode hint, AFK mode, debug mode. These appear in the cacheable prefix of the prompt. If a user toggles one mid-session, the prefix changes and the entire 50-70K-token cache is invalidated — the next turn pays full input cost.
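Why a single toggled flag busts the whole cache: prompt caches are keyed on the exact bytes of the cacheable prefix, so any change, however small, produces a different key and a cold cache. A minimal sketch (the prompt strings and `cache_key` helper are hypothetical, not Claude Code's actual implementation):

```python
import hashlib

def cache_key(prefix: str) -> str:
    """Cache entries are keyed on the exact bytes of the cacheable prefix."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

# Stand-ins for a 50-70K-token system prompt differing only in one flag.
base    = "SYSTEM PROMPT ... debug_mode=off ... [tens of thousands of tokens]"
toggled = "SYSTEM PROMPT ... debug_mode=on ... [tens of thousands of tokens]"

# A one-flag change anywhere in the prefix yields a different key,
# so the previously warmed cache entry can never be reused.
assert cache_key(base) != cache_key(toggled)
```

The asymmetry is what makes this expensive: the flag is a few tokens, but the cache it invalidates is the entire prefix in front of and around it.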

Claude Code “latches” these flags: once a flag is set on, it stays on for the rest of the session, even if the user toggles it back off.
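The latching rule is small enough to state in code. A sketch of the idea, assuming a hypothetical `LatchedFlag` wrapper (not Claude Code's actual class):

```python
class LatchedFlag:
    """A session flag that can switch on but never back off.

    Once on, the cacheable prefix stops changing for this flag,
    so the prompt cache stays warm for the rest of the session.
    """

    def __init__(self) -> None:
        self._on = False

    def set(self, value: bool) -> None:
        # Turning on latches; turning off is silently ignored once latched.
        self._on = self._on or value

    @property
    def on(self) -> bool:
        return self._on


debug = LatchedFlag()
debug.set(True)
debug.set(False)   # the user toggles it back off...
assert debug.on    # ...but the prompt still says it is on
```

The `set(False)` call succeeding while having no effect is exactly the UX gap the next section makes explicit.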

The trade, explicitly

The user expects: “I turn off X, X is off.” The reality: “I turn off X, the prompt still says X is on.”

This trades UX correctness for cost. The team accepted it because the cache-invalidation cost dwarfs the user-confusion cost, but it took a deliberate decision.

When you’d copy this

Any time:

  • A user-controllable flag is in the cacheable region of a prompt.
  • The cacheable region is large enough that invalidation hurts.
  • The flag’s effect is additive (turning it off doesn’t need to be honored immediately).

When you wouldn’t

  • A user-controllable flag is load-bearing for safety (e.g. “do not commit code”). You can’t latch this — it must be honored.
  • The cache region is small enough that re-warming is cheap.
  • User-confusion cost dominates (consumer product, not power-user tool).

Generalization

When designing a system that uses caching at scale, categorize every variable: cache-static, session-static, turn-dynamic. The first category is sacred — you don’t let users mutate it mid-session, even at the cost of features.
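The up-front categorization can be made concrete as a variable registry that the prompt-assembly layer checks before allowing a mid-session mutation. A sketch under assumed names (the registry, categories, and `assert_prefix_static` guard are illustrative, not from any real codebase):

```python
from enum import Enum, auto

class Mutability(Enum):
    CACHE_STATIC = auto()    # in the cached prefix; never user-mutable mid-session
    SESSION_STATIC = auto()  # fixed per session, may differ between sessions
    TURN_DYNAMIC = auto()    # free to change every turn; kept out of the cached prefix

# Hypothetical registry: every variable that feeds the prompt gets a category.
VARIABLES = {
    "system_instructions": Mutability.CACHE_STATIC,
    "debug_mode": Mutability.CACHE_STATIC,    # latched, per the pattern above
    "working_directory": Mutability.SESSION_STATIC,
    "user_message": Mutability.TURN_DYNAMIC,
}

def assert_prefix_static(changed: str) -> None:
    """Reject mid-session mutation of anything in the cached prefix."""
    if VARIABLES[changed] is Mutability.CACHE_STATIC:
        raise ValueError(f"{changed} is cache-static; latch it instead of mutating")
```

A guard like this turns the billing surprise into a loud, early error at the point where someone tries to make a cache-static variable user-toggleable.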

Most teams discover this only after a billing surprise. Doing the categorization up front saves the surprise.

Sources

  • claude-code/04-agent-loop-and-llm.md:1405 (unverified)