Some flags are baked into Claude Code’s system prompt: fast-mode hint, AFK mode, debug mode. These appear in the cacheable prefix of the prompt. If a user toggles one mid-session, the prefix changes and the entire 50-70K-token cache is invalidated — the next turn pays full input cost.
Claude Code “latches” these flags: once a flag is set on, it stays on for the rest of the session, even if the user toggles it back off.
The trade explicitly
The user expects: “I turn off X, X is off.” The reality: “I turn off X, the prompt still says X is on.”
This is a UX trade for a cost trade. The team accepted it because the cache invalidation cost dwarfs the user-confusion cost — but it took a deliberate decision.
When you’d copy this
Any time:
- A user-controllable flag is in the cacheable region of a prompt.
- The cacheable region is large enough that invalidation hurts.
- The flag’s effect is additive (turning it off doesn’t need to be honored immediately).
When you wouldn’t
- A user-controllable flag is load-bearing for safety (e.g. “do not commit code”). You can’t latch this — it must be honored.
- The cache region is small enough that re-warming is cheap.
- User-confusion cost dominates (consumer product, not power-user tool).
Generalization
When designing a system that uses caching at scale, categorize every variable: cache-static, session-static, turn-dynamic. The first category is sacred — you don’t let users mutate it mid-session, even at the cost of features.
Most teams discover this only after a billing surprise. Doing the categorization up front saves the surprise.
Sources
-
claude-code/04-agent-loop-and-llm.md:1405? unverified