
Thinking signature preservation across turns (and stripping on model switch)

Lose this and the model re-thinks every turn (cost spike); pass stale signatures through a model switch and the request fails with an opaque signature error.

Claude Code · difficulty 3/3 · anthropic-only · subtle · signature · extended-thinking · prompt-caching · agent-loop

When Claude Code calls Anthropic with extended thinking enabled, the assistant response contains structured thinking content blocks alongside text/tool blocks. Each thinking block is HMAC-signed (signature field).

To preserve reasoning continuity across turns, Claude Code re-submits historical thinking blocks with their signatures in the next request. The model uses the prior thinking as context. If you strip the signatures (or the whole block), the model rethinks from scratch — costing more and producing worse results.
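A minimal sketch of that round-trip with the Anthropic TypeScript SDK. The model id, token budgets, and prompt are illustrative, not Claude Code's actual values; the key move is pushing the assistant turn back verbatim:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const messages: Anthropic.Messages.MessageParam[] = [
  { role: "user", content: "Refactor the parser to stream its input." },
];

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514", // illustrative model id
  max_tokens: 16_000,
  thinking: { type: "enabled", budget_tokens: 8_000 },
  messages,
});

// response.content holds blocks such as:
//   { type: "thinking", thinking: "...", signature: "EuYBCkQ..." }
//   { type: "text", text: "..." }
// Re-submit the assistant turn unmodified (thinking blocks and their
// signature fields intact) so the next request keeps reasoning continuity.
messages.push({ role: "assistant", content: response.content });
```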

The non-obvious part: signatures are model-bound. If the user switches model mid-session (Sonnet → Opus, or vice versa), the previous-model signatures will be rejected by the API. Claude Code detects model changes and strips thinking blocks in that case to avoid the request failing.
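A hypothetical sanitizer for that case might look like the sketch below; the type shapes and function name are assumptions, not Claude Code's internals:

```typescript
// Hypothetical shapes; Claude Code's real history types are richer.
type ContentBlock = { type: string; [key: string]: unknown };
type HistoryMessage = {
  role: "user" | "assistant";
  content: string | ContentBlock[];
};

// Signatures verify only against the model that produced them, so on a
// model switch the safe move is to drop thinking blocks entirely rather
// than let the API reject the whole request.
function stripThinkingOnModelSwitch(
  history: HistoryMessage[],
  previousModel: string,
  currentModel: string,
): HistoryMessage[] {
  if (previousModel === currentModel) return history; // signatures still valid

  return history
    .map((msg) => {
      if (msg.role !== "assistant" || typeof msg.content === "string") {
        return msg;
      }
      return {
        ...msg,
        content: msg.content.filter(
          (b) => b.type !== "thinking" && b.type !== "redacted_thinking",
        ),
      };
    })
    // A thinking-only assistant turn would be left empty; drop it, since
    // the API rejects assistant messages with no content blocks.
    .filter((msg) => typeof msg.content === "string" || msg.content.length > 0);
}
```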

Add to that: thinking blocks older than ~1 hour can be cleared anyway, on the theory that the agent has moved on and stale reasoning is more harmful than helpful.
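The TTL fits naturally into the same pass. A sketch reusing the HistoryMessage type above; the ~1h figure comes from the text, but the local received-at bookkeeping is an assumption, since API content blocks carry no timestamp of their own:

```typescript
const THINKING_TTL_MS = 60 * 60 * 1000; // ~1 hour

interface TimestampedTurn {
  message: HistoryMessage;
  receivedAt: number; // recorded locally when the response arrived
}

// Clear thinking blocks from turns older than the TTL, on the theory that
// the agent has moved on and stale reasoning would only anchor it.
function expireStaleThinking(
  turns: TimestampedTurn[],
  now: number = Date.now(),
): HistoryMessage[] {
  return turns.map(({ message, receivedAt }) => {
    const fresh = now - receivedAt < THINKING_TTL_MS;
    if (fresh || message.role !== "assistant" || typeof message.content === "string") {
      return message;
    }
    return {
      ...message,
      content: message.content.filter(
        (b) => b.type !== "thinking" && b.type !== "redacted_thinking",
      ),
    };
  });
}
```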

Three things to remember

  1. Always pass thinking + signature when re-sending history with the same model.
  2. Drop thinking blocks the moment the model identifier changes mid-session.
  3. Consider a TTL on thinking blocks (Claude Code uses 1h) so stale reasoning doesn’t anchor the agent.

Why it’s worth knowing

Most multi-provider abstractions (LiteLLM, hand-rolled adapters) get this wrong by default — they treat thinking as just another content block and either strip it on every turn (losing continuity) or pass it through unmodified across model switches (crashing). The cost of getting this right is small; the cost of getting it wrong is invisible until you upgrade to a thinking-enabled model.

Sources

  • claude-code/04-agent-loop-and-llm.md:75 · unverified