Cache lifecycle

How Claude's prompt cache works across conversation turns, compaction events, context forks, and model switches. Companion to the cache-friendly authoring principle in core-principles.md.

Claude's API caches prompts via exact prefix matching. The cache key is a hash of the prompt content from the first byte forward. On each turn:

  1. The system prompt, tool definitions, and conversation history are assembled into a single prompt
  2. The API checks if any prefix of this prompt matches a cached entry
  3. Matched prefix tokens are read from cache at 10% of standard input pricing
  4. Unmatched tokens (the suffix) are processed at 125% of standard pricing (cache write)
  5. The new, longer prefix is cached for future turns
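
The per-turn cost split above can be sketched as a toy accounting function. The `turn_cost` name and the flat `input_price` unit are illustrative assumptions, not the API's implementation:

```python
def turn_cost(prompt_tokens, cached_prefix_len, input_price=1.0):
    """Split a turn's prompt into cache reads (10%) and cache writes (125%)."""
    hit = min(cached_prefix_len, prompt_tokens)   # prefix tokens read from cache
    miss = prompt_tokens - hit                    # suffix tokens written to cache
    cost = hit * input_price * 0.10 + miss * input_price * 1.25
    return cost, prompt_tokens                    # the longer prefix is cached next

# Turn 1: 10,000-token prompt, cold cache -> everything is a cache write.
cost1, cached = turn_cost(10_000, 0)
# Turn 2: prompt grows to 12,000 tokens; the first 10,000 are cache hits.
cost2, cached = turn_cost(12_000, cached)
```

With these numbers, turn 2 costs 3,500 price units versus 15,000 for the same 12,000-token prompt against a cold cache.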

Cache TTL: Cached prefixes auto-refresh on each hit but expire after ~5 minutes of inactivity. Long pauses between turns (user thinking, waiting for approval) can cause cache expiry.
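
The refresh-on-hit behavior can be modeled with a toy entry class. The 300-second constant mirrors the ~5-minute figure above; the class itself is a sketch, not the service's data structure:

```python
class PrefixCacheEntry:
    """Toy TTL model: each hit refreshes the timer; idle entries expire."""
    TTL = 300  # seconds, roughly the ~5-minute inactivity window

    def __init__(self, now):
        self.last_touch = now

    def hit(self, now):
        if now - self.last_touch > self.TTL:
            return False            # expired: next turn pays full cache-write cost
        self.last_touch = now       # auto-refresh on hit
        return True

entry = PrefixCacheEntry(now=0)
assert entry.hit(now=200)           # within TTL: hit, timer refreshed
assert entry.hit(now=450)           # 250s since refresh: still warm
assert not entry.hit(now=800)       # 350s idle: expired, cold cache next turn
```

This is why a long approval pause can silently turn the next turn into a full cache write.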

What stays stable across turns: System prompt, tool definitions, SKILL.md content, agent instructions, CLAUDE.md content. These form the cacheable prefix.

What changes each turn: New user messages, assistant responses, tool call results. These extend the suffix.
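
A minimal sketch of why this split matters: stable blocks assembled first form a shared prefix across turns, while history only ever grows the suffix (the block strings are placeholders):

```python
STABLE = ["[system prompt]", "[tool definitions]", "[skill instructions]"]

def assemble(history):
    # Stable blocks come first so they form the cacheable prefix;
    # conversation history extends only the suffix.
    return STABLE + history

turn1 = assemble(['user: "Add auth"'])
turn2 = assemble(['user: "Add auth"', 'assistant: "Done."', 'user: "Now tests"'])

def shared_prefix_len(a, b):
    """Count leading blocks two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

Here turns 1 and 2 share a four-block prefix: the three stable blocks plus the first user message, which became part of the stable history on turn 2.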

When a conversation approaches the context window limit, Claude Code compacts the history — summarizing prior turns into a condensed form and continuing in a fresh context.

Cache-safe compaction reuses the exact same prefix (system prompt + tools + skill instructions) and appends the compaction summary as a new user message. The prefix cache is preserved because the prefix bytes are identical.

Cache-breaking compaction would rebuild the prompt differently — reordering tools, changing system prompt content, or injecting the summary into the system prompt. This invalidates the entire cache.

Cache-safe — prefix preserved:

[system prompt]           ← identical to pre-compaction (CACHED)
[tool definitions]        ← identical to pre-compaction (CACHED)
[skill instructions]      ← identical to pre-compaction (CACHED)
[user]: "Session summary: Previously we implemented auth module,
         fixed 3 bugs, and updated tests. Continuing with..."
[assistant]: "I'll continue from where we left off..."
The entire prefix is a cache hit. Only the summary message and new responses are cache writes.
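
The cache-safe variant can be sketched as a compaction step that leaves the prefix list untouched and appends the summary as a new user message (the `compact` helper and block strings are illustrative, not Claude Code's internals):

```python
def compact(prefix_blocks, history, summarize):
    # Cache-safe: reuse the exact prefix; the summary lands in the suffix.
    return list(prefix_blocks) + [f'user: "Session summary: {summarize(history)}"']

PREFIX = ["[system]", "[tools]", "[skill]"]
old_prompt = PREFIX + ['user: "fix bug"', 'assistant: "fixed"']
new_prompt = compact(PREFIX, old_prompt[len(PREFIX):],
                     lambda h: f"{len(h)} prior messages")

assert new_prompt[:len(PREFIX)] == old_prompt[:len(PREFIX)]  # prefix preserved
```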

Cache-breaking — prefix changed:

[modified system prompt]  ← different content (CACHE MISS)
[tool definitions]        ← even if identical, miss propagates
[skill instructions]      ← miss propagates to all downstream tokens
[user]: "Continue working..."
A single byte change in the system prompt invalidates the cache for everything that follows, including the unchanged tool definitions and skill instructions.

Implication for orchestrators: Orchestrators that run long sessions (implementation-orchestrator, finalization-orchestrator) benefit most from cache-safe compaction. Keep system prompts and tool definitions byte-identical across the session, and never inject turn counts, elapsed time, or progress percentages into the system prompt; any per-turn value there forces a full cache rewrite every turn.
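
The pitfall can be demonstrated with a toy cache key that hashes the prompt from the first byte forward: any per-turn value in the system prompt yields a new key every turn. This is a sketch; the real key derivation is internal to the API:

```python
import hashlib

def prefix_key(blocks):
    """Toy cache key: hash the assembled prompt from the first byte forward."""
    return hashlib.sha256("\n".join(blocks).encode()).hexdigest()

# Cache-breaking: a per-turn counter in the system prompt changes the key.
bad_turn_3 = prefix_key(["System (turn 3)", "tools", "skill"])
bad_turn_4 = prefix_key(["System (turn 4)", "tools", "skill"])
assert bad_turn_3 != bad_turn_4     # every turn starts from a cold cache

# Cache-safe: keep the system prompt static; per-turn state goes in the suffix.
good_turn_3 = prefix_key(["System", "tools", "skill"])
good_turn_4 = prefix_key(["System", "tools", "skill"])
assert good_turn_3 == good_turn_4   # identical bytes, identical key
```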

The context: fork directive in SKILL.md spawns an isolated sub-session with its own context window. Key cache behaviors:

  • Isolated cache: The forked session builds its own cache from scratch. It does not inherit the parent's cached prefix.
  • Parent cache preserved: Forking does not modify the parent's prompt, so the parent's cache remains intact.
  • Short-lived: Fork sessions typically complete in a few turns, so their cache investment is small.
  • No cross-session sharing: Cache entries are per-session. Two forks of the same skill each pay their own cache write costs.

Practical guidance: Fork sessions are already cache-efficient by design: they start fresh, run briefly, and return results. On the parent side, receiving fork results appends new content to the conversation suffix, not the prefix, so the parent's cache is unaffected.

Prompt caches are per-model. Each model maintains its own cache namespace:

  • Switching from Opus to Haiku mid-conversation cannot reuse the Opus entries. The Haiku session starts with a cold cache and pays full cache-write costs.
  • Switching back to Opus may or may not hit the previous cache, depending on TTL expiry and whether the prompt prefix changed.
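
The per-model namespace can be modeled as a cache keyed by (model, prefix) pairs; entries written for one model are invisible to another. The model names and the dict are illustrative:

```python
# Toy model: cache entries are keyed by (model, prefix-hash), so each model
# has its own namespace and a model switch always starts cold.
cache = {}

def store(model, prefix, entry):
    cache[(model, prefix)] = entry

def lookup(model, prefix):
    return cache.get((model, prefix))

store("opus", "prefix-A", "warm")
assert lookup("haiku", "prefix-A") is None   # model switch: cold cache
assert lookup("opus", "prefix-A") == "warm"  # the Opus entry is untouched
```

Note that the Opus entry survives the switch; whether it is still usable on switching back depends only on TTL expiry and prefix stability, as described above.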

Recommendation: Use subagents (Task tool) instead of switching models mid-conversation. Subagents run in isolated sessions with their own caches, leaving the parent session's cache intact. This aligns with the model tier optimization approach (SPEC-017) where Haiku handles lightweight subtasks without disrupting the Opus parent cache.

  • Cache hits cost 10% of input pricing; cache writes cost 125%. Preserving the cache across turns is a significant cost optimization.
  • Compaction is safe when the prompt prefix (system prompt + tools + instructions) is byte-identical before and after.
  • Fork sessions have isolated caches; they neither benefit from nor harm the parent cache.
  • Model switching invalidates the cache; prefer subagents for cross-model work.
  • Cache expires after ~5 minutes of inactivity; long pauses reset the cache.