Prompt-Caching Plugin for Anthropic's Claude API Automatically Cuts Token Costs by Up to 90%
Key Takeaways
- Automatic prompt-caching implementation reduces token costs by up to 90% through intelligent detection and caching of stable content
- Multi-strategy approach handles different content types, including stack traces, code files, refactoring patterns, and conversation history
- Open-source plugin with MIT license and zero lock-in is compatible with leading AI coding platforms
Summary
A new open-source plugin for Anthropic's Claude API automatically implements prompt caching to reduce token consumption and API costs. The plugin detects stable content in conversations, such as stack traces, code files, and refactoring patterns, and injects cache breakpoints so that content is stored server-side for five minutes, with cache reads billed at just 10% of the normal input rate. The system learns from repeated file reads and conversation history, so savings compound as an interaction continues; in Claude Code sessions with Claude Sonnet, the plugin breaks even on cost as early as the second turn.
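The breakpoint-injection mechanism described above can be sketched against Anthropic's documented `cache_control` field on the Messages API. The helper below and its "tag the last stable block" heuristic are illustrative assumptions, not the plugin's actual code; only the `cache_control` field and the `usage` counters mentioned in the comments come from Anthropic's API:

```python
# Minimal sketch (assumption, not the plugin's real internals): inject an
# ephemeral cache breakpoint into the `system` blocks of a Messages API call.

def mark_stable_blocks(system_blocks: list[dict]) -> list[dict]:
    """Tag the last stable system block with a cache breakpoint, so
    everything up to and including it is cached server-side (~5 minutes)."""
    blocks = [dict(b) for b in system_blocks]  # avoid mutating caller's data
    if blocks:
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks

system = mark_stable_blocks([
    {"type": "text", "text": "You are a code-review assistant."},
    {"type": "text", "text": "<large, stable source file goes here>"},
])

# Passed as `system=system` to anthropic.Anthropic().messages.create(...),
# the response's `usage` then reports cache_creation_input_tokens on the
# first call and cache_read_input_tokens on repeats within the window.
assert system[-1]["cache_control"] == {"type": "ephemeral"}
```

Tagging only the final stable block is enough because Anthropic caches the entire prefix up to a breakpoint, not just the tagged block.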
The plugin is released as open source under the MIT license and is compatible with multiple AI coding assistants, including Claude Code, Cursor, Windsurf, ChatGPT, Perplexity, and other MCP-compatible clients. Installation is straightforward (a single command for Claude Code, or an npm install for other platforms), with no configuration files or restarts required. The tool tracks cache statistics and, while it awaits official approval in the Claude Code plugin marketplace, is available for immediate installation via GitHub.
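The second-turn break-even claim can be sanity-checked with Anthropic's published 5-minute cache pricing multipliers (cache writes cost 1.25x the base input rate, cache reads 0.10x). The turn model below is a deliberate simplification that treats the stable prefix as fixed across turns:

```python
# Relative input cost of re-sending one stable prompt prefix over N turns,
# using Anthropic's published 5-minute cache multipliers:
#   cache write = 1.25x base input price, cache read = 0.10x.
BASE, WRITE, READ = 1.00, 1.25, 0.10

def cost(turns: int, cached: bool) -> float:
    """Normalized cost of the stable prefix across `turns` conversation turns."""
    if not cached:
        return turns * BASE
    # Turn 1 writes the prefix to the cache; later turns read it back.
    return WRITE + (turns - 1) * READ

assert cost(1, cached=True) > cost(1, cached=False)  # turn 1: 25% write premium
assert cost(2, cached=True) < cost(2, cached=False)  # turn 2: already cheaper
# Savings grow with session length toward the asymptotic 1 - READ/BASE = 90%.
print(f"10-turn savings: {1 - cost(10, True) / cost(10, False):.0%}")
```

On this model the cached second-turn total is 1.35x versus 2.00x uncached, which matches the article's claim of break-even by the second turn; the 90% figure is the long-session ceiling, not a per-call guarantee.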
Editorial Opinion
This plugin addresses a genuine pain point for developers using Claude for coding tasks: the cumulative token cost of repetitive context in multi-turn conversations. The 90% savings claim is compelling, though real-world results will depend on conversation patterns and content stability. The open-source approach and multi-platform compatibility strengthen its value proposition, though the pending marketplace approval suggests the integration pathway is still maturing.


