BotBeat

Anthropic
PRODUCT LAUNCH · 2026-03-13

Anthropic's Prompt-Caching Plugin Automatically Reduces Token Costs by 90%

Key Takeaways

  • Automatic prompt-caching implementation reduces token costs by up to 90% through intelligent detection and caching of stable content
  • Multi-strategy approach handles different content types, including stack traces, code files, refactoring patterns, and conversation history
  • Open-source plugin with MIT license and zero lock-in is compatible with leading AI coding platforms
Source: Hacker News — https://prompt-caching.ai/

Summary

A new open-source plugin for Anthropic's Claude API automatically implements prompt caching to dramatically reduce token consumption and API costs. The plugin detects stable content in conversations—such as stack traces, code files, and refactoring patterns—and injects cache breakpoints so that this content is stored server-side for five minutes, after which repeated reads are billed at just 10% of the normal input-token rate. The system learns from repeated file reads and conversation history, so savings compound as an interaction continues; in Claude Code sessions with Claude Sonnet, the cached approach breaks even with the uncached one as early as the second turn.
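The cache breakpoints described above correspond to the `cache_control` field in Anthropic's Messages API: a content block marked `{"type": "ephemeral"}` tells the API to cache everything up to that point for roughly five minutes. A minimal sketch of how stable content can be marked cacheable—the helper name `with_cache_breakpoint` is hypothetical, and the plugin's actual detection logic is not shown in this summary:

```python
from typing import Any, Dict, List

def with_cache_breakpoint(system_text: str, stable_context: str) -> List[Dict[str, Any]]:
    """Build a `system` array for the Anthropic Messages API that marks a
    large, stable block (e.g. a code file or stack trace) as cacheable.

    Everything up to the breakpoint is cached server-side for ~5 minutes;
    a later request with an identical prefix pays the discounted read rate.
    """
    return [
        # Small, variable instructions stay outside the cached prefix.
        {"type": "text", "text": system_text},
        # The large, stable block carries the cache breakpoint.
        {
            "type": "text",
            "text": stable_context,
            "cache_control": {"type": "ephemeral"},
        },
    ]

# Example: only the second block is marked as a breakpoint.
blocks = with_cache_breakpoint(
    "You are a coding assistant.",
    "<contents of a large source file>",
)
```

This array would be passed as the `system` parameter of `client.messages.create(...)` in the official `anthropic` SDK; the response's `usage` object then reports `cache_creation_input_tokens` and `cache_read_input_tokens` for tracking hits.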

The plugin is released as open source under the MIT license and is compatible with multiple AI coding assistants, including Claude Code, Cursor, Windsurf, ChatGPT, Perplexity, and other MCP-compatible clients. Installation is straightforward: a single command for Claude Code, or an npm install for other platforms, with no configuration files or restarts required. The tool includes cache-statistics tracking and, while it awaits official approval in the Claude Code plugin marketplace, can be installed immediately from GitHub.

  • Break-even cost savings achieved by second conversation turn, with compounding benefits in longer sessions
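The break-even claim follows from Anthropic's published cache pricing: a cache write costs roughly 25% more than base input tokens, while a cache read costs about 10% of base. A quick sketch of the arithmetic—the $3/MTok Sonnet input price is an assumption for illustration only:

```python
BASE_PRICE = 3.00          # assumed $/MTok input price (illustrative)
WRITE_MULT = 1.25          # cache write: ~25% premium over base input
READ_MULT = 0.10           # cache read: ~10% of base input

def cost_uncached(turns: int, ctx_tokens: int) -> float:
    """Stable context is resent in full at base price every turn."""
    return turns * ctx_tokens / 1e6 * BASE_PRICE

def cost_cached(turns: int, ctx_tokens: int) -> float:
    """Turn 1 writes the cache; every later turn reads it at the discount."""
    return (WRITE_MULT + (turns - 1) * READ_MULT) * ctx_tokens / 1e6 * BASE_PRICE

ctx = 50_000  # e.g. a large file plus stack trace kept in context
# Turn 1 costs more (write premium), but turn 2 is already cheaper overall,
# and long sessions approach the ~90% ceiling implied by the 10% read rate.
for turns in (1, 2, 10, 100):
    u, c = cost_uncached(turns, ctx), cost_cached(turns, ctx)
    print(f"turn {turns:>3}: uncached ${u:.4f}  cached ${c:.4f}  savings {1 - c/u:+.1%}")
```

The savings ratio converges to `1 - READ_MULT` as the session grows, which is where the headline 90% figure comes from; real savings depend on how much of each request's context is actually stable.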

Editorial Opinion

This plugin addresses a genuine pain point for developers using Claude for coding tasks: the cumulative token cost of resending repetitive context across multi-turn conversations. The 90% savings claim is compelling, though real-world performance will depend on conversation patterns and how stable the cached content actually is. The open-source approach and multi-platform compatibility strengthen its value proposition, though the pending marketplace approval suggests the integration pathway is still maturing.

Large Language Models (LLMs) · AI Agents · MLOps & Infrastructure · Open Source


© 2026 BotBeat