BotBeat
...
← Back

> ▌

AnthropicAnthropic
PRODUCT LAUNCHAnthropic2026-03-13

Anthropic's Prompt-Caching Plugin Automatically Reduces Token Costs by 90%

Key Takeaways

  • ▸Automatic prompt-caching implementation reduces token costs by up to 90% through intelligent detection and caching of stable content
  • ▸Multi-strategy approach handles different content types including stack traces, code files, refactoring patterns, and conversation history
  • ▸Open-source plugin with MIT license and zero lock-in is compatible with leading AI coding platforms
Source:
Hacker Newshttps://prompt-caching.ai/↗

Summary

A new open-source plugin for Anthropic's Claude API automatically implements prompt-caching to dramatically reduce token consumption and API costs. The plugin intelligently detects stable content in conversations—such as stack traces, code files, and refactoring patterns—and injects cache breakpoints to store this content server-side for five minutes, reducing cache read costs to just 10% of normal rates. The system learns from repeated file reads and conversation history, progressively compounding savings as interactions continue, with break-even cost savings occurring as early as the second turn in Claude Code sessions with Claude Sonnet.

The plugin is available as an open-source release under the MIT license and is compatible with multiple AI coding assistants including Claude Code, Cursor, Windsurf, ChatGPT, Perplexity, and other MCP-compatible clients. Installation is straightforward—single command for Claude Code or npm installation for other platforms—with no configuration files or restarts required. The tool includes cache statistics tracking and is awaiting official approval in the Claude Code plugin marketplace while available for immediate installation via GitHub.

  • Break-even cost savings achieved by second conversation turn, with compounding benefits in longer sessions

Editorial Opinion

This plugin addresses a genuine pain point for developers using Claude for coding tasks—the cumulative token costs of repetitive context in multi-turn conversations. The 90% savings claim is compelling, though real-world performance will depend on conversation patterns and content stability. The open-source approach and multi-platform compatibility enhance its value proposition, though the pending marketplace approval suggests the integration pathway may still be refining.

Large Language Models (LLMs)AI AgentsMLOps & InfrastructureOpen Source

More from Anthropic

AnthropicAnthropic
PARTNERSHIP

Anthropic Expands Partnership with SpaceX, Scales GB200 Capacity in Colossus 2

2026-05-20
AnthropicAnthropic
POLICY & REGULATION

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says

2026-05-20
AnthropicAnthropic
RESEARCH

Anthropic Claude Code Sandbox Bypass: Second Vulnerability Exposes Critical Data Exfiltration Risk

2026-05-20

Comments

Suggested

AnthropicAnthropic
PARTNERSHIP

Anthropic Expands Partnership with SpaceX, Scales GB200 Capacity in Colossus 2

2026-05-20
Research CommunityResearch Community
RESEARCH

New Methodology Proposed for Selecting Runtime Architecture Patterns in Production LLM Agents

2026-05-20
Google / AlphabetGoogle / Alphabet
PRODUCT LAUNCH

Google DeepMind Launches Gemini 3.5 Flash: New Lightweight AI Model

2026-05-20
← Back to news
© 2026 BotBeat
AboutPrivacy PolicyTerms of ServiceContact Us