BotBeat

Anthropic
PRODUCT LAUNCH · 2026-03-13

Anthropic's Prompt-Caching Plugin Automatically Reduces Token Costs by 90%

Key Takeaways

  • Automatic prompt-caching implementation reduces token costs by up to 90% through intelligent detection and caching of stable content
  • Multi-strategy approach handles different content types, including stack traces, code files, refactoring patterns, and conversation history
  • Open-source plugin with MIT license and zero lock-in is compatible with leading AI coding platforms
Source: Hacker News — https://prompt-caching.ai/

Summary

A new open-source plugin for Anthropic's Claude API automatically implements prompt caching to dramatically reduce token consumption and API costs. The plugin detects stable content in conversations—such as stack traces, code files, and refactoring patterns—and injects cache breakpoints so that this content is stored server-side for five minutes, after which repeated reads are billed at just 10% of the normal input-token rate. The system learns from repeated file reads and conversation history, so savings compound as an interaction continues; in Claude Code sessions with Claude Sonnet, the cached approach breaks even with the uncached one as early as the second turn.
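The cache breakpoints described above correspond to the `cache_control` field in Anthropic's Messages API: a content block marked `{"type": "ephemeral"}` tells the API to cache everything up to that point for roughly five minutes. A minimal sketch of how stable content can be marked cacheable—the helper name `with_cache_breakpoint` is hypothetical, and the plugin's actual detection logic is not shown in this summary:

```python
from typing import Any, Dict, List

def with_cache_breakpoint(system_text: str, stable_context: str) -> List[Dict[str, Any]]:
    """Build a `system` array for the Anthropic Messages API that marks a
    large, stable block (e.g. a code file or stack trace) as cacheable.

    Everything up to the breakpoint is cached server-side for ~5 minutes;
    a later request with an identical prefix pays the discounted read rate.
    """
    return [
        # Small, variable instructions stay outside the cached prefix.
        {"type": "text", "text": system_text},
        # The large, stable block carries the cache breakpoint.
        {
            "type": "text",
            "text": stable_context,
            "cache_control": {"type": "ephemeral"},
        },
    ]

# Example: only the second block is marked as a breakpoint.
blocks = with_cache_breakpoint(
    "You are a coding assistant.",
    "<contents of a large source file>",
)
```

This array would be passed as the `system` parameter of `client.messages.create(...)` in the official `anthropic` SDK; the response's `usage` object then reports `cache_creation_input_tokens` and `cache_read_input_tokens` for tracking hits.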

The plugin is released as open source under the MIT license and is compatible with multiple AI coding assistants, including Claude Code, Cursor, Windsurf, ChatGPT, Perplexity, and other MCP-compatible clients. Installation is straightforward: a single command for Claude Code, or an npm install for other platforms, with no configuration files or restarts required. The tool includes cache-statistics tracking and, while it awaits official approval in the Claude Code plugin marketplace, can be installed immediately from GitHub.

  • Break-even cost savings achieved by second conversation turn, with compounding benefits in longer sessions
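The break-even claim follows from Anthropic's published cache pricing: a cache write costs roughly 25% more than base input tokens, while a cache read costs about 10% of base. A quick sketch of the arithmetic—the $3/MTok Sonnet input price is an assumption for illustration only:

```python
BASE_PRICE = 3.00          # assumed $/MTok input price (illustrative)
WRITE_MULT = 1.25          # cache write: ~25% premium over base input
READ_MULT = 0.10           # cache read: ~10% of base input

def cost_uncached(turns: int, ctx_tokens: int) -> float:
    """Stable context is resent in full at base price every turn."""
    return turns * ctx_tokens / 1e6 * BASE_PRICE

def cost_cached(turns: int, ctx_tokens: int) -> float:
    """Turn 1 writes the cache; every later turn reads it at the discount."""
    return (WRITE_MULT + (turns - 1) * READ_MULT) * ctx_tokens / 1e6 * BASE_PRICE

ctx = 50_000  # e.g. a large file plus stack trace kept in context
# Turn 1 costs more (write premium), but turn 2 is already cheaper overall,
# and long sessions approach the ~90% ceiling implied by the 10% read rate.
for turns in (1, 2, 10, 100):
    u, c = cost_uncached(turns, ctx), cost_cached(turns, ctx)
    print(f"turn {turns:>3}: uncached ${u:.4f}  cached ${c:.4f}  savings {1 - c/u:+.1%}")
```

The savings ratio converges to `1 - READ_MULT` as the session grows, which is where the headline 90% figure comes from; real savings depend on how much of each request's context is actually stable.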

Editorial Opinion

This plugin addresses a genuine pain point for developers using Claude for coding tasks: the cumulative token cost of resending repetitive context across multi-turn conversations. The 90% savings claim is compelling, though real-world performance will depend on conversation patterns and how stable the cached content actually is. The open-source approach and multi-platform compatibility strengthen its value proposition, though the pending marketplace approval suggests the integration pathway is still maturing.

Large Language Models (LLMs) · AI Agents · MLOps & Infrastructure · Open Source


© 2026 BotBeat