Opus 4.7's New Tokenizer Increases Token Costs by 32-45%, But Caching Softens the Blow
Key Takeaways
- Opus 4.7's new tokenizer inflates token counts by 32-45% compared to Opus 4.6, but per-token pricing remains unchanged
- Actual user costs increased 12-27% for typical workflows, though short-prompt users saw improvements due to more concise completions
- Prompt caching absorbs the majority of tokenizer inflation; at 128K+ prompt lengths, 93% of extra tokens are cached and billed at a 90% discount
Summary
Anthropic announced that Claude Opus 4.7 features an improved tokenizer designed to enhance the model's understanding of inputs. While Anthropic has kept pricing unchanged at $5/M input tokens and $25/M output tokens, the new tokenizer produces significantly more tokens for the same text: a 32-45% inflation depending on prompt size, with smaller prompts seeing the largest token count increases.
According to an analysis by OpenRouter of over one million requests from users who switched from Opus 4.6 to 4.7, real-world costs increased 12-27% for typical usage patterns, with the notable exception of short prompts (under 2K tokens), which became more cost-efficient. The tokenizer inflation ranged from 1.0x to 1.35x depending on content type, consistent with Anthropic's prior disclosure.
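The headline numbers can be sanity-checked with a back-of-envelope cost model. The per-token rates come from the article; the specific token counts, the 1.35x small-prompt inflation factor, and the 62%-shorter completion are hypothetical examples chosen from the reported ranges, not measured values:

```python
# Back-of-envelope request cost model (illustrative).
# Rates are from the article; token counts are assumed examples.

INPUT_RATE = 5 / 1_000_000    # $5 per million input tokens
OUTPUT_RATE = 25 / 1_000_000  # $25 per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one uncached request at the published rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Same short prompt tokenized by each model: assume 1.35x input inflation
# (the article's high end for small prompts) and a 62%-shorter completion.
old_cost = request_cost(2_000, 500)
new_cost = request_cost(int(2_000 * 1.35), int(500 * 0.38))

print(f"Opus 4.6: ${old_cost:.4f}  Opus 4.7: ${new_cost:.4f}")
```

Under these assumed numbers the short-prompt request gets cheaper despite the input inflation, because the savings from the shorter completion (billed at the 5x-higher output rate) outweigh the extra input tokens, matching the article's finding that sub-2K prompts became more cost-efficient.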
Prompt caching significantly mitigates the cost impact, absorbing the majority of token inflation for longer prompts. For production-scale requests with 25K+ tokens, most of the extra tokens from the new tokenizer are cached and billed at a 90% discount. Completion length also diverges by prompt size: Opus 4.7 generates notably shorter completions for brief queries (62% fewer tokens for prompts under 2K) but 13-30% longer responses for prompts of 10K+ tokens.
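The caching claim can be sketched the same way: if 93% of the extra tokens are billed at a 90% discount, each extra token costs on average 0.07 + 0.93 × 0.10 = 16.3% of the full input rate. The rates, the discount, and the 93% cached share come from the article; the 128K base prompt and 32% inflation factor below are illustrative:

```python
# Illustrative cache-adjusted cost of the tokenizer's extra input tokens.
# Rates and the 90% cache discount are from the article; the base prompt
# size and inflation factor are assumed examples.

INPUT_RATE = 5 / 1_000_000   # $ per uncached input token
CACHE_DISCOUNT = 0.90        # cached tokens billed at 10% of the full rate

def input_cost(tokens: float, cached_fraction: float) -> float:
    """Dollar cost of input tokens when a fraction of them hit the cache."""
    cached = tokens * cached_fraction
    uncached = tokens - cached
    return uncached * INPUT_RATE + cached * INPUT_RATE * (1 - CACHE_DISCOUNT)

base = 128_000               # prompt size under the old tokenizer (assumed)
extra = int(base * 0.32)     # 32% inflation, the article's low end

cache_adjusted = input_cost(extra, 0.93)   # 93% of extra tokens cached
full_rate = extra * INPUT_RATE             # what they would cost uncached

print(f"${cache_adjusted:.4f} cache-adjusted vs ${full_rate:.4f} uncached")
```

With these figures, the extra tokens cost roughly a sixth of what they would uncached, which is why the billed-cost increase (12-27%) lands well below the raw token inflation (32-45%) for long prompts.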