Tokdiet: New LLM Proxy Cuts API Costs 71% While Maintaining Quality Parity
Key Takeaways
- 71% reduction in input tokens (5.07M → 1.46M) with 95-97% quality parity, measured in A/B testing on 66 real tasks across two models
- Cache-aware design preserves the benefits of prompt caching, avoiding the cache invalidation problem that affects naive context optimization
- Available as a Claude Code plugin with simple setup (npx tokdiet start) and transparent proxy routing that requires no code changes
Summary
agiwhitelist has launched Tokdiet, a local proxy that optimizes context for LLM API requests and reduces input token consumption by approximately 71% while keeping output quality close to baseline. The tool sits between AI agents and model APIs, compacting the context it forwards while keeping the information relevant to the task. The creator supported the claim with A/B testing: across 66 real-world tasks on MiniMax models, Tokdiet reduced input tokens from 5.07M to 1.46M while maintaining 95-97% quality parity with baseline full-context runs.
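The headline percentage follows directly from the two token counts reported above. A minimal check, in which the helper name `reduction_pct` is purely illustrative:

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100

baseline_tokens = 5.07e6  # input tokens for the full-context runs (from the article)
tokdiet_tokens = 1.46e6   # input tokens with Tokdiet in the path (from the article)

pct = reduction_pct(baseline_tokens, tokdiet_tokens)
print(f"{pct:.1f}% fewer input tokens")  # 71.2% fewer input tokens
```

Note that this is a reduction in input tokens. The bill would fall by the same fraction only if input tokens dominate cost and are all priced at the same rate, which the reported figures do not establish.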
Unlike existing context optimization tools that achieve cost savings through blind pruning, Tokdiet is designed with awareness of modern LLM optimizations such as prompt caching, so that compaction does not invalidate existing cache hits. The tool also implements a fail-open architecture: if any internal error occurs, it reverts to transparent passthrough so that a proxy failure does not break production requests. Tokdiet is available as a Claude Code plugin and can be deployed locally via npx, supporting Claude, OpenAI, and other LLM providers.
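Provider prompt caches generally match on an exact prefix of the request, so rewriting an early message discards the cached work for everything after it. The article does not describe how Tokdiet avoids this. The sketch below only illustrates the problem, using hypothetical message lists and an illustrative helper, `shared_prefix_len`:

```python
def shared_prefix_len(old: list[str], new: list[str]) -> int:
    """Count the leading messages that are identical in both requests."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

# Hypothetical request contents, one string per message.
original = ["system prompt", "tool definitions", "long file dump", "latest question"]

# Naive compaction rewrites an early message, so the shared prefix ends immediately.
naive = ["system prompt (trimmed)", "tool definitions", "long file dump", "latest question"]

# A cache-aware compactor would leave the stable prefix alone and shrink later content.
cache_aware = ["system prompt", "tool definitions", "file summary", "latest question"]

print(shared_prefix_len(original, naive))        # 0
print(shared_prefix_len(original, cache_aware))  # 2
```

In the second case the first two messages are still eligible for a cache hit, even though the overall request is shorter.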
- Fail-open safety architecture automatically falls back to passthrough on error, so a proxy failure does not break production requests
- Security-first design keeps API keys local and never logs credentials to disk
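The fail-open behavior described above can be sketched as a wrapper that forwards the original request whenever the optimization step raises. This is a minimal illustration, not Tokdiet's implementation; `fail_open`, both optimizer functions, and the request shape are hypothetical:

```python
from typing import Callable

def fail_open(optimize: Callable[[dict], dict], request: dict) -> dict:
    """Return the optimized request, or the original one if optimization fails."""
    try:
        return optimize(request)
    except Exception:
        # Any internal error: forward the request unchanged (passthrough).
        return request

def broken_optimizer(request: dict) -> dict:
    # Hypothetical optimizer that always fails.
    raise RuntimeError("compaction failed")

def trimming_optimizer(request: dict) -> dict:
    # Hypothetical compaction: keep only the last two messages.
    return {**request, "messages": request["messages"][-2:]}

req = {"model": "example-model", "messages": ["a", "b", "c"]}
print(fail_open(broken_optimizer, req) is req)         # True
print(fail_open(trimming_optimizer, req)["messages"])  # ['b', 'c']
```

The trade-off of this pattern is that a failure costs only the savings for that request, never the request itself.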
Editorial Opinion
Tokdiet addresses a real and growing pain point in LLM application development, token costs, with an unusual level of transparency about the cost-quality tradeoff. Most token optimization tools either hide their impact on output quality or achieve savings through blind pruning that risks degrading model performance. By publicly releasing A/B test results showing a 71% token reduction with measured quality parity, Tokdiet sets a new standard for responsible cost optimization. For developers running Claude, OpenAI, or other models at scale, this cache-aware approach could meaningfully reduce infrastructure costs without the typical quality penalties.



