AutoTTS: Researchers Cut LLM Inference Tokens by 70% with AI-Discovered Reasoning Strategy
Key Takeaways
- AutoTTS cuts inference tokens by approximately 70% compared to running 64 parallel chains while maintaining equivalent accuracy
- The Confidence Momentum Controller (CMC) was written and refined by an AI agent, not manually designed by researchers
- CMC uses real-time confidence signals and trends to dynamically decide when to branch, consolidate, explore, and prune reasoning paths
Summary
Researchers from Google, Meta, and academic institutions (UMD, UVA, WUSTL, UNC) have unveiled AutoTTS (Automated Test-Time Scaling), a technique that reduces token usage in large language model inference by approximately 70% while maintaining accuracy. Rather than using the standard brute-force approach of running 64 parallel reasoning chains and selecting the majority answer (self-consistency, or SC@64), AutoTTS employs an AI agent to discover efficient reasoning strategies automatically through iterative refinement.
The core innovation is the Confidence Momentum Controller (CMC), an inference policy that was not hand-designed by researchers but discovered automatically by an AI agent. Unlike fixed inference rules, the CMC monitors the model's confidence across reasoning traces and makes real-time decisions about when to branch, when to consolidate, when to explore new paths, and when to prune unpromising reasoning chains. This adaptive approach outperforms traditional parallel sampling while dramatically reducing computational cost.
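The agent-discovered controller itself has not been published, but the behavior described above can be illustrated with a minimal sketch. Everything here is an assumption: the class name, the thresholds, and the use of a windowed average of confidence deltas as the "momentum" signal are illustrative choices, not the actual CMC.

```python
import statistics

class ConfidenceMomentumController:
    """Hypothetical sketch: map a trace's confidence history to an action."""

    def __init__(self, window=4, prune_below=0.3, branch_above=0.8,
                 momentum_floor=-0.05):
        self.window = window                  # steps used to estimate the trend
        self.prune_below = prune_below        # absolute-confidence prune cutoff
        self.branch_above = branch_above      # confidence needed to consolidate
        self.momentum_floor = momentum_floor  # falling-trend prune cutoff

    def momentum(self, confidences):
        """Average step-to-step confidence change over the recent window."""
        recent = confidences[-self.window:]
        if len(recent) < 2:
            return 0.0
        return statistics.mean(b - a for a, b in zip(recent, recent[1:]))

    def decide(self, confidences):
        current = confidences[-1]
        trend = self.momentum(confidences)
        if current < self.prune_below or trend < self.momentum_floor:
            return "prune"        # low or steadily falling confidence
        if current > self.branch_above and trend >= 0:
            return "consolidate"  # confident and stable: commit to the answer
        if trend > 0:
            return "branch"       # rising confidence: invest more samples here
        return "explore"          # flat, middling confidence: try a fresh path
```

The key design idea the sketch captures is that decisions depend on the *trend* of confidence, not just its current value, so a trace that is confident but deteriorating gets pruned while one that is improving gets more compute.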
The discovery methodology is notably efficient: researchers pre-computed a "replay store" of cached reasoning traces from thousands of problems, then let an AI agent write controller code, test it against the cached traces, evaluate accuracy and token efficiency, and iteratively refine the policy, all without making new model calls. This offline discovery process cost only $39.90 in API calls and completed in 160 minutes. The discovered controller generalized across benchmarks (AIME24, AIME25, HMMT25) and model sizes, achieving 69.5% token reduction at β=0.5 (a balanced accuracy-cost tradeoff) while matching the accuracy of self-consistency with 64 chains (SC@64).
- The entire discovery process cost only $39.90 in API calls thanks to offline evaluation against cached reasoning traces
- The discovered policy transferred across benchmarks and model sizes, indicating robustness and practical applicability
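The offline evaluation loop can be sketched as follows. This is not the paper's code: the replay-store schema, the scoring formula that trades accuracy against normalized token usage with weight β, and the example controllers are all illustrative assumptions.

```python
def score_controller(keep_trace, replay_store, beta=0.5):
    """Score a candidate controller against cached traces: accuracy minus a
    beta-weighted fraction of the tokens the baseline would have spent.
    No model calls are made; everything is replayed from the cache."""
    correct, tokens_used, tokens_max = 0, 0, 0
    for problem in replay_store:
        # Replay: the controller decides which cached traces to keep.
        kept = [t for t in problem["traces"] if keep_trace(t)]
        if not kept:
            kept = problem["traces"][:1]  # always keep at least one trace
        # Majority vote over the kept traces, as in self-consistency.
        answers = [t["answer"] for t in kept]
        majority = max(set(answers), key=answers.count)
        correct += (majority == problem["gold"])
        tokens_used += sum(t["tokens"] for t in kept)
        tokens_max += sum(t["tokens"] for t in problem["traces"])
    accuracy = correct / len(replay_store)
    return accuracy - beta * (tokens_used / tokens_max)

# Example: a pruning controller beats the keep-everything baseline whenever
# it preserves accuracy while spending fewer cached tokens.
keep_all = lambda trace: True
keep_confident = lambda trace: trace["confidence"] > 0.5
```

Because scoring happens entirely against cached traces, the agent can test many candidate controllers cheaply, which is what made the reported $39.90 discovery cost possible.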
Editorial Opinion
AutoTTS represents a paradigm shift in inference optimization: instead of human researchers designing better reasoning strategies, we can build environments where AI agents discover them through systematic exploration. This work demonstrates that the most efficient reasoning strategies might fundamentally differ from those humans would design, and that dramatic cost-efficiency breakthroughs can emerge from algorithmic discovery rather than computational brute force. That discovery itself was so affordable (about $40) while delivering roughly 70% savings on ongoing inference costs suggests AutoTTS could become a standard tool for making enterprise-scale LLM deployment economically viable. Most importantly, this research validates a meta-principle of using AI to improve AI that could unlock innovations across the entire field.