DeepSeek Introduces R2R: Token Routing Method Combines Small and Large Models for Efficient Reasoning
Key Takeaways
- R2R identifies that most tokens in LLM and SLM outputs are identical or neutral, with only a small fraction driving reasoning divergence
- The token routing approach achieves 1.6x higher accuracy than R1-7B and a 2.8x speedup versus R1-32B while using only 5.6B average activated parameters
- An automatic data generation pipeline identifies divergent tokens and generates routing labels to train the lightweight router
Summary
Researchers at DeepSeek have published a new paper introducing Roads to Rome (R2R), a neural token routing method that selectively routes individual tokens between small and large language models to achieve superior efficiency. The key insight driving R2R is that only a small fraction of tokens actually diverge in reasoning paths between Large Language Models (LLMs) and distilled Small Language Models (SLMs); most tokens are either identical or exhibit neutral variations such as minor abbreviations or phrasing differences. By routing only the critical, path-divergent tokens to the larger model while leaving the majority of token generation to the more efficient SLM, R2R achieves a dramatic improvement in the efficiency-performance tradeoff.
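The routing loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `slm`, `llm`, and `router` are stand-in callables (logits over a toy vocabulary and a divergence score, respectively), and the threshold-based escalation rule is an assumption about how a trained router would be used at decode time.

```python
def argmax(logits):
    """Index of the largest logit (greedy decoding over a toy vocabulary)."""
    return max(range(len(logits)), key=logits.__getitem__)

def generate_with_routing(prompt, slm, llm, router, threshold=0.5, max_new=8):
    """Hypothetical R2R-style decode loop.

    slm / llm: callables mapping a token sequence to next-token logits.
    router: callable scoring how likely the SLM's proposed token is
            path-divergent; high scores escalate the step to the LLM.
    """
    tokens = list(prompt)
    for _ in range(max_new):
        slm_tok = argmax(slm(tokens))
        if router(tokens, slm_tok) > threshold:
            tok = argmax(llm(tokens))  # critical, divergent token: use the LLM
        else:
            tok = slm_tok              # neutral/identical token: keep the SLM's
        tokens.append(tok)
    return tokens
```

Because the router only sees the current context and the SLM's proposal, the expensive LLM forward pass is paid solely on the small fraction of steps flagged as divergent.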
When applied to DeepSeek's R1 model family (combining R1-1.5B and R1-32B), R2R demonstrates impressive results on challenging math, coding, and QA benchmarks. With an average activated parameter size of just 5.6B, the system surpasses the average accuracy of the R1-7B model by 1.6x and outperforms even the R1-14B model. Most notably, compared to the full R1-32B model, R2R delivers a 2.8x wall-clock speedup while maintaining comparable performance. The researchers have released their code publicly, enabling broader adoption of the technique.
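As a back-of-the-envelope check on the reported numbers, assuming "average activated parameters" is a token-weighted mix of the two model sizes (an assumption about the metric, not stated in the article), one can solve for the fraction of tokens routed to the large model:

```python
# Hypothetical estimate: if a fraction p of tokens runs through R1-32B and
# the rest through R1-1.5B, the token-weighted average parameter count is
#   p * 32 + (1 - p) * 1.5  (billions).
SLM_PARAMS = 1.5    # R1-1.5B
LLM_PARAMS = 32.0   # R1-32B
AVG_ACTIVE = 5.6    # reported average activated parameters

p = (AVG_ACTIVE - SLM_PARAMS) / (LLM_PARAMS - SLM_PARAMS)
print(f"fraction of tokens routed to the LLM ~ {p:.1%}")  # roughly 13%
```

Under this assumption, only about one token in seven needs the 32B model, which is consistent with the paper's claim that divergent tokens are sparse.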
The method advances the Pareto frontier of test-time scaling efficiency, offering practical benefits for LLM deployment.
Editorial Opinion
R2R represents an elegant solution to a critical problem in modern AI deployment: the tension between reasoning capability and computational efficiency. By recognizing that reasoning divergence is sparse rather than uniform across tokens, DeepSeek has developed a pragmatic approach that could significantly reduce the inference costs of deploying advanced reasoning models. This work suggests that future efficiency gains may come not from better model compression, but from smarter routing strategies that respect the heterogeneous computational demands of different reasoning steps.