DeepSeek Introduces R2R: Token Routing Method Combines Small and Large Models for Efficient Reasoning
Key Takeaways
- R2R identifies that most tokens in LLM and SLM outputs are identical or neutral, with only a small fraction driving reasoning divergence
- The token routing approach achieves 1.6x higher accuracy than R1-7B and a 2.8x speedup versus R1-32B while using only 5.6B average activated parameters
- An automatic data generation pipeline identifies divergent tokens and generates routing labels to train the lightweight router
Summary
Researchers at DeepSeek have published a new paper introducing Roads to Rome (R2R), a neural token routing method that selectively routes individual tokens between small and large language models to achieve superior efficiency. The key insight driving R2R is that only a small fraction of tokens actually diverge in reasoning paths between Large Language Models (LLMs) and distilled Small Language Models (SLMs); most tokens are either identical or exhibit neutral variations such as minor abbreviations or phrasing differences. By routing only the critical, path-divergent tokens to the larger model while leaving the majority of token generation to the more efficient SLM, R2R achieves a dramatic improvement in the efficiency-performance tradeoff.
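The routing loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `slm`, `llm`, and `router` are stand-in callables (logits over a toy vocabulary and a divergence score, respectively), and the threshold-based escalation rule is an assumption about how a trained router would be used at decode time.

```python
def argmax(logits):
    """Index of the largest logit (greedy decoding over a toy vocabulary)."""
    return max(range(len(logits)), key=logits.__getitem__)

def generate_with_routing(prompt, slm, llm, router, threshold=0.5, max_new=8):
    """Hypothetical R2R-style decode loop.

    slm / llm: callables mapping a token sequence to next-token logits.
    router: callable scoring how likely the SLM's proposed token is
            path-divergent; high scores escalate the step to the LLM.
    """
    tokens = list(prompt)
    for _ in range(max_new):
        slm_tok = argmax(slm(tokens))
        if router(tokens, slm_tok) > threshold:
            tok = argmax(llm(tokens))  # critical, divergent token: use the LLM
        else:
            tok = slm_tok              # neutral/identical token: keep the SLM's
        tokens.append(tok)
    return tokens
```

Because the router only sees the current context and the SLM's proposal, the expensive LLM forward pass is paid solely on the small fraction of steps flagged as divergent.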
When applied to DeepSeek's R1 model family (combining R1-1.5B and R1-32B), R2R demonstrates impressive results on challenging math, coding, and QA benchmarks. With an average activated parameter size of just 5.6B, the system surpasses the average accuracy of the R1-7B model by 1.6x and outperforms even the R1-14B model. Most notably, compared to the full R1-32B model, R2R delivers a 2.8x wall-clock speedup while maintaining comparable performance. The researchers have released their code publicly, enabling broader adoption of the technique.
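As a back-of-the-envelope check on the reported numbers, assuming "average activated parameters" is a token-weighted mix of the two model sizes (an assumption about the metric, not stated in the article), one can solve for the fraction of tokens routed to the large model:

```python
# Hypothetical estimate: if a fraction p of tokens runs through R1-32B and
# the rest through R1-1.5B, the token-weighted average parameter count is
#   p * 32 + (1 - p) * 1.5  (billions).
SLM_PARAMS = 1.5    # R1-1.5B
LLM_PARAMS = 32.0   # R1-32B
AVG_ACTIVE = 5.6    # reported average activated parameters

p = (AVG_ACTIVE - SLM_PARAMS) / (LLM_PARAMS - SLM_PARAMS)
print(f"fraction of tokens routed to the LLM ~ {p:.1%}")  # roughly 13%
```

Under this assumption, only about one token in seven needs the 32B model, which is consistent with the paper's claim that divergent tokens are sparse.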
The method advances the Pareto frontier of test-time scaling efficiency, offering practical benefits for LLM deployment.
Editorial Opinion
R2R represents an elegant solution to a critical problem in modern AI deployment: the tension between reasoning capability and computational efficiency. By recognizing that reasoning divergence is sparse rather than uniform across tokens, DeepSeek has developed a pragmatic approach that could significantly reduce the inference costs of deploying advanced reasoning models. This work suggests that future efficiency gains may come not from better model compression, but from smarter routing strategies that respect the heterogeneous computational demands of different reasoning steps.