BotBeat

DeepSeek · RESEARCH · 2026-04-04

DeepSeek Introduces R2R: Token Routing Method Combines Small and Large Models for Efficient Reasoning

Key Takeaways

  • R2R finds that most tokens in LLM and SLM outputs are identical or differ only neutrally, and that only a small fraction cause their reasoning paths to diverge
  • The token routing approach reaches 1.6x the average accuracy of R1-7B and a 2.8x wall-clock speedup over R1-32B at comparable performance, while activating only 5.6B parameters on average
  • An automatic data-generation pipeline identifies divergent tokens and produces the routing labels used to train a lightweight router
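To make the labeling idea concrete, here is a minimal sketch of how routing labels could be derived, assuming the reference path is the LLM's own response and that a separate judge decides whether a differing SLM token is neutral or path-divergent. The names `label_tokens`, `RoutingLabel`, and the judge interface are hypothetical; the paper's actual pipeline may verify divergence differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical interfaces: a "model" maps a token prefix to its greedy next
# token, and a "judge" decides whether a differing SLM token would change
# the reasoning outcome. Real pipelines would call actual model inference.
NextToken = Callable[[Sequence[str]], str]
DivergenceJudge = Callable[[Sequence[str], str, str], bool]


@dataclass
class RoutingLabel:
    position: int
    slm_token: str
    llm_token: str
    route_to_llm: bool  # training target for the router


def label_tokens(reference: Sequence[str], slm: NextToken, llm: NextToken,
                 is_divergent: DivergenceJudge) -> List[RoutingLabel]:
    """Assign a routing label to every position of a reference response."""
    labels = []
    for i in range(len(reference)):
        prefix = reference[:i]
        s_tok, l_tok = slm(prefix), llm(prefix)
        if s_tok == l_tok:
            route = False  # identical prediction: the SLM is sufficient
        else:
            # Predictions differ: route only if the difference is
            # path-divergent, not a neutral variation (e.g. phrasing).
            route = is_divergent(prefix, s_tok, l_tok)
        labels.append(RoutingLabel(i, s_tok, l_tok, route))
    return labels


# Toy "models" with hard-coded predictions (illustration only).
reference = ["2", "+", "2", "=", "4"]
llm = lambda p: reference[len(p)]
slm = lambda p: {3: "==", 4: "5"}.get(len(p), reference[len(p)])
neutral_pairs = {("==", "=")}
judge = lambda prefix, s, l: (s, l) not in neutral_pairs
labels = label_tokens(reference, slm, llm, judge)
print([lab.position for lab in labels if lab.route_to_llm])  # [4]
```

In the toy run, the SLM's "==" is treated as a neutral rewrite of "=", so only the wrong digit at position 4 is labeled for the large model.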
Source: Hacker News, https://arxiv.org/abs/2505.21600

Summary

Researchers at DeepSeek have published a paper introducing Roads to Rome (R2R), a neural token routing method that decides, token by token, whether a small or a large language model should generate the next token. The key insight behind R2R is that only a small fraction of tokens actually cause the reasoning paths of Large Language Models (LLMs) and their distilled Small Language Models (SLMs) to diverge; most tokens are either identical or differ only in neutral ways, such as minor abbreviations or phrasing. By routing only these critical, path-divergent tokens to the larger model and leaving the rest of generation to the cheaper SLM, R2R substantially improves the efficiency-performance tradeoff.
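The generation loop described above can be sketched as follows. This is a simplified illustration, not the released R2R code: R2R trains a lightweight neural router, whereas here the router is any callable that returns a score, and the function name `route_and_generate` and the 0.5 threshold are hypothetical. KV-cache sharing and batching are ignored.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[str]], str]
# Hypothetical router interface: returns a score in [0, 1] for how likely
# the SLM's proposed token is to send reasoning down a different path.
Router = Callable[[Sequence[str], str], float]


def route_and_generate(prompt: Sequence[str], slm: NextToken, llm: NextToken,
                       router: Router, threshold: float = 0.5,
                       max_tokens: int = 64,
                       eos: str = "<eos>") -> Tuple[List[str], int]:
    """Generate with the SLM, deferring flagged tokens to the LLM."""
    tokens = list(prompt)
    llm_calls = 0
    for _ in range(max_tokens):
        proposal = slm(tokens)  # the cheap model proposes every token
        if router(tokens, proposal) >= threshold:
            proposal = llm(tokens)  # the large model regenerates flagged ones
            llm_calls += 1
        tokens.append(proposal)
        if proposal == eos:
            break
    return tokens[len(prompt):], llm_calls


# Toy models and router (illustration only): the SLM gets the digit wrong,
# and the router flags every digit token as potentially divergent.
answer = ["x", "=", "4", "<eos>"]
llm = lambda t: answer[len(t) - 1]
slm = lambda t: "5" if len(t) - 1 == 2 else answer[len(t) - 1]
router = lambda t, tok: 0.9 if tok.isdigit() else 0.1
output, calls = route_and_generate(["Q"], slm, llm, router)
print(output, calls)  # ['x', '=', '4', '<eos>'] 1
```

The large model runs once, on the single flagged token; the threshold is the knob that trades accuracy against how often the expensive model is invoked.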

When applied to DeepSeek's R1 model family, pairing R1-1.5B with R1-32B, R2R performs strongly on challenging math, coding, and QA benchmarks. With an average of just 5.6B activated parameters, the system surpasses the average accuracy of R1-7B by 1.6x and even outperforms R1-14B. Compared with the full R1-32B model, R2R delivers a 2.8x wall-clock speedup at comparable performance. The researchers have released their code publicly, enabling broader adoption of the technique.
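As a rough sanity check on the 5.6B figure, one can ask what share of tokens would have to go to the large model. The sketch below assumes a simple cost model in which the SLM (and router) runs on every token and R1-32B runs only on routed tokens, using nominal sizes of 1.5B and 32B and ignoring router size. This is a back-of-envelope estimate, not the paper's stated accounting.

```python
def avg_activated_params(slm_b: float, llm_b: float, routed_fraction: float,
                         router_b: float = 0.0) -> float:
    """Average parameters (billions) activated per token, assuming the SLM
    and router run on every token and the LLM only on routed tokens."""
    return slm_b + router_b + routed_fraction * llm_b


def implied_routed_fraction(avg_b: float, slm_b: float, llm_b: float,
                            router_b: float = 0.0) -> float:
    """Invert the cost model above to recover the routed-token fraction."""
    return (avg_b - slm_b - router_b) / llm_b


# Nominal sizes for R1-1.5B and R1-32B; router size ignored.
fraction = implied_routed_fraction(5.6, 1.5, 32.0)
print(f"{fraction:.1%}")  # 12.8%
```

Under these assumptions, roughly one token in eight would be routed to the large model, which is consistent with the claim that divergent tokens are a small fraction.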

The method advances the Pareto frontier of test-time scaling efficiency, offering practical benefits for LLM deployment.
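"Advancing the Pareto frontier" has a precise meaning when each model is a (cost, accuracy) point: a point is on the frontier if no other point is at least as cheap and at least as accurate while being strictly better on one of the two. The helper below is a generic sketch of that check; the candidate values are toy numbers for illustration only, not results from the paper.

```python
from typing import List, Tuple

# (label, cost, accuracy): lower cost and higher accuracy are both better.
Point = Tuple[str, float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    front = []
    for name, cost, acc in points:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            front.append((name, cost, acc))
    return front


# Toy values for illustration only, not results from the paper.
candidates = [("A", 1.0, 0.30), ("B", 2.0, 0.50),
              ("C", 3.0, 0.40), ("D", 4.0, 0.60)]
print([name for name, _, _ in pareto_front(candidates)])  # ['A', 'B', 'D']
```

Here C drops off the frontier because B is both cheaper and more accurate; a new method advances the frontier when its point lands on the front in this way, pushing out previously non-dominated trade-offs.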

Editorial Opinion

R2R represents an elegant solution to a critical problem in modern AI deployment: the tension between reasoning capability and computational efficiency. By recognizing that reasoning divergence is sparse rather than uniform across tokens, DeepSeek has developed a pragmatic approach that could significantly reduce the inference costs of deploying advanced reasoning models. This work suggests that future efficiency gains may come not from better model compression, but from smarter routing strategies that respect the heterogeneous computational demands of different reasoning steps.

Large Language Models (LLMs) · Generative AI · Machine Learning · MLOps & Infrastructure

© 2026 BotBeat