BotBeat
RESEARCH · DeepSeek · 2026-04-04

DeepSeek Introduces R2R: Token Routing Method Combines Small and Large Models for Efficient Reasoning

Key Takeaways

  • R2R observes that most tokens in LLM and SLM outputs are identical or differ only neutrally, with only a small fraction driving reasoning divergence
  • The token routing approach surpasses the average accuracy of R1-7B by 1.6x and delivers a 2.8x wall-clock speedup versus R1-32B while using only 5.6B average activated parameters
  • An automatic data generation pipeline identifies divergent tokens and generates routing labels to train the lightweight router
Source: https://arxiv.org/abs/2505.21600 (via Hacker News)

Summary

Researchers at DeepSeek have published a paper introducing Roads to Rome (R2R), a neural token routing method that sends only selected tokens to a large language model during reasoning and leaves the rest to a small one. The key insight is that only a small fraction of tokens actually cause reasoning paths to diverge between Large Language Models (LLMs) and distilled Small Language Models (SLMs); most tokens are either identical or show neutral variations such as minor abbreviations or phrasing differences. By routing only these critical, path-divergent tokens to the larger model and leaving the majority of token generation to the more efficient SLM, R2R improves the efficiency-performance tradeoff.
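The routing idea can be illustrated with a toy decoding loop. This is only a sketch under stated assumptions, not the paper's implementation: the two "models" are scripted token sequences, the router is a hand-written rule standing in for the trained lightweight router, and the names `routed_decode`, `router_score`, and `threshold` are hypothetical.

```python
from typing import Callable, List, Tuple

Token = str
NextTokenFn = Callable[[List[Token]], Token]
RouterFn = Callable[[List[Token], Token], float]


def routed_decode(
    prompt: List[Token],
    slm_next: NextTokenFn,
    llm_next: NextTokenFn,
    router_score: RouterFn,
    threshold: float = 0.5,
    max_new_tokens: int = 16,
    eos: Token = "<eos>",
) -> Tuple[List[Token], int]:
    """Generate with the small model, deferring flagged tokens to the large one."""
    tokens = list(prompt)
    llm_calls = 0
    for _ in range(max_new_tokens):
        candidate = slm_next(tokens)  # cheap proposal from the small model
        if router_score(tokens, candidate) >= threshold:
            candidate = llm_next(tokens)  # replace a likely divergent token
            llm_calls += 1
        tokens.append(candidate)
        if candidate == eos:
            break
    return tokens[len(prompt):], llm_calls


def scripted(script: List[Token], prompt_len: int) -> NextTokenFn:
    """Toy stand-in for a model: replays a fixed token sequence."""
    return lambda ctx: script[len(ctx) - prompt_len]


prompt = ["Q:"]
# The toy SLM makes one mistake (the answer token); the toy LLM does not.
slm = scripted(["2", "+", "2", "=", "5", "<eos>"], len(prompt))
llm = scripted(["2", "+", "2", "=", "4", "<eos>"], len(prompt))


def toy_router(ctx: List[Token], cand: Token) -> float:
    """Hypothetical rule: flag the token that follows '=' as critical."""
    return 1.0 if ctx and ctx[-1] == "=" else 0.0


out, calls = routed_decode(prompt, slm, llm, toy_router)
print(out, calls)
```

In this toy run the large model is consulted for one token out of six, which is the shape of saving the article describes: most tokens come from the small model, and only the flagged position pays the large-model cost.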

When applied to DeepSeek's R1 model family (combining R1-1.5B and R1-32B), R2R is evaluated on challenging math, coding, and QA benchmarks. With an average activated parameter size of just 5.6B, the system surpasses the average accuracy of the R1-7B model by 1.6x and outperforms even the R1-14B model. Compared to the full R1-32B model, R2R delivers a 2.8x wall-clock speedup while maintaining comparable performance. The researchers have released their code publicly.
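As a rough back-of-the-envelope check, the 5.6B figure can be turned into an implied share of tokens handled by the large model. The article does not say how average activated parameters are computed, so both accounting schemes below are assumptions, as are the nominal sizes of 1.5B and 32B and the omission of any router overhead; `llm_token_fraction` is a hypothetical helper.

```python
def llm_token_fraction(
    avg_params_b: float,
    slm_params_b: float,
    llm_params_b: float,
    slm_always_runs: bool,
) -> float:
    """Implied fraction of tokens sent to the large model.

    slm_always_runs=True  assumes avg = slm + f * llm
                          (the small model runs for every token).
    slm_always_runs=False assumes avg = (1 - f) * slm + f * llm
                          (each token is charged to exactly one model).
    """
    extra = avg_params_b - slm_params_b
    if slm_always_runs:
        return extra / llm_params_b
    return extra / (llm_params_b - slm_params_b)


f_always = llm_token_fraction(5.6, 1.5, 32.0, slm_always_runs=True)
f_either = llm_token_fraction(5.6, 1.5, 32.0, slm_always_runs=False)
print(f"{f_always:.3f} {f_either:.3f}")
```

Under either assumption the implied share is roughly 13% of tokens, which is consistent with the claim that only a small fraction of tokens needs the large model.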

  • The method advances the Pareto frontier of test-time scaling efficiency, offering practical benefits for LLM deployment

Editorial Opinion

R2R represents an elegant solution to a critical problem in modern AI deployment: the tension between reasoning capability and computational efficiency. By recognizing that reasoning divergence is sparse rather than uniform across tokens, DeepSeek has developed a pragmatic approach that could significantly reduce the inference costs of deploying advanced reasoning models. This work suggests that future efficiency gains may come not from better model compression, but from smarter routing strategies that respect the heterogeneous computational demands of different reasoning steps.

Large Language Models (LLMs), Generative AI, Machine Learning, MLOps & Infrastructure
