BotBeat

OpenAI
RESEARCH · OpenAI · 2026-03-03

Analysis Finds Reinforcement Learning Scaling Needs a 10,000x Compute Increase to Match the Gains From a 100x Inference Increase

Key Takeaways

  • RL-scaling has roughly half the slope of inference-scaling when performance is plotted against log compute, so matching the gain from a 100x increase in inference compute takes about a 10,000x increase in RL training compute
  • Most performance gains in OpenAI's o1 model came from enabling longer chain-of-thought reasoning (inference-scaling) rather than from RL training itself
  • Deployment costs scale directly with inference compute (30x longer thinking time = 30x higher cost per query), creating significant economic pressure
Source: https://www.tobyord.com/writing/how-well-does-rl-scale (via Hacker News)

Summary

Philosopher and AI researcher Toby Ord has published a detailed analysis of how reinforcement learning (RL) scales in modern AI systems, with significant implications for the cost and development of reasoning models. Reading the charts OpenAI published for its o1 model, he finds that RL-scaling (increasing compute during training) has approximately half the slope of inference-scaling when performance is plotted against log compute. Because of this half-slope relationship, the compute multiplier needed to reach a given improvement through RL training is roughly the square of the multiplier needed through longer inference (chain-of-thought reasoning): a gain that 100x more inference compute buys requires about 10,000x more RL training compute, a multiplier 100 times larger.
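The half-slope argument can be written out as a short derivation. It rests on a simplifying assumption that is not a formula given in the source: benchmark score S rises linearly in log10 of compute over the charted range, with slope b for inference compute and b/2 for RL training compute. The symbols S, a, a', b, k_inf and k_RL are illustrative notation.

```latex
% Assumed linear-in-log-compute fits (illustrative, not from the source):
S_{\text{inf}}(C_{\text{inf}}) \approx a + b \log_{10} C_{\text{inf}}, \qquad
S_{\text{RL}}(C_{\text{RL}}) \approx a' + \tfrac{b}{2} \log_{10} C_{\text{RL}}

% Scale inference compute by k_inf and RL compute by k_RL, and require equal gains:
\Delta S = b \log_{10} k_{\text{inf}} = \tfrac{b}{2} \log_{10} k_{\text{RL}}
\;\Longrightarrow\; \log_{10} k_{\text{RL}} = 2 \log_{10} k_{\text{inf}}
\;\Longrightarrow\; k_{\text{RL}} = k_{\text{inf}}^{2}

% Example: k_inf = 100 gives k_RL = 100^2 = 10{,}000.
```

Since k_RL / k_inf = k_inf, the RL multiplier exceeds the inference multiplier by the inference multiplier itself, which is why the 100x case is sometimes summarized as "100 times more compute."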

The analysis highlights that in OpenAI's initial o1 release, most performance gains came from unlocking inference-scaling rather than from the RL training itself. RL training provided a modest boost and enabled the model to make productive use of chains of thought about 30x longer, but that extended inference time contributed the larger share of the improvement. This has major cost implications: if headline performance requires 30x more inference compute, deployment costs rise by the same factor, and unlike training costs, those expenses are paid on every query and cannot be amortized across usage volume.
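A toy cost model makes the amortization point concrete. All dollar figures and the function name `cost_per_query` are hypothetical placeholders, not numbers from Ord's analysis or from OpenAI: a one-time training cost is spread over every query served, while inference cost is paid per query and scales linearly with how many times longer the model thinks.

```python
def cost_per_query(training_cost: float, queries: int,
                   base_inference_cost: float, thinking_multiplier: float) -> float:
    """Average cost per query under a toy model (all inputs hypothetical).

    Training is a one-time expense amortized over `queries`; inference is
    paid on every query and scales linearly with the thinking-length multiplier.
    """
    amortized_training = training_cost / queries
    per_query_inference = base_inference_cost * thinking_multiplier
    return amortized_training + per_query_inference


# Hypothetical inputs: $1M of RL training, $0.01 baseline inference cost per query.
for queries in (10**6, 10**8, 10**10):
    short = cost_per_query(1e6, queries, 0.01, 1)    # baseline thinking length
    long_ = cost_per_query(1e6, queries, 0.01, 30)   # 30x longer chains of thought
    print(f"{queries:>14,} queries: 1x thinking ${short:.4f}, 30x thinking ${long_:.4f}")
```

As query volume grows, the amortized training share shrinks toward zero, while the 30x inference term stays fixed on every query.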

Ord finds a consistent pattern across multiple benchmarks (AIME, ARC-AGI) and models (OpenAI's o1, Anthropic's Claude 3.7 Sonnet): 100x inference-scaling typically raises accuracy from about 20% to about 80%. Achieving the same improvement through RL-scaling would require 10,000x more training compute (100 squared, due to the half-slope relationship). This stark difference suggests fundamental limits to how efficiently RL can improve reasoning capabilities compared with simply allowing models more time to think.
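The 20%-to-80% pattern and the half-slope rule combine into a small calculator. It assumes, as a simplification, that accuracy rises linearly in log10(compute) across that range, so 100x more inference compute (two orders of magnitude) adds 60 percentage points, or 30 points per 10x; RL compute gets half that slope. The constant and function names are hypothetical, and this is an illustrative sketch rather than Ord's method.

```python
# Assumption: accuracy is linear in log10(compute) between 20% and 80%.
# 100x more inference compute = 2 orders of magnitude = +60 percentage points.
INFERENCE_POINTS_PER_DECADE = (80.0 - 20.0) / 2.0       # 30 points per 10x
# Ord's reading of the o1 charts: RL-scaling has about half that slope.
RL_POINTS_PER_DECADE = INFERENCE_POINTS_PER_DECADE / 2  # 15 points per 10x


def compute_multiplier_for_gain(gain_points: float, points_per_decade: float) -> float:
    """Factor by which compute must grow to add `gain_points` percentage points."""
    return 10 ** (gain_points / points_per_decade)


gain = 60.0  # from 20% to 80% accuracy
inf_x = compute_multiplier_for_gain(gain, INFERENCE_POINTS_PER_DECADE)
rl_x = compute_multiplier_for_gain(gain, RL_POINTS_PER_DECADE)
print(f"inference: {inf_x:,.0f}x  RL training: {rl_x:,.0f}x")
# prints: inference: 100x  RL training: 10,000x
```

Halving the slope doubles the exponent, so under this assumed linear fit the RL multiplier is the square of the inference multiplier for any gain: a 30-point gain, for example, needs 10x inference compute but 100x RL compute.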

  • These scaling dynamics suggest inference-time compute may be more cost-effective for capability improvements than additional RL training

Editorial Opinion

This analysis reveals a potentially critical constraint on the RL-scaling paradigm that has dominated recent AI development. If RL training truly needs a compute multiplier roughly the square of the inference multiplier to achieve equivalent gains (10,000x versus 100x), the economic calculus of AI development shifts dramatically, favoring architectures optimized for inference efficiency over ever-larger training runs. The finding also raises the question of whether we are approaching fundamental limits on how much reasoning ability can be "baked in" through training versus unlocked at inference time. That question has profound implications for both AI safety (can we align reasoning that emerges at inference?) and business models (recurring inference costs versus one-time training investments).

Large Language Models (LLMs) · Reinforcement Learning · AI Agents · Machine Learning · Market Trends

© 2026 BotBeat