BotBeat

OpenAI
RESEARCH · 2026-03-17

minRLM: Token-Efficient Recursive Language Models Achieve 3.6× Better Efficiency While Outperforming Vanilla LLMs

Key Takeaways

  • minRLM achieves 3.6× token-efficiency gains on GPT-4o mini and 30+ percentage-point accuracy improvements over vanilla LLMs on larger models
  • By storing data as REPL variables and having the model write code to query it, attention runs only on filtered results rather than entire documents, avoiding context-window rot
  • Costs remain flat regardless of context size, making the approach viable for long-context tasks that would be prohibitively expensive with traditional LLMs
Source: Hacker News (https://avilum.github.io/minrlm/recursive-language-model.html)

Summary

minRLM, a new token- and latency-efficient implementation of Recursive Language Models (RLMs), demonstrates significant improvements over both vanilla LLM approaches and the reference implementation. The system scores 72.7% on GPT-4o mini (compared with 69.7% for the official reference implementation and 69.5% for the vanilla baseline) while using 3.6× fewer tokens, and achieves even larger gains on larger models, winning 11 of 12 benchmark tasks against vanilla implementations. Rather than pasting large documents into the context window, minRLM stores input data as variables in a Python REPL, allowing the model to write code to query and filter the data, with attention running only on the results.
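The core pattern can be illustrated with a minimal sketch (hypothetical, not minRLM's actual API): the document lives as a variable in a sandboxed namespace, the model emits a code query against it, and only the query result re-enters the conversation.

```python
import re

def load_environment(document: str) -> dict:
    """Expose the raw document as a REPL variable in a sandboxed namespace."""
    return {"doc": document, "re": re}

def run_model_code(env: dict, code: str) -> str:
    """Execute model-written code; only what it assigns to `result`
    re-enters the context, so attention runs on the filtered output,
    never on the whole document."""
    local: dict = {}
    exec(code, env, local)
    return str(local.get("result", ""))

env = load_environment("error: disk full\ninfo: ok\nerror: timeout\n" * 3)

# Instead of reading the document, the model might emit a query like this:
snippet = run_model_code(
    env,
    "result = [l for l in doc.splitlines() if l.startswith('error')][:5]",
)
print(snippet)
```

The key property is that `snippet`, not `doc`, is what the model attends to on the next turn; the document itself never enters the context window.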

The approach builds on a December 2025 proposal by Zhang, Kraska, and Khattab and extends their validation across 12 tasks and multiple model sizes. A key property is that costs remain roughly flat regardless of context size: large documents (even 7M characters) become as accessible as much smaller ones (7K characters) through code-based navigation rather than wholesale reading. The implementation is open source, with every intermediate step captured as readable, rerunnable Python code, enabling transparency and debugging.
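The flat-cost claim follows directly from the pattern: the tokens that re-enter the model's context are the query result, whose size is independent of the document's size. An illustrative sketch (the `query_payload` helper and the 7K/7M figures are assumptions for demonstration, not minRLM code):

```python
def query_payload(doc: str, needle: str, k: int = 3) -> str:
    """Return only the first k matching lines -- the payload the model
    would actually attend to after a code-based lookup."""
    hits = [line for line in doc.splitlines() if needle in line]
    return "\n".join(hits[:k])

# Two documents differing in size by roughly three orders of magnitude:
small = ("noise\n" * 1_000) + "needle found here\n"       # ~7K-char scale
large = ("noise\n" * 1_000_000) + "needle found here\n"   # ~7M-char scale

p_small = query_payload(small, "needle")
p_large = query_payload(large, "needle")

# Same payload either way: per-call attention cost does not grow with
# document size, only the cheap in-sandbox filtering does.
assert p_small == p_large
print(len(p_small), len(p_large))
```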

  • The pattern aligns with production deployments like Anthropic's improved web search and emerging standards like Model Context Protocol (MCP) for standardizing code execution across AI providers

Editorial Opinion

minRLM represents a meaningful shift in how we should think about LLM efficiency: rather than throwing ever-larger context windows and more tokens at retrieval and analytics problems, using the model as a code generator that queries data through a Python sandbox is both cheaper and more accurate. The ~30-percentage-point accuracy gap on larger models is striking and suggests this approach deserves serious consideration in production systems. As context-window rot becomes a recognized limitation of scaling context length, RLM-style patterns offer a practical alternative that is already starting to appear in real-world products.

Large Language Models (LLMs) · Natural Language Processing (NLP) · AI Agents · MLOps & Infrastructure

