BotBeat
RESEARCH | OpenAI | 2026-03-17

minRLM: Token-Efficient Recursive Language Models Achieve 3.6× Better Efficiency While Outperforming Vanilla LLMs

Key Takeaways

  • minRLM uses 3.6× fewer tokens than a vanilla LLM on GPT-4o mini, and on larger models it improves accuracy over vanilla LLMs by roughly 30 percentage points
  • Input data is stored as REPL variables and the model writes code to query it, so attention runs only on the filtered results rather than on entire documents, which avoids context window rot
  • Costs remain roughly flat regardless of context size, making the approach viable for long-context tasks that would be prohibitively expensive with traditional LLM prompting
Source: Hacker News, https://avilum.github.io/minrlm/recursive-language-model.html

Summary

minRLM, a new token- and latency-efficient implementation of Recursive Language Models (RLMs), improves on both vanilla LLM prompting and the official RLM reference implementation. With GPT-4o mini it scores 72.7% (compared to 69.7% for the official implementation and 69.5% for the vanilla model) while using 3.6× fewer tokens. The gains are larger on larger models, where it wins 11 of 12 benchmark tasks against the vanilla baseline. Rather than pasting large documents into the context window, minRLM stores the input data as variables in a Python REPL. The model writes code to query and filter that data, and attention runs only on the results.
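The pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not minRLM's actual code: `fake_model`, `run_in_repl`, and the log-style document are hypothetical names and data invented for the example, and the "model" is a stub that returns hard-coded code so the sketch runs offline.

```python
import contextlib
import io


def fake_model(task: str, preview: str) -> str:
    """Hypothetical stand-in for an LLM call: returns code, not an answer.

    A real system would send `task` and the short `preview` to a model.
    Here the generated code is hard-coded so the example runs offline.
    """
    return (
        "hits = [line for line in context.splitlines() if 'ERROR' in line]\n"
        "print(len(hits))\n"
        "print(hits[0])\n"
    )


def run_in_repl(code: str, context: str) -> str:
    """Run model-written code with the document bound to the name `context`."""
    namespace = {"context": context}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Illustration only: untrusted code needs a real sandbox, not bare exec.
        exec(code, namespace)
    return buffer.getvalue()


# A 5,000-line document that is never placed in the prompt.
document = "\n".join(
    f"{i} ERROR disk full" if i % 1000 == 0 else f"{i} INFO ok"
    for i in range(1, 5001)
)

code = fake_model("How many errors are logged?", preview=document[:200])
result = run_in_repl(code, document)
print(result)
```

Only `result`, a couple of short lines, would be passed back to the model for the next step. The full document stays in the REPL as a variable.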

The approach builds on a December 2025 proposal by Zhang, Kraska, and Khattab and extends its evaluation to 12 tasks and multiple model sizes. A key property is that costs remain roughly flat regardless of context size: because the model navigates the data with code instead of reading it wholesale, a 7M-character document is about as accessible as a 7K-character one. The implementation is open source, and every intermediate step is readable, rerunnable Python code, which makes runs transparent and easy to debug.
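One way to see why cost can stay roughly flat is to look at what the model is actually sent. The sketch below assumes a prompt that contains only the task, the size of the stored variable, and a short preview. The prompt wording and the `build_prompt` helper are assumptions made for illustration, since the source does not show minRLM's real prompt format.

```python
def build_prompt(task: str, context: str, preview_chars: int = 200) -> str:
    """Build a prompt that describes the data instead of including it."""
    return (
        f"Task: {task}\n"
        f"A variable `context` holds {len(context)} characters.\n"
        f"Preview: {context[:preview_chars]!r}\n"
        "Write Python that prints only what you need."
    )


small = "x" * 7_000        # 7K characters
large = "x" * 7_000_000    # 7M characters

p_small = build_prompt("Count the records", small)
p_large = build_prompt("Count the records", large)

# The prompts differ only in the digits of the reported length.
growth = len(p_large) - len(p_small)
print(len(p_small), len(p_large), growth)
```

Under these assumptions the input grows a thousandfold while the prompt grows by a few characters. Total cost in a real run also depends on how much output the generated code prints back to the model, which this sketch does not measure.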

The pattern aligns with production deployments like Anthropic's improved web search and with emerging standards like Model Context Protocol (MCP) for standardizing code execution across AI providers.

Editorial Opinion

minRLM represents a meaningful shift in how we should think about LLM efficiency. Instead of throwing larger context windows and more tokens at retrieval and analytics problems, it uses the model as a code generator that queries data through a Python sandbox, which is both cheaper and more accurate. The ~30pp accuracy gap on larger models is striking and suggests the approach deserves serious consideration in production systems. As context window rot becomes a recognized limitation of scaling context length, RLM-style patterns offer a practical alternative that is starting to appear in real-world products.

Large Language Models (LLMs) | Natural Language Processing (NLP) | AI Agents | MLOps & Infrastructure
