minRLM: Token-Efficient Recursive Language Models Achieve 3.6× Better Efficiency While Outperforming Vanilla LLMs
Key Takeaways
- ▸minRLM achieves 3.6× token efficiency gains on GPT-4o mini and 30+ percentage point accuracy improvements over vanilla LLMs on larger models
- ▸By storing data as REPL variables and having models write code to query it, attention only runs on filtered results rather than entire documents, avoiding context window rot
- ▸Costs remain flat regardless of context size, making the approach viable for long-context tasks that would be prohibitively expensive with traditional LLMs
Summary
minRLM, a new token- and latency-efficient implementation of Recursive Language Models (RLMs), demonstrates significant improvements over both vanilla LLM approaches and the reference implementation. The system scores 72.7% on GPT-4o mini (versus 69.7% for the official reference implementation and 69.5% for the vanilla baseline) while using 3.6× fewer tokens, and achieves even larger gains on larger models, winning 11 of 12 benchmark tasks against vanilla implementations. Rather than pasting large documents into the context window, minRLM stores input data as variables in a Python REPL, allowing the model to write code to query and filter the data, with attention running only on the results.
The approach builds on a December 2025 proposal by Zhang, Kraska, and Khattab and extends their validation across 12 tasks and multiple model sizes. A key innovation is that costs remain roughly flat regardless of context size, as large documents (even 7M characters) become as accessible as much smaller ones (7K characters) through code-based navigation rather than wholesale reading. The implementation includes an open-source codebase with every intermediate step in readable, rerunnable Python code, enabling transparency and debugging.
The pattern aligns with production deployments like Anthropic's improved web search and emerging standards like the Model Context Protocol (MCP) for standardizing code execution across AI providers.
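To make the core idea concrete, here is a minimal sketch of the pattern the summary describes: the document lives as a Python variable in a sandboxed namespace, the model emits code that filters it, and only the small filtered result re-enters the model's context. The helper names (`run_repl_step`, `model_written_code`) and the synthetic document are illustrative assumptions, not taken from the minRLM codebase.

```python
# Illustrative sketch of the RLM REPL pattern (not the minRLM implementation).

def run_repl_step(namespace: dict, code: str) -> str:
    """Execute model-written code against the REPL namespace and return
    whatever it assigned to `result`. That string -- not the full document
    stored in the namespace -- is what gets fed back to the model."""
    exec(code, namespace)
    return str(namespace.get("result", ""))

# A large "document" stored as a REPL variable (10,000 synthetic log lines):
namespace = {
    "doc": "\n".join(
        f"line {i}: {'ERROR' if i % 1000 == 0 else 'ok'}"
        for i in range(10_000)
    )
}

# Code the model might write to answer "which lines contain ERROR?":
model_written_code = (
    "result = '\\n'.join(l for l in doc.splitlines() if 'ERROR' in l)"
)

filtered = run_repl_step(namespace, model_written_code)

# Attention now runs over ~10 matching lines instead of the full document,
# which is why cost stays roughly flat as the document grows.
print(len(namespace["doc"].splitlines()), "->", len(filtered.splitlines()))
```

Because the model only ever sees `filtered`, the same loop works whether `doc` is 7K or 7M characters; the document's size affects the sandbox, not the prompt.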
Editorial Opinion
minRLM represents a meaningful shift in how we should think about LLM efficiency: instead of throwing larger context windows and more tokens at retrieval and analytics problems, using the model as a code generator to query data through a Python sandbox is both cheaper and more accurate. The ~30pp accuracy gap on larger models is striking and suggests this approach deserves serious consideration in production systems. As context window rot becomes a recognized limitation of scaling context length, RLM-style patterns offer a practical alternative that's starting to appear in real-world products.