BotBeat

Exabase · RESEARCH · 2026-05-15

Exabase Achieves State-of-the-Art on Memory Benchmark Using Smaller, Cheaper Models

Key Takeaways

  • Achieved state-of-the-art results on the LongMemEval benchmark, with 96.4% accuracy at top-50 retrieval depth, using Gemini 3 Flash rather than a frontier model
  • Showed that strong memory performance does not require expensive, oversized models, challenging the industry's scale-first approach
  • Published a transparent methodology with no question-specific prompt tuning, setting a standard for production-realistic evaluation in long-term memory research
Source: Hacker News, https://exabase.io/research/exabase-achieves-state-of-the-art-on-longmemeval-benchmark

Summary

Exabase announced Mneme-1 (M-1), its first-generation long-term memory engine, which achieved state-of-the-art results on LongMemEval, the most comprehensive benchmark for conversational memory retrieval. The system reached 96.4% accuracy at a top-50 recall depth using Gemini 3 Flash, a smaller and cheaper model than the large frontier models, and without question-specific prompt engineering.
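
The summary does not define how "accuracy at top-50" is computed. One common reading in retrieval evaluation is a hit rate: a question counts as a success when all of its gold evidence items appear among the top k retrieved memories. The sketch below illustrates that reading only. The function names and toy data are hypothetical, and this is not Exabase's or LongMemEval's evaluation harness; an end-to-end figure may instead score whether a reader model answers correctly after being given the top-k memories.

```python
from typing import Sequence, Set, Tuple


def hit_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int) -> bool:
    """Hypothetical helper: True if every gold evidence id is in the top-k results."""
    top_k = set(ranked_ids[:k])
    # Subset test: all gold items must be retrieved (an empty gold set always passes).
    return gold_ids <= top_k


def accuracy_at_k(runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Fraction of questions whose gold evidence is fully recovered within the top k."""
    if not runs:
        return 0.0
    hits = sum(hit_at_k(ranked, gold, k) for ranked, gold in runs)
    return hits / len(runs)


# Toy data (invented for illustration): (ranked memory ids, gold evidence ids) per question.
example_runs = [
    (["m3", "m7", "m1"], {"m7"}),
    (["m2", "m5", "m9"], {"m4"}),
    (["m4", "m8", "m6"], {"m4", "m6"}),
    (["m1", "m2", "m3"], {"m3"}),
]

print(accuracy_at_k(example_runs, k=2))  # 0.25
print(accuracy_at_k(example_runs, k=3))  # 0.75
```

Raising k can only keep or increase the score, which is one reason results at a deep cutoff such as top-50 sit close to the benchmark's ceiling.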

The work addresses a critical gap in AI systems: long-term memory has been poorly evaluated and rarely tested under production-realistic conditions. Long-term memory in AI systems resembles human memory in being reconstructive, associative, and temporally sensitive, rather than a simple database lookup. This capability is essential for building AI systems that maintain meaningful context across conversations and sessions.

Exabase's methodology emphasizes transparent, reproducible evaluation and acknowledges ceiling effects inherent in the benchmark itself. By avoiding frontier models and question-specific prompt engineering, the team shows that progress on memory does not require brute-force scaling. This work sets a new standard for responsible evaluation in long-term memory research.

Editorial Opinion

Long-term memory remains one of the least solved problems in production AI, so Exabase's state-of-the-art results on a rigorous public benchmark are genuinely valuable. What makes this work stand out is its insistence on production-realistic conditions (no frontier models, no question-specific prompt engineering), showing that better memory systems do not require brute-force scaling. This is the kind of unglamorous but essential research the field needs.

AI Agents · Machine Learning · AI Safety & Alignment

© 2026 BotBeat