BotBeat

tilde-research · RESEARCH · 2026-05-09

Aurora Optimizer Achieves 100x Data Efficiency in LLM Training, Surpasses Muon and NorMuon

Key Takeaways

  • Aurora solves the neuron-death problem that the Muon optimizer exhibits on tall matrices by enforcing row-norm uniformity while preserving orthogonal gradient updates
  • A 1.1B model trained with Aurora achieves 100x data efficiency on open-source internet data and outperforms larger models on general evaluation benchmarks
  • Aurora achieves state-of-the-art results on the modded-nanoGPT speedrun with only 6% computational overhead, making it a practical and efficient drop-in replacement for Muon
Source: Hacker News, https://blog.tilderesearch.com/blog/aurora

Summary

Researchers from tilde-research have introduced Aurora, a new leverage-aware optimizer designed to overcome a critical limitation of the popular Muon optimizer: neuron death in MLP layers with tall weight matrices. The row-norm anisotropy of Muon's updates causes a significant fraction of neurons to die permanently early in training; row normalization (as in NorMuon) fixes this, but at the cost of orthogonality. Aurora formulates steepest descent under the joint constraints of row-norm uniformity and orthogonality, providing a principled solution that maintains both properties without sacrificing precision.
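
The trade-off the post describes is geometric, so a minimal PyTorch sketch may help make it concrete. The snippet below is illustrative only and assumes nothing about Aurora's actual update rule: it computes the orthogonal polar factor that Muon approximates (via an exact SVD rather than Newton-Schulz iterations), shows that the factor's row norms are uneven on a tall matrix, and shows that forcing uniform row norms afterwards, as a NorMuon-style fix would, degrades orthogonality.

```python
import torch

# Illustrative sketch only -- this is NOT Aurora's update rule. Muon's update
# direction is (approximately) the orthogonal polar factor of the momentum
# matrix, which Muon computes with Newton-Schulz iterations; here we take the
# exact factor from an SVD to keep the example short.
torch.manual_seed(0)
G = torch.randn(4096, 1024)  # tall MLP-style gradient: one row per output neuron

U, _, Vh = torch.linalg.svd(G, full_matrices=False)
O = U @ Vh  # polar factor: O.T @ O is the identity, i.e. an orthogonal update

row_norms = O.norm(dim=1)
print("row-norm spread:", row_norms.min().item(), "to", row_norms.max().item())
# Uneven row norms mean some neurons receive systematically smaller updates --
# the row-norm anisotropy the post ties to neuron death on tall matrices.

# NorMuon-style fix: rescale every row to the same (mean) norm ...
O_uniform = O * (row_norms.mean() / row_norms).unsqueeze(1)

eye = torch.eye(1024)
print("orthogonality error, orthogonal update:", (O.T @ O - eye).norm().item())
print("orthogonality error, row-normalized   :", (O_uniform.T @ O_uniform - eye).norm().item())
# Aurora's claim is to take the steepest-descent step that satisfies row-norm
# uniformity and orthogonality jointly, instead of trading one for the other.
```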

The optimizer has delivered impressive empirical results: a 1.1B parameter model trained with Aurora achieves 100x data efficiency on open-source internet data while outperforming larger models on standard benchmarks like HellaSwag. Aurora also achieves state-of-the-art performance on the modded-nanoGPT speedrun, a competitive optimization benchmark. With only 6% computational overhead over traditional Muon and requiring minimal tuning, Aurora functions as a practical drop-in replacement. The team has released both Riemannian and vanilla implementations as open-source code on GitHub, enabling immediate adoption across the research community.

  • The full implementation is open-sourced with both Riemannian and vanilla variants, lowering barriers to adoption for LLM training (see the drop-in usage sketch below)
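
Because the post positions Aurora as a drop-in replacement for Muon, adopting it should amount to swapping the optimizer object in an existing PyTorch training loop. The sketch below is hypothetical: the `Aurora` constructor name and arguments are placeholders (the real API is in the team's released code), and the parameter routing shown is only the common Muon convention of sending 2-D hidden weight matrices to the matrix optimizer and everything else to AdamW.

```python
import torch
import torch.nn as nn

# Hypothetical usage sketch. "Aurora" below is a placeholder with guessed
# arguments; consult the released GitHub code for the actual constructor.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# Muon-style routing: 2-D hidden weights to the matrix optimizer, the rest to AdamW.
matrix_params = [p for p in model.parameters() if p.ndim == 2]
other_params = [p for p in model.parameters() if p.ndim != 2]

# opt_matrix = Aurora(matrix_params, lr=0.02, momentum=0.95)  # hypothetical API
opt_matrix = torch.optim.SGD(matrix_params, lr=0.02, momentum=0.95)  # stand-in
opt_other = torch.optim.AdamW(other_params, lr=3e-4)

# One training step with the split optimizers.
x = torch.randn(8, 1024)
loss = model(x).pow(2).mean()
loss.backward()
for opt in (opt_matrix, opt_other):
    opt.step()
    opt.zero_grad()
```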

Editorial Opinion

Aurora represents a meaningful refinement in optimizer design, addressing a real pathology in existing methods rather than chasing marginal improvements. That a relatively elegant mathematical formulation can deliver a reported 100x data-efficiency gain while preserving orthogonal gradient updates suggests substantial untapped potential in foundational training algorithms. For practitioners training LLMs, the combination of strong empirical results, minimal overhead, and immediate open-source availability makes Aurora a compelling candidate for adoption in near-term model development.

Large Language Models (LLMs) · Machine Learning · Deep Learning · Open Source
