MIT's CSAIL Releases MathNet: World's Largest Open Dataset of Olympiad-Level Math Problems
Key Takeaways
- MathNet contains more than 30,000 expert-authored Olympiad-level math problems from 143 competitions across 47 countries and 17 languages, making it five times larger than the next-largest dataset of its kind
- The dataset draws exclusively from official competition booklets rather than community forums, so its solutions are expert-written and peer-reviewed, and they often present multiple approaches to a problem
- The resource fills a long-standing gap: prestigious competition materials were previously scattered, inaccessible, or tied to specific countries. Consolidating them benefits both AI research and students preparing for competitions worldwide
Summary
Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), in collaboration with King Abdullah University of Science and Technology (KAUST) and HUMAIN, have released MathNet, the largest high-quality dataset of proof-based mathematics problems ever created. The dataset comprises over 30,000 expert-authored problems and solutions spanning 47 countries, 17 languages, and 143 competitions across four decades, making it five times larger than the next-largest dataset of its kind. The work will be presented at the International Conference on Learning Representations (ICLR 2026) in Brazil.
What distinguishes MathNet is not only its scale but also its global scope and depth. Unlike previous Olympiad datasets, which focused primarily on competitions in the United States and China, MathNet captures mathematical perspectives and problem-solving traditions from six continents. The dataset includes both text- and image-based problems, with expert-written, peer-reviewed solutions that often present multiple approaches to each problem. Building the dataset required tracking down 1,595 PDF volumes totaling over 25,000 pages, with a significant portion sourced from Navid Safaei's personal archive of competition booklets collected since 2006.
The public release of MathNet serves two purposes: it gives AI researchers a rich resource for testing the mathematical reasoning capabilities of AI models, and it gives students worldwide access to a centralized, searchable collection of high-quality competition problems and worked solutions from diverse global traditions.
Editorial Opinion
MathNet represents an important democratization of mathematical knowledge and a thoughtful approach to dataset creation that prioritizes quality and global diversity over mere scale. By capturing expert-curated problems from underrepresented mathematical traditions, the dataset has the potential to improve both AI reasoning capabilities and educational access for students without institutional support. This work demonstrates how systematic curation of existing materials can yield research-grade resources without requiring entirely new data generation.



