MiniMax M3 Closes the Frontier Gap: Chinese Open-Weights Model Challenges GPT-4.5 and Claude Opus
Key Takeaways
- MiniMax M3 is the first open-weights model to genuinely compete with frontier models like GPT-4.5 and Claude Opus across multiple task categories
- MiniMax Sparse Attention (MSA) makes the 1M-token context window practical to use, with a claimed 9x speedup on prefill and 15x on decode while maintaining quality
- Aggressive pricing ($0.60/$2.40 per million tokens, with a 50% launch discount) puts M3 at one-tenth to one-twentieth the cost of frontier models
Summary
On June 1, 2026, MiniMax launched M3, an open-weights multimodal large language model that marks a significant step forward for non-frontier models. The model offers a 1-million-token context window and introduces MiniMax Sparse Attention (MSA), a mechanism that MiniMax claims delivers a 9x speedup on prefill and 15x on decode while maintaining quality, which would make the massive context window practically usable rather than just theoretically available. Available immediately via the MiniMax API, OpenRouter, and launch partners at $0.60/$2.40 per million tokens (with a 50% launch discount), M3 is priced at a tenth to a twentieth of the cost of closed frontier models.
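To make the pricing concrete, here is a minimal sketch of the per-request arithmetic. It assumes the usual convention that the first figure ($0.60) is the input price and the second ($2.40) the output price, which the announcement as quoted here does not spell out. The `Pricing` class and the request sizes are hypothetical, and the 10x and 20x comparison rates are derived only from the "a tenth to a twentieth" claim, not from any vendor's published price list.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (hypothetical helper, not a MiniMax API)."""
    input_usd_per_million: float
    output_usd_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the USD cost of one request."""
        total = (input_tokens * self.input_usd_per_million
                 + output_tokens * self.output_usd_per_million)
        return total / TOKENS_PER_MILLION

    def scaled(self, factor: float) -> "Pricing":
        """Return a price list `factor` times more expensive."""
        return Pricing(self.input_usd_per_million * factor,
                       self.output_usd_per_million * factor)


# Assumption: $0.60 is the input price and $2.40 the output price.
M3 = Pricing(input_usd_per_million=0.60, output_usd_per_million=2.40)

# Illustrative request: a full 1M-token prompt with a 2,000-token reply.
prompt_tokens, reply_tokens = 1_000_000, 2_000

m3_cost = M3.cost(prompt_tokens, reply_tokens)
# Implied comparison range if a closed model costs 10x to 20x as much.
frontier_low = M3.scaled(10).cost(prompt_tokens, reply_tokens)
frontier_high = M3.scaled(20).cost(prompt_tokens, reply_tokens)

print(f"M3: ${m3_cost:.4f}")
print(f"Implied frontier range: ${frontier_low:.2f} to ${frontier_high:.2f}")
```

Under these assumptions, filling the entire context window once costs about $0.60 in input tokens, so the input price dominates long-context workloads far more than the output price does.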
Tested on standard benchmarks covering website design, code generation, and domain-specific tasks, M3 delivered results comparable to GPT-4.5 and Anthropic's Claude Opus, according to independent reviewer gkmcd. The model natively handles text, image, and video inputs and produces text output. However, MiniMax has not yet disclosed the model's parameter count, and although the model is called "open-weights," the weights were not publicly available at launch. MiniMax has promised to release them on Hugging Face within 10 days of the announcement.
- M3 is available immediately via the MiniMax API and OpenRouter, though the public release of the weights is still pending
Editorial Opinion
M3 represents an inflection point: the open-weights tier is finally catching up to closed frontier models, not just in raw capability but in practical usability. Sparse attention is the difference between a million-token context being a spec-sheet gimmick and being an actual architectural advantage. If the performance claims hold under real load and the weights land on Hugging Face on schedule, this could accelerate the move of commodity LLM applications away from closed APIs.



