JetBrains Releases Mellum2: Efficient 12B Mixture-of-Experts Model for Production AI Systems
Key Takeaways
- Mellum2 activates only 2.5B of its 12B parameters per token, delivering more than 2x faster inference than similarly sized models while maintaining competitive benchmark performance
- Designed as a specialized 'focal' model for routing, RAG, summarization, and agent subtasks within larger AI systems, not as a general-purpose replacement
- Open-source release (Apache 2.0) with weights on Hugging Face, enabling private, self-hosted deployment for organizations handling sensitive code and data
Summary
JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model trained on natural language and code and optimized for efficient, low-latency inference in production AI systems. Because only 2.5B parameters are active per token, the model runs inference more than 2x faster than similarly sized models while remaining competitive on benchmarks covering code generation, reasoning, science, and math.
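To make the "2.5B of 12B active" figure concrete: in a Mixture-of-Experts layer, a small router scores every expert for each token, and only the top-scoring few are actually executed. The Python sketch below illustrates generic top-k routing under stated assumptions. The expert count (8), k=2, and the toy scalar "experts" are hypothetical values chosen for illustration, not details of Mellum2's architecture, which the article does not describe. The only figures taken from the article are the 12B total and 2.5B active parameter counts, which put roughly 21% of the weights to work on any given token.

```python
import math

# Figures from the article: 12B total parameters, 2.5B active per token.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9

# Hypothetical MoE layout for illustration only; the article does not say
# how many experts Mellum2 has or how many it selects per token.
NUM_EXPERTS = 8
TOP_K = 2


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k=TOP_K):
    """Return {expert_index: weight} for the k highest-scoring experts,
    with the selected weights renormalized to sum to 1."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_layer(x, experts, router_logits, k=TOP_K):
    """Run only the routed experts on input x and mix their outputs.
    Experts that were not selected are never called."""
    weights = top_k_route(router_logits, k)
    out = 0.0
    for i, w in weights.items():
        out += w * experts[i](x)
    return out, sorted(weights)


if __name__ == "__main__":
    # Toy experts: each one just scales its scalar input by a different factor.
    experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
    y, used = moe_layer(1.0, experts, logits)
    print(f"experts run for this token: {used} of {NUM_EXPERTS}")
    print(f"mixed output: {y:.3f}")
    print(f"active share of parameters: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

In typical MoE transformers the router is a learned linear layer over the token's hidden state, and components such as attention and embeddings are shared by every token, so the active parameter count is not simply k/E of the total.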
Mellum2 is designed as a "focal" model—a fast, specialized component optimized for high-frequency operations within larger AI systems rather than as a general-purpose replacement. The company positions it for use cases like routing and orchestration, retrieval-augmented generation (RAG), summarization, agent subtasks, and code-aware features. JetBrains emphasizes that modern production AI systems increasingly rely on multiple specialized models, and Mellum2 targets latency-sensitive operations that don't require frontier-scale models.
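The "focal model" pattern described above can be pictured as a dispatcher that sends frequent, latency-sensitive work to a small model and escalates the rest. The sketch below is a hypothetical illustration only: the task labels, the `needs_deep_reasoning` flag, and the model identifiers are invented for this example and do not come from JetBrains. In a real system the routing decision itself might be made by the focal model.

```python
from dataclasses import dataclass

# Hypothetical task labels for the article's focal-model use cases:
# routing, RAG, summarization, agent subtasks, and code-aware features.
FOCAL_TASKS = {"route", "rag_answer", "summarize", "agent_subtask", "code_assist"}

# Hypothetical model identifiers; substitute whatever your deployment exposes.
FOCAL_MODEL = "focal-model"        # e.g. a self-hosted Mellum2 instance
FRONTIER_MODEL = "frontier-model"  # a larger general-purpose reasoning model


@dataclass
class Request:
    task: str
    prompt: str
    needs_deep_reasoning: bool = False


def choose_model(req: Request) -> str:
    """Route high-frequency, latency-sensitive tasks to the focal model and
    escalate anything unrecognized or reasoning-heavy to the larger model."""
    if req.task in FOCAL_TASKS and not req.needs_deep_reasoning:
        return FOCAL_MODEL
    return FRONTIER_MODEL


if __name__ == "__main__":
    requests = [
        Request("summarize", "Summarize this diff"),
        Request("rag_answer", "Which config flag enables X?"),
        Request("plan_refactor", "Redesign the storage layer", needs_deep_reasoning=True),
    ]
    for r in requests:
        print(f"{r.task:>14} -> {choose_model(r)}")
```

Keeping the escalation path as the default means an unknown task type fails toward the more capable model rather than toward the cheaper one.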
Released under the Apache 2.0 license with weights available on Hugging Face, Mellum2 can be deployed in self-hosted environments, making it suitable for organizations with proprietary code or privacy requirements. The release includes a full technical report detailing architecture, training methodology, and comprehensive benchmarks, underscoring JetBrains' commitment to open-source AI infrastructure.
- Specialized for text and code workloads, reflecting JetBrains' focus on software engineering use cases
Editorial Opinion
Mellum2 represents a thoughtful departure from the race toward larger, more general-purpose models. By releasing an efficient, specialized model optimized for specific high-frequency tasks, JetBrains acknowledges a practical reality: production AI systems don't need a frontier model doing every job. This 'focal model' approach—pairing fast, task-specific models with larger reasoning models—is likely to become increasingly valuable as organizations seek to balance cost, latency, and capability. JetBrains' open-source licensing also lowers barriers to adoption for teams building internal AI infrastructure.