Cohere Releases Cohere-Transcribe: Open-Source 2B Speech Recognition Model Achieving #1 Performance on ASR Leaderboard
Key Takeaways
- Cohere releases an open-source 2B-parameter speech recognition model under the Apache 2.0 license on Hugging Face
- Achieves the #1 ranking on the Hugging Face Open ASR Leaderboard for English, outperforming proprietary competitors
- Supports 14 enterprise-critical languages, matching or exceeding existing open-source models across them
Summary
Cohere has open-sourced cohere-transcribe-03-2026, a 2B-parameter speech recognition model available under the Apache 2.0 license on Hugging Face. Trained from scratch on 0.5M hours of curated audio-transcript pairs, the model delivers state-of-the-art accuracy with offline throughput three times higher than that of similarly sized competitor models.
The model holds the #1 position on the Hugging Face Open ASR Leaderboard for English, outperforming both proprietary and open-source alternatives. Across its 14 supported languages (English, German, French, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Arabic, Vietnamese, Mandarin Chinese, Japanese, and Korean), cohere-transcribe matches or exceeds all existing open-source models.
Architecturally, the model is a 2B-parameter encoder-decoder transformer with a Fast-Conformer encoder. Over 90% of the parameters sit in the encoder, leaving a lightweight decoder. Because only the decoder runs autoregressively, this design minimizes per-token inference compute and enables much faster serving than models built on pre-trained text LLMs. Cohere partnered with vLLM to enable production-grade serving through an open-source stack, emphasizing deployment readiness alongside benchmark performance.
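The efficiency argument can be checked with rough arithmetic. The sketch below is a back-of-envelope illustration, not Cohere's published analysis. It takes the 2B total and the 90% lower bound from the text, and assumes the common rule of thumb of about 2 FLOPs per parameter per generated token (attention cost is ignored). The 1.5B-parameter LLM-style decoder and the 100-token transcript are hypothetical values chosen only for comparison.

```python
from typing import Tuple

TOTAL_PARAMS = 2_000_000_000       # "2B" from the article
ENCODER_SHARE = 0.90               # article says "over 90%"; 0.90 is the lower bound
HYPOTHETICAL_LLM_DECODER = 1_500_000_000  # assumption: an LLM-based decoder, for contrast
OUTPUT_TOKENS = 100                # assumption: length of one transcript in tokens


def split_params(total: int, encoder_share: float) -> Tuple[int, int]:
    """Return (encoder_params, decoder_params) for a given encoder share."""
    encoder = round(total * encoder_share)
    return encoder, total - encoder


def decode_flops(decoder_params: int, output_tokens: int) -> int:
    """Rough autoregressive cost: ~2 FLOPs per decoder parameter per generated token."""
    return 2 * decoder_params * output_tokens


if __name__ == "__main__":
    encoder, decoder = split_params(TOTAL_PARAMS, ENCODER_SHARE)
    light = decode_flops(decoder, OUTPUT_TOKENS)
    heavy = decode_flops(HYPOTHETICAL_LLM_DECODER, OUTPUT_TOKENS)
    print(f"encoder params: {encoder:,}")
    print(f"decoder params: {decoder:,}")
    print(f"decode FLOPs (lightweight decoder): {light:,}")
    print(f"decode FLOPs (hypothetical LLM decoder): {heavy:,}")
    print(f"ratio: {heavy / light:.1f}x")
```

Under these assumptions the lightweight decoder needs about 7.5 times fewer FLOPs to generate the same transcript. The encoder cost is paid once per audio segment and is not autoregressive, so it can be processed in a single parallel pass.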
The release represents Cohere's first venture into audio AI and signals the company's diversification beyond large language models. A Hugging Face Space enables easy testing, and the open-source availability democratizes access to high-quality speech recognition technology for developers and researchers.
- Delivers 3x higher offline throughput than similarly sized models through an encoder-heavy architecture
- Production-ready, with vLLM integration for efficient enterprise deployment and inference
Editorial Opinion
Cohere's open-source speech recognition release marks a strategic expansion beyond language models with a highly efficient, competitive offering. By combining top-tier benchmark performance with production efficiency and multilingual support, Cohere demonstrates that specialized, well-engineered models can outcompete larger alternatives, a significant statement in an industry dominated by proprietary ASR services. The focus on practical deployment (vLLM integration) rather than academic metrics alone shows Cohere's commitment to building AI tools that can be deployed in real-world settings. This release will likely accelerate adoption of open-source transcription alternatives and give enterprises a high-quality, cost-effective option.



