Cohere Open-Sources Command A+, a 218B MoE Model for Enterprise Agents
Key Takeaways
- A single unified model consolidates five separate Command A models, with multimodal, tool-use, and reasoning capabilities built in
- Runs on two NVIDIA H100s at W4A4 quantization or a single Blackwell GPU, enabling practical private enterprise deployment
- Significant performance gains: 85% on the τ²-Bench Telecom agentic task completion benchmark (vs. 37% for its predecessor), 20% better agentic QA, 32% better spreadsheet analysis, and multi-session memory up from 39% to 54%
Summary
Cohere has open-sourced Command A+, a 218-billion-parameter mixture-of-experts model available today on Hugging Face under the Apache 2.0 license. The model represents a consolidation of Cohere's fragmented Command A family—which previously included separate models for general use, reasoning, vision, translation, and tool use—into a single unified system with 25 billion active parameters at inference time. Built from a year of deploying Cohere's North enterprise AI workspace with real customers, Command A+ is optimized for agentic workflows, multimodal reasoning, and private deployment.
Command A+ runs efficiently on just two NVIDIA H100 GPUs at W4A4 quantization or a single Blackwell GPU, making it practical for enterprise teams managing private deployments without routing sensitive data through external APIs. The consolidation into a single model significantly simplifies infrastructure management while delivering substantial performance improvements: agentic question-answering accuracy improved 20% over the previous Command A Reasoning model, spreadsheet analysis quality jumped 32%, and multi-session memory performance increased from 39% to 54%. On τ²-Bench Telecom, a benchmark testing multi-step agentic task completion in realistic enterprise scenarios, Command A+ scores 85% compared to 37% for its predecessor.
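The two-H100 claim can be sanity-checked with a back-of-the-envelope estimate that counts weights only. This is a rough sketch, not Cohere's sizing method: the 80 GB per-GPU figure is an assumption (the article does not say which H100 variant), and the estimate ignores quantization scales, KV cache, and activation memory, so real headroom is smaller.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 218e9   # total parameters, from the article
ACTIVE_PARAMS = 25e9   # active parameters at inference time, from the article
H100_MEMORY_GB = 80    # assumption: 80 GB H100 variant
NUM_H100 = 2           # from the article

budget_gb = NUM_H100 * H100_MEMORY_GB

w4_gb = weight_memory_gb(TOTAL_PARAMS, 4)     # 4-bit weights (the "W4" in W4A4)
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # 16-bit weights, for comparison

fits_w4 = w4_gb <= budget_gb
fits_bf16 = bf16_gb <= budget_gb
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"4-bit weights:  {w4_gb:.0f} GB, fits in {budget_gb} GB: {fits_w4}")
print(f"16-bit weights: {bf16_gb:.0f} GB, fits in {budget_gb} GB: {fits_bf16}")
print(f"active parameter fraction: {active_fraction:.1%}")
```

Under these assumptions the 4-bit weights come to about 109 GB against 160 GB across two GPUs, while 16-bit weights (about 436 GB) would not fit. Only about 11% of the parameters are active per token, which lowers compute per token but not the memory needed to hold every expert.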
The model supports 48 languages (up from 23) and improves tokenization for non-European languages, with 20% better compression for Arabic, 16% for Korean, and 18% for Japanese. Compared to Command A Reasoning, output throughput is up to 63% higher in tokens per second and time-to-first-token is 17% lower; W4A4 quantization adds a further 47% speed increase via speculative decoding optimized for the MoE architecture.
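The percentages above leave two things open: what "better compression" means in token counts, and whether the two speed gains stack. The sketch below works through both under stated assumptions. The function name, the two readings of the compression figure, the 1,000-token baseline, and the multiplicative combination of speed gains are all illustrative choices, not details from Cohere.

```python
def tokens_after_compression(baseline_tokens: float, gain: float, reading: str) -> float:
    """Token count for the same text after a tokenizer improvement of `gain` (e.g. 0.20).

    reading="fewer_tokens": the gain is a direct cut in token count.
    reading="more_text_per_token": each token covers (1 + gain) times more text.
    """
    if reading == "fewer_tokens":
        return baseline_tokens * (1 - gain)
    if reading == "more_text_per_token":
        return baseline_tokens / (1 + gain)
    raise ValueError(f"unknown reading: {reading}")


# Figures from the article.
COMPRESSION_GAIN = {"Arabic": 0.20, "Korean": 0.16, "Japanese": 0.18}
THROUGHPUT_GAIN = 0.63  # up to 63% more output tokens per second
W4A4_GAIN = 0.47        # further gain reported for W4A4 quantization

BASELINE_TOKENS = 1000  # hypothetical text length under the old tokenizer

for language, gain in COMPRESSION_GAIN.items():
    low = tokens_after_compression(BASELINE_TOKENS, gain, "fewer_tokens")
    high = tokens_after_compression(BASELINE_TOKENS, gain, "more_text_per_token")
    print(f"{language}: {low:.0f} to {high:.0f} tokens per {BASELINE_TOKENS} baseline tokens")

# Assumption: the two speed gains multiply. The article does not state this.
combined_speedup = (1 + THROUGHPUT_GAIN) * (1 + W4A4_GAIN)
print(f"combined throughput multiplier (if multiplicative): {combined_speedup:.2f}x")
```

Since API and compute costs typically scale with token count, either reading implies that the same Arabic, Korean, or Japanese text costs noticeably less to process, though the exact saving depends on which definition the reported figures use.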
- Up to 63% higher output tokens per second than Command A Reasoning, plus a further 47% at W4A4 via speculative decoding optimized for the MoE architecture; improved tokenization reduces inference costs, especially for non-European languages
- Available on Hugging Face under Apache 2.0 license with 48-language support



