NVIDIA's Cosmos-Predict2.5 Runs ~1.4x Faster on AMD MI300X Than on NVIDIA's Own H200, Challenging NVIDIA's Hardware Dominance
Key Takeaways
- Cosmos-Predict2.5-2B achieves ~1.27–1.49x faster wall-clock performance on AMD MI300X versus NVIDIA H200 at equal output quality, challenging NVIDIA's hardware performance narrative
- This represents one of the first production-grade world model deployments on AMD GPUs, proving the viability of AMD hardware for serious AI simulation workloads outside the NVIDIA ecosystem
- NVIDIA's open-source Cosmos model enables third-party optimization efforts, demonstrating the strategic value of software openness even when it may benefit competing hardware vendors
Summary
NVIDIA has demonstrated that its Cosmos-Predict2.5-2B world model—a foundational model for large-scale generative simulation in physical AI—achieves approximately 1.4x faster inference on AMD's MI300X GPU compared to NVIDIA's own H200 (Hopper) hardware at equivalent output quality. The benchmark, conducted at 720p resolution with bf16 precision across 36 diffusion steps, represents one of the first production-grade deployments of a world model on AMD GPUs, signaling that AMD hardware is now competitive with or superior to NVIDIA for large diffusion-based model inference workloads.
The result underscores a significant shift in the GPU market: while NVIDIA has long dominated AI accelerator sales, this benchmark shows that AMD's MI300X can deliver superior performance for specific inference tasks, at least when models are properly optimized for the hardware. NVIDIA's team is continuing to optimize, with a second version in development targeting up to 50% additional runtime reduction; if realized, that could widen the MI300X's performance advantage further. The company released Cosmos under the Apache 2.0 license, enabling broad adoption and third-party optimizations such as this AMD port.
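For readers unfamiliar with how such figures are derived, the reported ~1.27–1.49x speedup is a ratio of wall-clock times for equivalent runs on each GPU. The sketch below illustrates the shape of such a comparison; `denoise_step` is a hypothetical stand-in for the model's bf16 forward pass at 720p, and all names are illustrative, not taken from NVIDIA's benchmark harness.

```python
import time

STEPS = 36  # diffusion steps used in the reported benchmark


def denoise_step(latent):
    # Placeholder for one diffusion denoising pass. In the real benchmark
    # this would be a bf16 forward pass of Cosmos-Predict2.5-2B over a
    # 720p latent; here it is a trivial stand-in for illustration.
    return [x * 0.99 for x in latent]


def benchmark(latent, steps=STEPS):
    """Return wall-clock seconds for a full `steps`-step denoising run."""
    start = time.perf_counter()
    for _ in range(steps):
        latent = denoise_step(latent)
    return time.perf_counter() - start


def speedup(t_baseline, t_candidate):
    # A figure like "1.4x faster on MI300X" means
    # time_on_H200 / time_on_MI300X == 1.4 at equal output quality.
    return t_baseline / t_candidate
```

In practice each run would be repeated several times with warmup iterations discarded, and the ratio reported over the averaged (or median) wall-clock times.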
Editorial Opinion
While NVIDIA's openness in releasing Cosmos under Apache 2.0 is commendable, this benchmark is a double-edged sword: it validates NVIDIA's foundation model architecture while simultaneously demonstrating that AMD's hardware can execute it more efficiently. For NVIDIA, maintaining software-level lock-in and ecosystem moats becomes increasingly critical as AMD hardware catches up. The benchmark also hints at broader implications—if AMD can outperform on large diffusion models, the narrative of NVIDIA's unassailable GPU dominance may be more fragile than the market currently prices in.


