NVIDIA's Cosmos-Predict2.5 Runs ~1.4x Faster on AMD MI300X Than on NVIDIA's Own H200, Challenging NVIDIA's Hardware Dominance
Key Takeaways
- Cosmos-Predict2.5-2B achieves ~1.27–1.49x faster wall-clock performance on AMD MI300X versus NVIDIA H200 at equal output quality, challenging NVIDIA's hardware performance narrative
- This represents one of the first production-grade world model deployments on AMD GPUs, proving the viability of AMD hardware for serious AI simulation workloads outside the NVIDIA ecosystem
- NVIDIA's open-source Cosmos model enables third-party optimization efforts, demonstrating the strategic value of software openness even when it may benefit competing hardware vendors
Summary
NVIDIA has demonstrated that its Cosmos-Predict2.5-2B world model—a foundational model for large-scale generative simulation in physical AI—achieves approximately 1.4x faster inference on AMD's MI300X GPU compared to NVIDIA's own H200 (Hopper) hardware at equivalent output quality. The benchmark, conducted at 720p resolution with bf16 precision across 36 diffusion steps, represents one of the first production-grade deployments of a world model on AMD GPUs, signaling that AMD hardware is now competitive with or superior to NVIDIA for large diffusion-based model inference workloads.
The result underscores a significant shift in the GPU market: while NVIDIA has long dominated AI accelerator sales, this benchmark shows that AMD's MI300X can deliver superior performance for specific inference tasks, at least when models are properly optimized for the hardware. NVIDIA's team is continuing to optimize, with a second version in development targeting up to 50% additional runtime reduction; if realized, that could widen the MI300X's performance advantage further. The company released Cosmos under the Apache 2.0 license, enabling broad adoption and third-party optimizations such as this AMD port.
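For readers unfamiliar with how such figures are derived, the reported ~1.27–1.49x speedup is a ratio of wall-clock times for equivalent runs on each GPU. The sketch below illustrates the shape of such a comparison; `denoise_step` is a hypothetical stand-in for the model's bf16 forward pass at 720p, and all names are illustrative, not taken from NVIDIA's benchmark harness.

```python
import time

STEPS = 36  # diffusion steps used in the reported benchmark


def denoise_step(latent):
    # Placeholder for one diffusion denoising pass. In the real benchmark
    # this would be a bf16 forward pass of Cosmos-Predict2.5-2B over a
    # 720p latent; here it is a trivial stand-in for illustration.
    return [x * 0.99 for x in latent]


def benchmark(latent, steps=STEPS):
    """Return wall-clock seconds for a full `steps`-step denoising run."""
    start = time.perf_counter()
    for _ in range(steps):
        latent = denoise_step(latent)
    return time.perf_counter() - start


def speedup(t_baseline, t_candidate):
    # A figure like "1.4x faster on MI300X" means
    # time_on_H200 / time_on_MI300X == 1.4 at equal output quality.
    return t_baseline / t_candidate
```

In practice each run would be repeated several times with warmup iterations discarded, and the ratio reported over the averaged (or median) wall-clock times.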
Editorial Opinion
While NVIDIA's openness in releasing Cosmos under Apache 2.0 is commendable, this benchmark is a double-edged sword: it validates NVIDIA's foundation model architecture while simultaneously demonstrating that AMD's hardware can execute it more efficiently. For NVIDIA, maintaining software-level lock-in and ecosystem moats becomes increasingly critical as AMD hardware catches up. The benchmark also hints at broader implications—if AMD can outperform on large diffusion models, the narrative of NVIDIA's unassailable GPU dominance may be more fragile than the market currently prices in.


