Google DeepMind's Gemma 4 Launches with Day-One Support on NVIDIA and AMD via Modular's MAX Platform
Key Takeaways
- Gemma 4 is available on day one with optimized support for both NVIDIA B200 and AMD MI355X hardware through Modular's MAX framework
- Modular achieves 15% higher throughput than vLLM on NVIDIA B200 with no accuracy degradation
- Gemma 4 models support multimodal inputs (text, images, video) and long context windows (256K tokens), enabling applications such as OCR and video understanding
Summary
Google DeepMind has released Gemma 4, a family of state-of-the-art open-source multimodal models, with immediate availability on both NVIDIA and AMD hardware through Modular's MAX inference framework. The release includes Gemma 4 31B, a 31-billion-parameter dense model with a 256K context window, and Gemma 4 26B A4B, a Mixture-of-Experts variant with 4B activated parameters per forward pass, both supporting text, images, and video inputs with dynamic resolution capabilities.
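Since MAX serves models behind an OpenAI-compatible API, a multimodal request mixing text and an image can be expressed as a standard chat-completion payload. The sketch below is illustrative only: the endpoint URL and the model identifier `google/gemma-4-31b` are assumptions, not confirmed names from the release.

```python
import json

# Placeholder values -- point these at your own Modular Cloud or
# self-hosted MAX deployment; both are assumptions for illustration.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "google/gemma-4-31b"  # assumed model identifier


def build_multimodal_request(prompt: str, image_url: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat request combining text and image content."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Example: an OCR-style prompt over a hosted image.
    req = build_multimodal_request(
        "Transcribe any text visible in this image.",
        "https://example.com/receipt.png",
    )
    print(json.dumps(req, indent=2))
```

The same payload shape would apply to the 26B A4B MoE variant by swapping the model name; long-context workloads simply send more message content within the 256K-token window.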
Modular's MAX platform achieves 15% higher throughput than vLLM on NVIDIA's B200 GPU and is currently the only stack capable of running Gemma 4 on both Blackwell and AMD MI355X accelerators. Because the optimization is hardware-agnostic, developers can move from testing to production without changing infrastructure: the same engine powers both playground experiments and production endpoints.
The partnership demonstrates a significant shift toward hardware-agnostic AI inference, enabling enterprises and developers to choose between NVIDIA and AMD based on cost and performance requirements. Modular Cloud offers a free tier for initial experimentation, with production-ready endpoints for demanding tasks including OCR, video understanding, and long-context reasoning workflows.
This hardware-agnostic approach also eliminates vendor lock-in, allowing teams to switch between NVIDIA and AMD GPUs as workload and cost requirements dictate.
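If both deployments expose the same OpenAI-compatible API, switching GPU vendors reduces to a configuration change rather than a code change. A minimal sketch of that idea, where the pool names, URLs, and environment variables are all hypothetical:

```python
import os

# Hypothetical serving pools: one backed by NVIDIA B200s, one by AMD
# MI355Xs. The URLs and environment variable names are placeholders --
# the client code that follows is identical for either backend.
ENDPOINTS = {
    "nvidia": os.environ.get("MAX_NVIDIA_URL", "http://nvidia-pool:8000/v1"),
    "amd": os.environ.get("MAX_AMD_URL", "http://amd-pool:8000/v1"),
}


def pick_endpoint(backend: str) -> str:
    """Return the base URL for the chosen GPU backend."""
    if backend not in ENDPOINTS:
        raise ValueError(f"unknown backend: {backend!r}")
    return ENDPOINTS[backend]
```

The point of the sketch is that nothing downstream of `pick_endpoint` needs to know which silicon is serving the request, which is what makes cost-based vendor switching practical.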
Editorial Opinion
Gemma 4's launch with true hardware-agnostic performance represents a meaningful step toward vendor flexibility in the AI infrastructure space, challenging the dominance of NVIDIA-centric optimization workflows. By delivering superior performance on multiple GPU architectures simultaneously, Modular raises the bar for inference optimization and signals that the era of single-vendor dependency may be shifting. However, the real test will be whether this performance parity holds as models scale and whether enterprises actually adopt AMD alternatives or view the option primarily as a negotiating lever with NVIDIA.