BotBeat
PRODUCT LAUNCH | Google / Alphabet | 2026-04-02

Google DeepMind's Gemma 4 Launches with Day-One Support on NVIDIA and AMD via Modular's MAX Platform

Key Takeaways

  • Gemma 4 is available at launch with optimized support for both NVIDIA B200 and AMD MI355X hardware through Modular's MAX framework
  • Modular reports 15% higher throughput than vLLM on NVIDIA B200 with no accuracy degradation
  • Gemma 4 models accept multimodal inputs (text, images, video) and long context windows (256K tokens), enabling applications such as OCR and video understanding
Source: Hacker News, https://www.modular.com/blog/day-zero-launch-fastest-performance-for-gemma-4-on-nvidia-and-amd

Summary

Google DeepMind has released Gemma 4, a family of state-of-the-art open-source multimodal models, with immediate availability on both NVIDIA and AMD hardware through Modular's MAX inference framework. The release includes Gemma 4 31B, a 31-billion-parameter dense model with a 256K context window, and Gemma 4 26B A4B, a Mixture-of-Experts variant that activates 4B parameters per forward pass. Both models accept text, image, and video inputs with dynamic resolution.
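As a rough illustration of what the "A4B" figure means in practice, the Python sketch below computes the share of parameters active per forward pass. It assumes the "26B" in the model name is the total parameter count, which the announcement does not spell out; the function name is purely illustrative.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a model's parameters used in a single forward pass."""
    if total_params_b <= 0 or not (0 < active_params_b <= total_params_b):
        raise ValueError("expected 0 < active_params_b <= total_params_b")
    return active_params_b / total_params_b


# Assumption: 26B is the total parameter count; 4B active is from the article.
moe_share = active_fraction(26.0, 4.0)
# A dense model uses every parameter on every forward pass.
dense_share = active_fraction(31.0, 31.0)

print(f"Gemma 4 26B A4B: {moe_share:.1%} of parameters active per forward pass")
print(f"Gemma 4 31B (dense): {dense_share:.1%}")
```

Under that assumption, roughly 15% of the MoE model's parameters do work on any given token, which is why such a model can cost less compute per token than a dense model of similar total size.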

According to Modular, its MAX platform delivers 15% higher throughput than vLLM on NVIDIA's B200 GPU and is currently the only stack that runs Gemma 4 on both NVIDIA Blackwell and AMD MI355X GPUs. Because the same engine powers both playground experiments and production endpoints, developers can move from testing to production without changing infrastructure.
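To put the 15% throughput figure in operational terms, here is a back-of-the-envelope sketch in Python. It assumes the reported gain applies uniformly to a fixed workload, which a single benchmark does not guarantee; the function names and the 100-GPU fleet size are hypothetical.

```python
import math


def time_saved(throughput_gain: float) -> float:
    """Fractional cut in wall-clock time for a fixed workload,
    given a relative throughput gain (0.15 means 15% higher)."""
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1")
    return 1.0 - 1.0 / (1.0 + throughput_gain)


def gpus_needed(baseline_gpus: int, throughput_gain: float) -> int:
    """GPUs required to match the baseline fleet's aggregate throughput."""
    if baseline_gpus < 0:
        raise ValueError("baseline_gpus must be non-negative")
    return math.ceil(baseline_gpus / (1.0 + throughput_gain))


gain = 0.15  # figure reported in the article for MAX vs. vLLM on B200

print(f"Time saved on a fixed workload: {time_saved(gain):.1%}")
print(f"GPUs to match a 100-GPU baseline: {gpus_needed(100, gain)}")
```

Note that a 15% throughput gain shortens a fixed job by about 13%, not 15%, because time scales with the inverse of throughput.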

The partnership demonstrates a significant shift toward hardware-agnostic AI inference, enabling enterprises and developers to choose between NVIDIA and AMD based on cost and performance requirements. Modular Cloud offers a free tier for initial experimentation, with production-ready endpoints for demanding tasks including OCR, video understanding, and long-context reasoning workflows.

  • Modular's hardware-agnostic approach eliminates vendor lock-in, allowing teams to switch between NVIDIA and AMD GPUs based on workload and cost requirements

Editorial Opinion

Gemma 4's launch with hardware-agnostic performance is a meaningful step toward vendor flexibility in AI infrastructure, challenging the dominance of NVIDIA-centric optimization workflows. By delivering strong performance on multiple GPU architectures at once, Modular raises the bar for inference optimization and signals that the era of single-vendor dependency may be ending. However, the real test will be whether this performance parity holds as models scale, and whether enterprises actually adopt AMD alternatives or treat the option primarily as a negotiating lever with NVIDIA.

Tags: Large Language Models (LLMs), Generative AI, Multimodal AI, MLOps & Infrastructure, AI Hardware
