BotBeat

Google / Alphabet
PRODUCT LAUNCH · 2026-04-02

Google DeepMind's Gemma 4 Launches with Day-One Support on NVIDIA and AMD via Modular's MAX Platform

Key Takeaways

  • Gemma 4 is available on day zero with optimized support for both NVIDIA B200 and AMD MI355X hardware through Modular's MAX framework
  • Modular reports 15% higher throughput than vLLM on NVIDIA B200 with no accuracy degradation
  • Gemma 4 models support multimodal inputs (text, images, video) and long context windows (256K tokens), enabling applications such as OCR and video understanding
Source: Hacker News (https://www.modular.com/blog/day-zero-launch-fastest-performance-for-gemma-4-on-nvidia-and-amd)

Summary

Google DeepMind has released Gemma 4, a family of state-of-the-art open-weight multimodal models, with immediate availability on both NVIDIA and AMD hardware through Modular's MAX inference framework. The release includes Gemma 4 31B, a 31-billion-parameter dense model with a 256K context window, and Gemma 4 26B A4B, a Mixture-of-Experts variant that activates 4B parameters per forward pass. Both models accept text, image, and video inputs with dynamic resolution.

Modular's MAX platform delivers 15% higher throughput than vLLM on NVIDIA's B200 GPU and is currently the only stack capable of running Gemma 4 on both Blackwell and AMD MI355X accelerators. Because the optimization is hardware-agnostic, developers can move from testing to production without changing infrastructure: the same engine powers both playground experiments and production endpoints.

The partnership demonstrates a significant shift toward hardware-agnostic AI inference, enabling enterprises and developers to choose between NVIDIA and AMD based on cost and performance requirements. Modular Cloud offers a free tier for initial experimentation, with production-ready endpoints for demanding tasks including OCR, video understanding, and long-context reasoning workflows.

  • Modular's hardware-agnostic approach eliminates vendor lock-in, allowing teams to switch between NVIDIA and AMD GPUs based on workload and cost requirements
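For the OCR and long-context workloads mentioned above, inference endpoints of this kind are typically exposed through an OpenAI-compatible chat-completions API. As a minimal sketch only, here is what a multimodal request might look like; the endpoint URL and model id below are illustrative placeholders, not values confirmed by the announcement:

```python
# Sketch: calling a Gemma 4 endpoint through an OpenAI-compatible
# chat-completions API using only the Python standard library.
# The endpoint URL and model id are hypothetical placeholders.
import json
import urllib.request


def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat-completions payload mixing text and an image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    payload = build_chat_request(
        "gemma-4-31b",                           # placeholder model id
        "Transcribe the text in this image.",    # OCR-style task from the article
        "https://example.com/receipt.png",       # placeholder image
    )
    req = urllib.request.Request(
        "https://example-endpoint.modular.com/v1/chat/completions",  # placeholder URL
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

The same request shape would work regardless of whether the endpoint is backed by NVIDIA or AMD hardware, which is the point of the hardware-agnostic serving layer described above.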

Editorial Opinion

Gemma 4's launch with true hardware-agnostic performance represents a meaningful step toward vendor flexibility in the AI infrastructure space, challenging the dominance of NVIDIA-centric optimization workflows. By delivering superior performance on multiple GPU architectures simultaneously, Modular raises the bar for inference optimization and signals that the era of single-vendor dependency may be shifting. However, the real test will be whether this performance parity holds as models scale and whether enterprises actually adopt AMD alternatives or view the option primarily as a negotiating lever with NVIDIA.

Tags: Large Language Models (LLMs) · Generative AI · Multimodal AI · MLOps & Infrastructure · AI Hardware

© 2026 BotBeat