BotBeat

PRODUCT LAUNCH | Google / Alphabet | 2026-04-06

Google DeepMind Launches Gemma 4: Open-Source Multimodal Models with On-Device Capabilities

Key Takeaways

  • Gemma 4 is released as open source under the Apache 2.0 license and is available across major ML frameworks and inference engines
  • The 31B dense model reaches an estimated LMArena score of 1452, while the 26B MoE model reaches 1441 with only 4B active parameters
  • The models accept image and text inputs, with audio input on the smaller variants, and use architectures optimized for on-device deployment
Source: Hacker News, https://huggingface.co/blog/gemma4

Summary

Google DeepMind has released Gemma 4, a family of open-source multimodal models available on Hugging Face under the Apache 2.0 license. The family comes in four sizes, the largest with 31B parameters, each in base and instruction-tuned versions. The models accept image and text inputs, with audio input on the smaller variants. On benchmarks, the 31B dense model reaches an estimated LMArena score of 1452, and the 26B mixture-of-experts variant reaches 1441 with only 4B active parameters.
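To make the "4B active parameters" figure concrete, here is a back-of-the-envelope sketch in Python. It relies on a common rule of thumb, roughly 2 forward-pass FLOPs per active parameter per token. That rule is an assumption brought in for illustration and is not a figure from the announcement. It also ignores attention and embedding costs. The function names are hypothetical.

```python
def active_fraction(total_params_b, active_params_b):
    """Share of the model's parameters used for each token."""
    return active_params_b / total_params_b


def approx_forward_flops_per_token(active_params_b):
    """Rough forward-pass cost per token.

    Assumption: about 2 FLOPs per active parameter (a common rule of thumb,
    not a measured figure for Gemma 4).
    """
    return 2 * active_params_b * 1e9


# Parameter counts (in billions) as quoted in the article.
moe_fraction = active_fraction(total_params_b=26, active_params_b=4)
dense_flops = approx_forward_flops_per_token(31)  # dense: all parameters are active
moe_flops = approx_forward_flops_per_token(4)     # MoE: only the routed experts are active
compute_ratio = dense_flops / moe_flops

print(f"MoE active fraction: {moe_fraction:.1%}")
print(f"Dense 31B vs. MoE per-token compute (rough): {compute_ratio:.2f}x")
```

Under these assumptions, the MoE variant touches roughly 15% of its weights per token. Note that a mixture-of-experts model typically still needs all of its weights resident in memory, so the saving is mainly in compute rather than in footprint.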

The models introduce several architectural changes: Per-Layer Embeddings (PLE), alternating local and global attention layers, dual RoPE configurations for extended context windows, and shared key-value caching. They run on a range of frameworks and runtimes, including transformers, llama.cpp, MLX, WebGPU, and Rust, which makes them suitable for on-device inference. Gemma 4 also accepts images at variable aspect ratios with a configurable number of image tokens, so users can trade off speed, memory, and quality. The smaller variants support audio alongside image and text inputs.
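The idea behind alternating local and global attention can be sketched in a few lines of plain Python. This is a minimal illustration and not Gemma 4's implementation. The layer count, window size, and local-to-global ratio below are hypothetical values chosen for readability, and the helper names are invented for this example.

```python
def causal_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask.

    mask[q][k] is True when query position q may attend to key position k.
    window=None gives full (global) causal attention; an integer restricts
    each query to the most recent `window` positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


def layer_kinds(num_layers, global_every):
    """Label each layer; every `global_every`-th layer is global, the rest local."""
    return ["global" if (i + 1) % global_every == 0 else "local"
            for i in range(num_layers)]


def cached_positions(kind, seq_len, window):
    """Key/value positions a layer must keep in order to decode the next token."""
    return seq_len if kind == "global" else min(seq_len, window)


# Hypothetical settings, for illustration only.
NUM_LAYERS = 6
GLOBAL_EVERY = 3
WINDOW = 4
SEQ_LEN = 16

kinds = layer_kinds(NUM_LAYERS, GLOBAL_EVERY)
kv_mixed = sum(cached_positions(kind, SEQ_LEN, WINDOW) for kind in kinds)
kv_all_global = NUM_LAYERS * SEQ_LEN

print(kinds)
print(f"cached positions: {kv_mixed} (mixed) vs. {kv_all_global} (all global)")
```

With these toy numbers, the local layers cap their key-value cache at the window size, so the mixed stack stores half as many positions as an all-global stack. The gap widens as the sequence grows, which is one reason this pattern suits long contexts and memory-constrained devices.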

  • Innovative architectural features like Per-Layer Embeddings, variable aspect ratio vision encoding, and shared KV caching enable efficient long-context and agentic use cases

Editorial Opinion

Gemma 4 represents a significant milestone in democratizing frontier-class multimodal AI capabilities. By combining truly open licensing with competitive benchmark performance and flexible deployment options, from local devices to cloud infrastructure, Google DeepMind is raising the bar for what open-source AI should look like. The focus on on-device capabilities and architectural efficiency suggests a practical approach to AI deployment that takes both performance and privacy seriously.

Tags: Large Language Models (LLMs), Generative AI, Multimodal AI, MLOps & Infrastructure, Open Source

© 2026 BotBeat