Google DeepMind Launches Gemma 4: Open-Source Multimodal Models with On-Device Capabilities
Key Takeaways
- ▸Gemma 4 is fully open-source under the Apache 2.0 license and is supported by major ML frameworks and inference engines
- ▸Models achieve frontier-level performance: the 31B variant reaches an estimated LMArena score of 1452, and the 26B MoE model reaches 1441 with only 4B active parameters
- ▸True multimodal support covers image and text inputs on all models and audio on the smaller variants, with architectures optimized for on-device deployment
Summary
Google DeepMind has released Gemma 4, a family of open-source multimodal models published on Hugging Face under the Apache 2.0 license. The models accept image and text inputs (the smaller variants also accept audio) and come in four sizes, from compact on-device models up to 31B parameters, each with base and instruction-tuned versions. Benchmark results are competitive: the 31B dense model reaches an estimated LMArena score of 1452, and the 26B mixture-of-experts (MoE) variant reaches 1441 with only 4B active parameters.
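The gap between 26B total and 4B active parameters comes from how mixture-of-experts layers work. A router scores every expert for each token, but only the top few experts actually run, so most weights sit idle on any single forward pass. The sketch below is a generic top-k routing example in plain Python, not Gemma 4's implementation; the expert count, `k`, and parameter sizes are hypothetical toy values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts only.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x, experts, router_logits, k):
    """Mix the outputs of only the k selected experts for a single token."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def param_counts(num_experts, k, expert_params, shared_params):
    """Total vs per-token active parameters for a toy MoE model."""
    total = shared_params + num_experts * expert_params
    active = shared_params + k * expert_params
    return total, active
```

With 8 hypothetical experts of 100 parameters each, 50 shared parameters, and top-2 routing, `param_counts(8, 2, 100, 50)` returns `(850, 250)`: every token touches less than a third of the weights, which is the same effect behind the "4B active" figure at a much larger scale.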
The models introduce several architectural changes: Per-Layer Embeddings (PLE), alternating local and global attention layers, dual RoPE configurations for extended context windows, and a shared key-value (KV) cache. Gemma 4 is supported in transformers, llama.cpp, MLX, WebGPU, and Rust, which makes it practical for on-device inference. The vision input accepts a configurable number of image tokens and variable aspect ratios, letting users trade off speed, memory, and quality, while the smaller variants accept audio alongside image and text.
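To make "alternating local and global attention" and "dual RoPE configurations" concrete, here is a minimal sketch under stated assumptions. Local layers are assumed to use a causal sliding window and global layers full causal attention, and each layer type is assumed to get its own RoPE base frequency. The layer ratio (`global_every`), window size (`LOCAL_WINDOW`), and base values in `ROPE_BASE` are hypothetical illustration values, not Gemma 4's published configuration.

```python
def layer_schedule(num_layers, global_every):
    """Mark every `global_every`-th layer as global and the rest as local."""
    return ["global" if (i + 1) % global_every == 0 else "local"
            for i in range(num_layers)]


def causal_mask(seq_len, window=None):
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives full causal (global) attention; an integer gives a
    sliding window in which each query sees at most `window` recent keys,
    including itself.
    """
    return [[j <= i and (window is None or i - j < window)
             for j in range(seq_len)]
            for i in range(seq_len)]


def rope_inv_freq(head_dim, base):
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# Hypothetical values chosen for illustration only.
LOCAL_WINDOW = 4
ROPE_BASE = {"local": 10_000.0, "global": 1_000_000.0}


def build_layers(num_layers, global_every, seq_len, head_dim):
    """Assemble a per-layer config: attention type, mask, and RoPE frequencies."""
    layers = []
    for kind in layer_schedule(num_layers, global_every):
        window = LOCAL_WINDOW if kind == "local" else None
        layers.append({
            "kind": kind,
            "mask": causal_mask(seq_len, window),
            "inv_freq": rope_inv_freq(head_dim, ROPE_BASE[kind]),
        })
    return layers
```

In this sketch a local layer at position 5 with a window of 4 sees only keys 2 through 5, so its attention cost and KV memory stay bounded regardless of sequence length, while global layers still see the full context. Giving global layers a larger RoPE base lowers their slowest rotation frequency, which stretches the range of positions they can distinguish; that is one common way to extend usable context.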
Innovative architectural features such as Per-Layer Embeddings, variable aspect ratio vision encoding, and shared KV caching enable efficient long-context and agentic use cases.
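Configurable image token counts and variable aspect ratios interact in a simple way: the encoder has to fit a patch grid under a token budget without squashing the image. The function below is a hypothetical sketch of that arithmetic, not Gemma 4's preprocessing. It assumes one token per grid cell and picks the largest grid with `rows * cols <= token_budget` whose shape roughly matches the image. Integer arithmetic avoids floating-point rounding at exact ratios.

```python
import math


def patch_grid(width, height, token_budget):
    """Pick a (rows, cols) patch grid that fits token_budget and roughly
    preserves the width/height aspect ratio (hypothetical one token per cell)."""
    if width <= 0 or height <= 0 or token_budget < 1:
        raise ValueError("width, height and token_budget must be positive")
    # Ideal grid satisfies rows * cols == budget and cols / rows == width / height,
    # so rows ~= sqrt(budget * height / width). Clamp so rows never exceeds the budget.
    rows = max(1, min(token_budget, math.isqrt(token_budget * height // width)))
    # cols ~= round(rows * width / height), capped so rows * cols <= budget.
    ideal_cols = (2 * rows * width + height) // (2 * height)
    cols = max(1, min(token_budget // rows, ideal_cols))
    return rows, cols
```

For a 1920x1080 image, a budget of 256 gives `(12, 21)` (252 tokens) and a budget of 70 gives `(6, 11)` (66 tokens); a portrait 1080x1920 image at 256 gives `(21, 12)`. Lowering the budget shrinks the grid while keeping its shape, which is the speed and memory versus quality trade-off the summary describes.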
Editorial Opinion
Gemma 4 represents a significant milestone in democratizing frontier-class multimodal AI capabilities. By combining truly open licensing with competitive benchmark performance and flexible deployment options, from local devices to cloud infrastructure, Google DeepMind is raising the bar for what open-source AI should look like. The focus on on-device capabilities and architectural efficiency suggests a thoughtful approach to practical AI deployment, one that weighs performance alongside privacy.


