Google's Gemma 4 Gets Up to 3x Faster With Multi-Token Prediction
Key Takeaways
- ▸Multi-Token Prediction uses speculative decoding: a lightweight drafter model proposes future tokens and the main model verifies them, which Google says makes inference up to 3x faster on consumer hardware
- ▸The technique targets the memory-bandwidth bottleneck in local AI by drafting tokens with compute that would otherwise sit idle while the main model's parameters load from memory
- ▸Google reports 2.8x to 3.1x speedups on Pixel phones and 2.5x for the 31B model on Apple M4 chips, with zero quality degradation
Summary
Google has released Multi-Token Prediction (MTP) drafters for its open-source Gemma 4 models, using speculative decoding to accelerate local AI inference. The new experimental feature lets a smaller "drafter" model propose several future tokens, which the main model then verifies in parallel, so a single pass of the main model can yield multiple tokens in the time it previously took to generate just one.
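The draft-and-verify loop described above can be sketched in a few lines. This is a minimal illustration of greedy speculative decoding in general, not Google's MTP implementation: `toy_target`, `toy_drafter`, and the draft length `k` are hypothetical stand-ins, and a real system would score all draft positions in one batched forward pass rather than in a Python loop.

```python
from typing import Callable, List, Tuple

# A "model" here is just a greedy next-token function (hypothetical stand-in).
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, drafter: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (sequence, number of verify passes)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(seq) < goal:
        # 1. The cheap drafter proposes up to k tokens, one after another.
        draft: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            draft.append(drafter(seq + draft))
        # 2. The target checks the draft. We count this as ONE pass because a
        #    real model can score every draft position in a single forward pass.
        verify_passes += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token matches: keep it
            else:
                accepted.append(expected)  # first mismatch: use the target's token
                break                      # and discard the rest of the draft
        seq.extend(accepted)
    return seq, verify_passes


def toy_target(seq: List[int]) -> int:
    # Arbitrary deterministic rule standing in for the large model.
    return (3 * seq[-1] + len(seq)) % 11


def toy_drafter(seq: List[int]) -> int:
    # Agrees with the target except when the context length is a multiple of 5.
    guess = toy_target(seq)
    return (guess + 1) % 11 if len(seq) % 5 == 0 else guess


prompt = [1, 2, 3]
baseline = greedy_decode(toy_target, prompt, 20)
fast, passes = speculative_decode(toy_target, toy_drafter, prompt, 20, k=4)
print(fast == baseline, passes)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding exactly; the saving comes only from needing fewer target passes. Production implementations typically also emit one extra target token when the whole draft is accepted, which this sketch omits for brevity.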
The MTP technology targets a fundamental bottleneck in local AI: memory bandwidth constraints on consumer hardware. Because most personal computers and mobile phones have slower memory than enterprise AI accelerators, their processors waste compute cycles waiting for model parameters to load. Google's approach uses that idle compute to generate tokens speculatively with a lightweight drafter (as small as 74 million parameters), improving speed while maintaining the main model's full output quality.
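A back-of-the-envelope calculation shows why bandwidth sets the ceiling. The sketch below assumes every generated token requires reading all model weights from memory once, and that each draft token is accepted independently with a fixed probability; both are simplifying assumptions, and every number in it is illustrative, not a Gemma 4 or device measurement.

```python
def decode_ceiling_tokens_per_s(params: float, bytes_per_param: float,
                                bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s if each token needs one full read of the weights."""
    return bandwidth_bytes_per_s / (params * bytes_per_param)


def expected_tokens_per_pass(accept_rate: float, k: int) -> float:
    """Mean tokens produced per main-model pass with k draft tokens.

    Assumes each draft token is accepted independently with probability
    accept_rate, and that the main model always contributes one token itself.
    """
    if accept_rate == 1.0:
        return float(k + 1)
    return (1 - accept_rate ** (k + 1)) / (1 - accept_rate)


def estimated_speedup(accept_rate: float, k: int, drafter_cost_ratio: float) -> float:
    """Rough speedup; drafter_cost_ratio is drafter time / main-model time per call."""
    return expected_tokens_per_pass(accept_rate, k) / (1 + k * drafter_cost_ratio)


# Hypothetical device: 4e9 parameters at 1 byte each, 60 GB/s memory bandwidth.
print(decode_ceiling_tokens_per_s(4e9, 1, 60e9))
# Hypothetical drafter: 80% acceptance, 4 draft tokens, 5% of the main model's cost.
print(estimated_speedup(0.8, 4, 0.05))
```

In this model the speedup grows with the acceptance rate and shrinks as the drafter gets more expensive, which is why a very small drafter is attractive.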
In testing, Google reports speed improvements of 2.8x to 3.1x on Gemma 4's smaller E2B and E4B models running on Pixel phones, and a 2.5x boost for the 31B model on Apple's M4 silicon. The company emphasizes that MTP produces "zero quality degradation" since the primary model verifies all draft tokens before output. Combined with Gemma 4's newly permissive Apache 2.0 license, this update makes powerful open-source AI significantly more practical for edge and local deployment.
- ▸Gemma 4's Apache 2.0 license and improved performance make local AI more accessible for privacy-conscious users and resource-constrained devices
- ▸MTP drafters are available now for Gemma 4, making the largest models (26B MoE and 31B Dense) more feasible on consumer hardware


