Apple Unveils Third-Generation Foundation Models, Powering Advanced On-Device and Cloud AI
Key Takeaways
- Apple Foundation Models 3 spans a family of five models: two on-device models (3B and 20B parameters with sparse activation) and three cloud models optimized for different compute tiers and use cases
- AFM 3 Core Advanced uses a novel sparse architecture (Instruction-Following Pruning) that overcomes DRAM constraints by storing weights in flash and dynamically loading expert parameters, enabling on-device capabilities previously only possible in cloud systems
- All models are natively multimodal and built specifically for Apple silicon, enabling new features in expressive voices, image generation/editing, complex reasoning, and agentic tool use
Summary
Apple announced its third generation of Apple Foundation Models (AFM 3), a family of five custom-built foundation models developed in collaboration with Google. The lineup includes two on-device models, AFM 3 Core (3B parameters) and AFM 3 Core Advanced (20B parameters with sparse activation), plus three server-based models running on Apple's Private Cloud Compute infrastructure, including a new collaboration with NVIDIA for GPU support on Google Cloud. AFM 3 Core Advanced introduces a sparsely activated architecture using Instruction-Following Pruning (IFP): the full model is stored in flash memory and only the selected experts are loaded into DRAM, so the model can run within the device's memory limits. All models are natively multimodal and deeply integrated into Apple's operating systems to power the next generation of Apple Intelligence, including an enhanced Siri, image generation and editing, advanced reasoning, and complex tool use, all while maintaining Apple's privacy-first guarantee that user data is never stored or shared.
- Strategic partnerships with Google (model development) and NVIDIA (GPU infrastructure) extend Apple's capabilities while the Private Cloud Compute layer maintains privacy guarantees
- The architecture represents a significant shift toward efficient on-device inference, bringing enterprise-class AI performance to consumer devices
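To make the flash-to-DRAM idea concrete, here is a minimal Python sketch of on-demand expert loading. It is an illustration only, not Apple's implementation: the class `FlashExpertStore`, the helper `write_demo_experts`, the flat float32 file layout, and the least-recently-used eviction policy are all assumptions made for this example. A memory-mapped file stands in for flash, a small dictionary stands in for DRAM, and the choice of which experts to request (which in AFM 3 would come from Instruction-Following Pruning) is simply passed in by the caller.

```python
import mmap
import os
import tempfile
from array import array
from collections import OrderedDict


def write_demo_experts(path, num_experts, expert_size):
    """Write toy expert weights to a file standing in for flash storage.

    Hypothetical layout: expert i is expert_size float32 values, all equal to i.
    """
    with open(path, "wb") as f:
        for i in range(num_experts):
            array("f", [float(i)] * expert_size).tofile(f)


class FlashExpertStore:
    """Hypothetical store: every expert lives on disk, a few stay resident in RAM."""

    def __init__(self, path, expert_size, dram_slots):
        self.expert_bytes = expert_size * array("f").itemsize
        self.dram_slots = dram_slots      # how many experts may be resident at once
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._cache = OrderedDict()       # expert_id -> weights, oldest entry first
        self.loads = 0                    # number of reads that went to "flash"

    def get(self, expert_id):
        """Return an expert's weights, reading from the file only on a cache miss."""
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)   # mark as most recently used
            return self._cache[expert_id]
        start = expert_id * self.expert_bytes
        weights = array("f")
        weights.frombytes(self._map[start:start + self.expert_bytes])
        self.loads += 1
        self._cache[expert_id] = weights
        if len(self._cache) > self.dram_slots:
            self._cache.popitem(last=False)      # evict the least recently used expert
        return weights

    def resident(self):
        """IDs of the experts currently held in RAM, least recently used first."""
        return list(self._cache)

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_demo_experts(path, num_experts=8, expert_size=4)
        store = FlashExpertStore(path, expert_size=4, dram_slots=2)
        for expert_id in [5, 2, 5, 7]:    # stand-in for a per-request expert selection
            store.get(expert_id)
        print("resident experts:", store.resident(), "flash reads:", store.loads)
        store.close()
```

The point of the sketch is the trade-off the article describes: total model size is bounded by storage, while working memory is bounded only by the number of experts kept resident at once. A production system would also have to deal with flash read latency, prefetching, and quantized weight formats, none of which are modeled here.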


