BotBeat

PRODUCT LAUNCH · Meta · 2026-05-18

Meta Launches MLX Delegate for ExecuTorch: GPU-Accelerated PyTorch on Apple Silicon

Key Takeaways

  • 3-6x performance improvement: MLX Delegate delivers significantly higher throughput for generative AI workloads compared to existing ExecuTorch backends on macOS
  • PyTorch 2 native integration: Directly leverages torch.export and TorchAO quantization tools, enabling automatic support for new models and techniques as they land in PyTorch
  • Flexible quantization and cross-platform portability: Supports multiple precision options that work across multiple ExecuTorch backends, enabling single-model deployment across different hardware platforms

Source: Hacker News, https://pytorch.org/blog/running-pytorch-models-on-apple-silicon-gpus-with-the-executorch-mlx-delegate/

Summary

Meta has released the MLX Delegate, a new backend for ExecuTorch that enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs. The delegate plugs into PyTorch 2's export stack and uses Apple's MLX framework to run optimized Metal GPU kernels, achieving 3-6x higher throughput on generative AI workloads than existing ExecuTorch backends on macOS.

The MLX Delegate supports the operations that transformer inference depends on, including quantized matrix multiplication, multi-head attention, rotary position embeddings, and mixture-of-experts routing. It offers several precision and quantization options (BF16, FP16, FP32, 2/4/8-bit affine quantization, and NVIDIA's NVFP4), letting developers trade off performance against model size on resource-constrained Apple Silicon devices.
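
To make the "affine quantization" option above concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration under stated assumptions, not the code path MLX, TorchAO, or the delegate actually uses: the function names, the min/max calibration, the group size, and the 16-bit scale/offset metadata are all hypothetical choices made for the example.

```python
from typing import List, Tuple

# One quantized group: integer codes plus the scale and offset needed to decode them.
Group = Tuple[List[int], float, float]


def quantize_affine(values: List[float], bits: int = 4, group_size: int = 4) -> List[Group]:
    """Map each group of floats onto integer codes in [0, 2**bits - 1].

    Hypothetical helper: uses simple per-group min/max calibration.
    """
    levels = (1 << bits) - 1  # highest code, e.g. 15 for 4-bit
    groups: List[Group] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        # A constant group has no range; any non-zero scale decodes it exactly.
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = [min(levels, max(0, round((v - lo) / scale))) for v in group]
        groups.append((codes, scale, lo))
    return groups


def dequantize_affine(groups: List[Group]) -> List[float]:
    """Reconstruct approximate floats: value ~= code * scale + offset."""
    out: List[float] = []
    for codes, scale, lo in groups:
        out.extend(code * scale + lo for code in codes)
    return out


def bits_per_weight(bits: int, group_size: int, meta_bits: int = 16) -> float:
    """Effective storage per weight, counting one scale and one offset per group.

    Assumes scale and offset are each stored in `meta_bits` bits.
    """
    return bits + 2 * meta_bits / group_size


weights = [0.0, 0.5, 1.0, 1.5, -2.0, -1.0, 0.0, 1.0]
packed = quantize_affine(weights, bits=4, group_size=4)
restored = dequantize_affine(packed)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`) for its group, which is why smaller groups or more bits reduce error at the cost of storage. Under the assumptions above, 4-bit codes with a group size of 32 cost 5 bits per weight once the per-group metadata is included, compared with 16 bits for BF16 or FP16.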

The delegate has been validated across diverse model architectures including dense transformers like Llama, Qwen, and Gemma, sparse Mixture-of-Experts models, and speech-to-text systems such as Whisper and Voxtral. By plugging directly into the PyTorch 2 export ecosystem, the MLX Delegate enables developers to target multiple backends with a single quantized model, and provides a portable runtime API that works across MLX, XNNPACK, CoreML, Vulkan, and CUDA without requiring application-level changes. The delegate is currently experimental and under active development.

Editorial Opinion

This is a strategic move that strengthens PyTorch's ecosystem for on-device AI inference on Apple Silicon, a critical growth area as developers increasingly seek to run powerful models locally. By tightly integrating with PyTorch 2's export infrastructure rather than creating a standalone tool, Meta has positioned the MLX Delegate to automatically benefit from future PyTorch advancements. The 3-6x performance gains are significant enough to make Apple Silicon a viable platform for production inference workloads. While the experimental status warrants cautious adoption initially, this demonstrates Meta's commitment to supporting diverse hardware platforms through ExecuTorch.

Machine Learning · MLOps & Infrastructure · AI Hardware · Product Launch
