BotBeat

Meta · PRODUCT LAUNCH · 2026-05-18

Meta Launches MLX Delegate for ExecuTorch: GPU-Accelerated PyTorch on Apple Silicon

Key Takeaways

  • 3-6x higher throughput: On generative AI workloads, the MLX Delegate delivers 3-6x the throughput of existing ExecuTorch backends on macOS.
  • Native PyTorch 2 integration: It builds directly on torch.export and the TorchAO quantization tools, so new models and techniques can be supported as they land in PyTorch.
  • Flexible quantization and cross-platform portability: Its precision options work across multiple ExecuTorch backends, so a single model can be deployed on different hardware platforms.
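To put the precision options in perspective, the sketch below estimates how much memory a model's weights need in each format. It is back-of-the-envelope arithmetic, not ExecuTorch's or TorchAO's actual storage layout: it assumes weights dominate model size, that the 2/4/8-bit affine formats store one 16-bit scale and one 16-bit offset per group, and it uses a hypothetical group size of 32 and a hypothetical 8B-parameter model. NVFP4 is not modeled.

```python
# Rough weight-storage estimate per precision option.
# Assumptions (illustrative only, not ExecuTorch/TorchAO's real layout):
# weights dominate model size, and each grouped affine format stores one
# 16-bit scale and one 16-bit offset per group of GROUP_SIZE weights.

GROUP_SIZE = 32  # hypothetical quantization group size

BITS_PER_WEIGHT = {
    "fp32": 32,
    "bf16": 16,
    "fp16": 16,
    "int8": 8,
    "int4": 4,
    "int2": 2,
}


def weight_bytes(num_params: int, fmt: str, group_size: int = GROUP_SIZE) -> float:
    """Estimate the bytes needed to store num_params weights in format fmt."""
    bits = BITS_PER_WEIGHT[fmt]
    total_bits = num_params * bits
    if bits < 16:
        num_groups = -(-num_params // group_size)  # ceiling division
        total_bits += num_groups * (16 + 16)  # per-group scale + offset
    return total_bits / 8


n = 8_000_000_000  # hypothetical 8B-parameter model
for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: {weight_bytes(n, fmt) / 1e9:.1f} GB")
# fp32: 32.0, bf16/fp16: 16.0, int8: 9.0, int4: 5.0, int2: 3.0 (GB)
```

With a group size of 32, the per-group metadata adds roughly one extra bit per weight. That overhead is why 4-bit weights come out at about 5 GB rather than 4 GB in this estimate.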
Source: Hacker News, https://pytorch.org/blog/running-pytorch-models-on-apple-silicon-gpus-with-the-executorch-mlx-delegate/

Summary

Meta has released the MLX Delegate, a new ExecuTorch backend that runs PyTorch model inference on the GPU of Apple Silicon Macs. The delegate plugs into PyTorch 2's export stack and uses Apple's MLX framework for optimized Metal GPU kernels. On generative AI workloads, it achieves 3-6x higher throughput than existing ExecuTorch backends on macOS.
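ExecuTorch delegates generally work by having a partitioner mark the parts of an exported graph that a backend supports. Those parts are handed to the backend, and anything unsupported stays on ExecuTorch's portable CPU path. The toy below illustrates that idea on a flat list of op names. It is a hypothetical sketch, not the MLX Delegate's partitioner: real partitioners operate on a torch.export graph, and both the op names and the "supported" set here are invented for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical set of ops a GPU delegate can handle (illustrative only;
# this is NOT the MLX delegate's real op coverage).
DELEGATE_OPS: Set[str] = {"linear", "rms_norm", "rope", "sdpa", "silu", "mul", "add"}


def partition(ops: List[str], supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Group a linear sequence of ops into runs.

    Consecutive supported ops form one ("delegate", [...]) segment;
    everything else forms ("portable", [...]) segments that fall back
    to the CPU runtime.
    """
    segments: List[Tuple[str, List[str]]] = []
    for op in ops:
        target = "delegate" if op in supported else "portable"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)  # extend the current run
        else:
            segments.append((target, [op]))  # start a new run
    return segments


# "my_custom_op" is a hypothetical op the delegate does not support.
model_ops = ["rms_norm", "linear", "rope", "sdpa", "my_custom_op", "linear", "add"]
for target, run in partition(model_ops, DELEGATE_OPS):
    print(target, run)
```

A single unsupported op in the middle of the graph splits the delegated work into two GPU segments. Broad op coverage matters because every boundary means a handoff between the delegate and the portable runtime.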

The MLX Delegate covers the operations that transformer inference depends on, including quantized matrix multiplication, multi-head attention, rotary position embeddings, and mixture-of-experts routing. It offers several precision and quantization options: BF16, FP16, FP32, 2/4/8-bit affine quantization, and NVIDIA's NVFP4. These let developers balance performance against model size on resource-constrained Apple Silicon devices.
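Affine quantization maps each group of weights to small unsigned integers plus a per-group scale and offset, so that value ≈ code × scale + offset. The pure-Python sketch below shows that idea for 2, 4, and 8 bits. It is a generic illustration under our own assumptions (min/max range per group, round-to-nearest, a hypothetical group size of 4). It is not TorchAO's quantizer or the MLX Delegate's kernel, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_group(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Affine-quantize one group to unsigned `bits`-bit codes.

    Returns (codes, scale, offset) such that value ~= code * scale + offset.
    """
    qmax = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0  # avoid divide-by-zero for flat groups
    # Round to nearest code and clamp to the valid range [0, qmax].
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, offset: float) -> List[float]:
    """Reconstruct approximate float values from codes."""
    return [c * scale + offset for c in codes]


def quantize_weights(weights: List[float], bits: int, group_size: int):
    """Split a flat weight list into groups and quantize each group independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


weights = [0.12, -0.40, 0.33, 0.05, -0.91, 0.78, 0.0, -0.27]
for bits in (8, 4, 2):
    groups = quantize_weights(weights, bits, group_size=4)
    restored = [v for codes, s, off in groups for v in dequantize_group(codes, s, off)]
    err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit max reconstruction error: {err:.4f}")
```

Within each group, the reconstruction error is bounded by half the scale. Each bit removed doubles the step size, which is the size-versus-accuracy trade-off the precision options expose.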

The delegate has been validated across diverse model architectures, including dense transformers such as Llama, Qwen, and Gemma, sparse Mixture-of-Experts models, and speech-to-text systems such as Whisper and Voxtral. Because it plugs directly into the PyTorch 2 export ecosystem, developers can target multiple backends with a single quantized model. ExecuTorch's portable runtime API works across MLX, XNNPACK, CoreML, Vulkan, and CUDA without requiring application-level changes. The delegate is currently experimental and under active development.
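Sparse Mixture-of-Experts models route each token to only a few "expert" sub-networks. One common variant is top-k routing: a router produces one logit per expert, the k highest-scoring experts run, and their outputs are mixed using renormalized softmax weights. The sketch below illustrates that variant in plain Python. It is an assumption-laden toy (the experts are simple callables and k=2 is arbitrary), not the routing kernel the MLX Delegate implements, and individual MoE models differ in the details.

```python
import math
from typing import Callable, List, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-probability experts for one token and renormalize
    their softmax probabilities so the selected weights sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


def moe_forward(x: List[float], router_logits: List[float],
                experts: List[Expert], k: int = 2) -> List[float]:
    """Weighted sum of the selected experts' outputs.

    Unselected experts are never called, which is what makes MoE sparse.
    """
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Four toy experts that just scale their input.
toy_experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
print(top_k_route([2.0, 1.0, 0.5, 3.0], k=2))
print(moe_forward([1.0, 2.0], [2.0, 1.0, 0.5, 3.0], toy_experts, k=2))
```

Only two of the four experts execute per token here. This is why an MoE model's compute cost tracks its active parameters rather than its total parameter count.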

Editorial Opinion

This is a strategic move that strengthens PyTorch's ecosystem for on-device AI inference on Apple Silicon, a critical growth area as developers increasingly seek to run powerful models locally. By tightly integrating with PyTorch 2's export infrastructure rather than creating a standalone tool, Meta has positioned the MLX Delegate to automatically benefit from future PyTorch advancements. The 3-6x performance gains are significant enough to make Apple Silicon a viable platform for production inference workloads. While the experimental status warrants cautious adoption initially, this demonstrates Meta's commitment to supporting diverse hardware platforms through ExecuTorch.

Machine Learning · MLOps & Infrastructure · AI Hardware · Product Launch
