BotBeat

Open Source / Community Project · 2026-03-20

Rustane: First Open-Source Rust Training Engine for Apple Neural Engine Enables On-Device LLM Training Up to 5B Parameters

Key Takeaways

  • Rustane is the first open-source, training-capable Rust engine for the Apple Neural Engine, using reverse-engineered private APIs to compile and execute transformer models directly
  • Training validated across 48M to 5B parameter models, with forward passes confirmed up to 30B on an M4 Max with 128GB RAM, at only 3-5W of power draw
  • Key architectural findings: an efficiency cliff at dim=5120, an architecture crossover near 3B parameters (wide+shallow vs. deep+narrow), and RAM rather than chip specification as the primary scaling limit
Source: Hacker News — https://github.com/ncdrone/rustane

Summary

A new open-source Rust-native engine called Rustane enables full training and inference of transformer models directly on Apple's Neural Engine (ANE) and Metal GPU, marking the first publicly available training-capable implementation for Apple's specialized hardware. The engine uses reverse-engineered private ANE APIs to compile and execute MIL kernels with remarkable efficiency, achieving training at just 3-5W power draw while leaving the GPU free for other tasks. Developers have validated the training pipeline across models ranging from 48M to 5B parameters on an M4 Max with 128GB RAM, with forward-pass capabilities confirmed up to 30B parameters, demonstrating that RAM—not the chip itself—is the primary limitation.
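The claim that RAM, not the chip, is the limit can be made concrete with back-of-the-envelope arithmetic. The sketch below uses illustrative assumptions, not Rustane's actual internals: fp16 weights (2 bytes/param) for forward-only inference, plus fp16 gradients and two fp32 Adam moments (12 bytes/param total) when training. Under those assumptions, a 5B model trains in roughly 56 GiB, a 30B forward pass also fits in about 56 GiB, but training 30B would need over 300 GiB, which is consistent with forward-pass-only validation at 30B on a 128GB machine.

```rust
/// Rough per-parameter memory cost in bytes, under illustrative
/// assumptions (not Rustane's actual internals): fp16 weights (2 B),
/// plus fp16 gradients (2 B) and two fp32 Adam moments (4 B each)
/// when training; weights only for a forward-only pass.
fn bytes_per_param(training: bool) -> u64 {
    if training { 2 + 2 + 4 + 4 } else { 2 }
}

/// Estimated memory footprint in GiB for a model with `params` parameters.
fn footprint_gib(params: u64, training: bool) -> f64 {
    (params * bytes_per_param(training)) as f64 / (1u64 << 30) as f64
}

fn main() {
    // 5B params: ~56 GiB to train -> fits in 128 GB unified memory.
    println!("train 5B:    {:.1} GiB", footprint_gib(5_000_000_000, true));
    // 30B params: ~56 GiB forward-only fits, but ~335 GiB to train does not.
    println!("forward 30B: {:.1} GiB", footprint_gib(30_000_000_000, false));
    println!("train 30B:   {:.1} GiB", footprint_gib(30_000_000_000, true));
}
```

Exact byte counts depend on the optimizer and precision Rustane actually uses; the point of the estimate is only that training memory scales several times faster than inference memory.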

Rustane's architecture leverages memory-safe Rust to directly interface with Apple's undocumented _ANEClient and _ANEInMemoryModel APIs, bypassing CoreML's black-box scheduler. The project includes comprehensive benchmarking tools showing a 600M parameter model trains at ~535 tokens/second on ANE, with detailed results revealing architectural insights such as an efficiency cliff at dimension 5120 and a crossover point at 3B parameters where deeper, narrower architectures outperform wider, shallower ones. Trained weights export via SafeTensors for deployment anywhere, and the community is actively collecting performance data across different Apple Silicon chips to map the hardware's actual capabilities.
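The reported 3B crossover between wide+shallow and deep+narrow designs is easier to reason about with a parameter-count sketch. The rule of thumb below (12 · layers · dim² for attention plus MLP, plus a tied embedding matrix) and the two hypothetical configurations are assumptions for illustration, not Rustane's benchmark models; they show how a shallow model at dim just under the reported 5120 efficiency cliff and a deeper model at half that width land at a similar parameter budget near the crossover.

```rust
/// Approximate decoder-only transformer parameter count using the
/// common 12 * layers * dim^2 rule of thumb (4*dim^2 attention +
/// 8*dim^2 MLP) plus a tied embedding matrix. Illustrative only;
/// the configs in main() are hypothetical, not Rustane's models.
fn approx_params(layers: u64, dim: u64, vocab: u64) -> u64 {
    12 * layers * dim * dim + vocab * dim
}

fn main() {
    let vocab = 32_000;
    // Wide + shallow: dim just under the reported dim=5120 efficiency cliff.
    let wide = approx_params(8, 5120, vocab);
    // Deep + narrow: four times the layers at half the width.
    let deep = approx_params(32, 2560, vocab);
    println!("wide+shallow ~{:.2}B params", wide as f64 / 1e9);
    println!("deep+narrow  ~{:.2}B params", deep as f64 / 1e9);
}
```

Both configurations come out around 2.6-2.7B parameters, so at the ~3B scale the choice between them is about hardware efficiency rather than capacity, which is exactly the regime where the project reports the crossover.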

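The SafeTensors export mentioned above is what makes the trained weights portable: the format is simply an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. The hand-rolled single-tensor writer below is a sketch of that layout for illustration; real export code would use the `safetensors` crate rather than formatting the header by hand.

```rust
/// Minimal sketch of the SafeTensors on-disk layout: an 8-byte
/// little-endian u64 header length, a JSON header mapping tensor
/// names to dtype/shape/data_offsets, then the raw tensor bytes.
/// Single-tensor, f32-only; for illustration, not production export.
fn serialize_f32(name: &str, shape: &[usize], data: &[f32]) -> Vec<u8> {
    let bytes: Vec<u8> = data.iter().flat_map(|v| v.to_le_bytes()).collect();
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let header = format!(
        "{{\"{}\":{{\"dtype\":\"F32\",\"shape\":[{}],\"data_offsets\":[0,{}]}}}}",
        name, dims.join(","), bytes.len()
    );
    let mut out = Vec::new();
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    out.extend_from_slice(header.as_bytes());
    out.extend_from_slice(&bytes);
    out
}

fn main() {
    let blob = serialize_f32("w", &[2, 2], &[1.0, 2.0, 3.0, 4.0]);
    let header_len = u64::from_le_bytes(blob[..8].try_into().unwrap()) as usize;
    println!("header: {}", std::str::from_utf8(&blob[8..8 + header_len]).unwrap());
}
```

Because the container is this simple, exported weights can be loaded by any SafeTensors reader (PyTorch, MLX, llama.cpp, and others), which is what "deployment anywhere" amounts to in practice.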

Editorial Opinion

Rustane represents a significant democratization of on-device AI training on Apple Silicon, bypassing proprietary frameworks to unlock the Neural Engine's full potential. The reverse-engineered approach, while technically impressive, raises interesting questions about API accessibility and whether Apple might formalize these capabilities. The meticulous benchmarking and architectural insights provided by the project will likely influence future LLM design choices for Apple Silicon optimization.

Machine Learning · Deep Learning · AI Hardware · Open Source


© 2026 BotBeat