BotBeat

Open Source / Community Project · 2026-03-20

Rustane: First Open-Source Rust Training Engine for Apple Neural Engine Enables On-Device LLM Training Up to 5B Parameters

Key Takeaways

  • Rustane is the first open-source, training-capable Rust engine for the Apple Neural Engine, using reverse-engineered private APIs to compile and execute transformer models directly
  • Training validated across 48M to 5B parameter models, with forward passes confirmed up to 30B on an M4 Max with 128GB RAM, at only 3-5W of power draw
  • Key architectural findings: an efficiency cliff at dim=5120, an architecture crossover near 3B parameters (wide+shallow vs. deep+narrow), and RAM rather than chip specification as the primary scaling limit
Source: Hacker News — https://github.com/ncdrone/rustane

Summary

A new open-source Rust-native engine called Rustane enables full training and inference of transformer models directly on Apple's Neural Engine (ANE) and Metal GPU, marking the first publicly available training-capable implementation for Apple's specialized hardware. The engine uses reverse-engineered private ANE APIs to compile and execute MIL kernels with remarkable efficiency, achieving training at just 3-5W power draw while leaving the GPU free for other tasks. Developers have validated the training pipeline across models ranging from 48M to 5B parameters on an M4 Max with 128GB RAM, with forward-pass capabilities confirmed up to 30B parameters, demonstrating that RAM—not the chip itself—is the primary limitation.
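The claim that RAM, not the chip, is the limit can be made concrete with back-of-the-envelope arithmetic. The sketch below uses illustrative assumptions, not Rustane's actual internals: fp16 weights (2 bytes/param) for forward-only inference, plus fp16 gradients and two fp32 Adam moments (12 bytes/param total) when training. Under those assumptions, a 5B model trains in roughly 56 GiB, a 30B forward pass also fits in about 56 GiB, but training 30B would need over 300 GiB, which is consistent with forward-pass-only validation at 30B on a 128GB machine.

```rust
/// Rough per-parameter memory cost in bytes, under illustrative
/// assumptions (not Rustane's actual internals): fp16 weights (2 B),
/// plus fp16 gradients (2 B) and two fp32 Adam moments (4 B each)
/// when training; weights only for a forward-only pass.
fn bytes_per_param(training: bool) -> u64 {
    if training { 2 + 2 + 4 + 4 } else { 2 }
}

/// Estimated memory footprint in GiB for a model with `params` parameters.
fn footprint_gib(params: u64, training: bool) -> f64 {
    (params * bytes_per_param(training)) as f64 / (1u64 << 30) as f64
}

fn main() {
    // 5B params: ~56 GiB to train -> fits in 128 GB unified memory.
    println!("train 5B:    {:.1} GiB", footprint_gib(5_000_000_000, true));
    // 30B params: ~56 GiB forward-only fits, but ~335 GiB to train does not.
    println!("forward 30B: {:.1} GiB", footprint_gib(30_000_000_000, false));
    println!("train 30B:   {:.1} GiB", footprint_gib(30_000_000_000, true));
}
```

Exact byte counts depend on the optimizer and precision Rustane actually uses; the point of the estimate is only that training memory scales several times faster than inference memory.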

Rustane's architecture leverages memory-safe Rust to directly interface with Apple's undocumented _ANEClient and _ANEInMemoryModel APIs, bypassing CoreML's black-box scheduler. The project includes comprehensive benchmarking tools showing a 600M parameter model trains at ~535 tokens/second on ANE, with detailed results revealing architectural insights such as an efficiency cliff at dimension 5120 and a crossover point at 3B parameters where deeper, narrower architectures outperform wider, shallower ones. Trained weights export via SafeTensors for deployment anywhere, and the community is actively collecting performance data across different Apple Silicon chips to map the hardware's actual capabilities.
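The reported 3B crossover between wide+shallow and deep+narrow designs is easier to reason about with a parameter-count sketch. The rule of thumb below (12 · layers · dim² for attention plus MLP, plus a tied embedding matrix) and the two hypothetical configurations are assumptions for illustration, not Rustane's benchmark models; they show how a shallow model at dim just under the reported 5120 efficiency cliff and a deeper model at half that width land at a similar parameter budget near the crossover.

```rust
/// Approximate decoder-only transformer parameter count using the
/// common 12 * layers * dim^2 rule of thumb (4*dim^2 attention +
/// 8*dim^2 MLP) plus a tied embedding matrix. Illustrative only;
/// the configs in main() are hypothetical, not Rustane's models.
fn approx_params(layers: u64, dim: u64, vocab: u64) -> u64 {
    12 * layers * dim * dim + vocab * dim
}

fn main() {
    let vocab = 32_000;
    // Wide + shallow: dim just under the reported dim=5120 efficiency cliff.
    let wide = approx_params(8, 5120, vocab);
    // Deep + narrow: four times the layers at half the width.
    let deep = approx_params(32, 2560, vocab);
    println!("wide+shallow ~{:.2}B params", wide as f64 / 1e9);
    println!("deep+narrow  ~{:.2}B params", deep as f64 / 1e9);
}
```

Both configurations come out around 2.6-2.7B parameters, so at the ~3B scale the choice between them is about hardware efficiency rather than capacity, which is exactly the regime where the project reports the crossover.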

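The SafeTensors export mentioned above is what makes the trained weights portable: the format is simply an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. The hand-rolled single-tensor writer below is a sketch of that layout for illustration; real export code would use the `safetensors` crate rather than formatting the header by hand.

```rust
/// Minimal sketch of the SafeTensors on-disk layout: an 8-byte
/// little-endian u64 header length, a JSON header mapping tensor
/// names to dtype/shape/data_offsets, then the raw tensor bytes.
/// Single-tensor, f32-only; for illustration, not production export.
fn serialize_f32(name: &str, shape: &[usize], data: &[f32]) -> Vec<u8> {
    let bytes: Vec<u8> = data.iter().flat_map(|v| v.to_le_bytes()).collect();
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let header = format!(
        "{{\"{}\":{{\"dtype\":\"F32\",\"shape\":[{}],\"data_offsets\":[0,{}]}}}}",
        name, dims.join(","), bytes.len()
    );
    let mut out = Vec::new();
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    out.extend_from_slice(header.as_bytes());
    out.extend_from_slice(&bytes);
    out
}

fn main() {
    let blob = serialize_f32("w", &[2, 2], &[1.0, 2.0, 3.0, 4.0]);
    let header_len = u64::from_le_bytes(blob[..8].try_into().unwrap()) as usize;
    println!("header: {}", std::str::from_utf8(&blob[8..8 + header_len]).unwrap());
}
```

Because the container is this simple, exported weights can be loaded by any SafeTensors reader (PyTorch, MLX, llama.cpp, and others), which is what "deployment anywhere" amounts to in practice.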

Editorial Opinion

Rustane represents a significant democratization of on-device AI training on Apple Silicon, bypassing proprietary frameworks to unlock the Neural Engine's full potential. The reverse-engineered approach, while technically impressive, raises interesting questions about API accessibility and whether Apple might formalize these capabilities. The meticulous benchmarking and architectural insights provided by the project will likely influence future LLM design choices for Apple Silicon optimization.

Machine Learning · Deep Learning · AI Hardware · Open Source


© 2026 BotBeat