RadixArk Achieves Thousand-Scale LoRA Adapter Training with Extended Miles Framework

Key Takeaways

▸ Trains 1,536 LoRA adapters concurrently on a single base model with sub-3-minute training steps
▸ Eliminates VRAM duplication by sharing a frozen base model and routing tokens to different lightweight task-specific adapters
▸ Enables efficient large-scale RL experimentation: thousands of policy variants can now be tested and compared within the same training loop

Source:

Hacker News: https://osmosis.ai/blogs/training-thousands-of-lora-adapters-at-once

Summary

RadixArk has extended Miles, its open-source RL post-training framework, with a multi-adapter LoRA training system that trains thousands of LoRA adapters concurrently on a single shared base model. By modifying Megatron-Bridge and implementing multi-LoRA routing through SGLang, the team shows that it can train 1,536 LoRA adapters simultaneously with step times under 3 minutes on a Qwen3.6-35B model, and validates the approach on the GSM8K benchmark.

The breakthrough addresses a critical infrastructure bottleneck in scaling RL experiments: traditionally, training multiple LoRA adapters requires replicating the entire base model for each concurrent run, wasting substantial VRAM. The new approach shares a single base model across all adapters while routing tokens to different task-specific LoRA deltas, enabling researchers to explore thousands of policy variations (prompt design, reward signals, curriculum ablations) within a single training step.
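The shared-base idea can be illustrated with a minimal pure-Python sketch. It is not the Miles or Megatron-Bridge implementation: the class name `MultiLoRALinear`, its parameters, and the toy sizes are all assumptions made for illustration. It uses the standard LoRA formulation, in which each token's output is the frozen base projection plus a scaled low-rank delta chosen by that token's adapter id, and it compares how many parameters are stored when the base is shared versus replicated per adapter.

```python
import random

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class MultiLoRALinear:
    """Hypothetical sketch: one frozen base weight shared by many LoRA adapters."""

    def __init__(self, d_in, d_out, rank, num_adapters, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.base = rand_matrix(d_out, d_in, rng)  # frozen, stored once
        self.scale = alpha / rank
        # Per-adapter low-rank factors: A is (rank x d_in), B is (d_out x rank).
        # B starts at zero, so every adapter initially reproduces the base output.
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(num_adapters)]
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_adapters)]

    def forward(self, tokens, adapter_ids):
        """Route each token through the shared base plus its own adapter delta."""
        outputs = []
        for x, i in zip(tokens, adapter_ids):
            y = matvec(self.base, x)
            delta = matvec(self.B[i], matvec(self.A[i], x))
            outputs.append([yb + self.scale * d for yb, d in zip(y, delta)])
        return outputs

    def param_counts(self):
        """Parameters stored with a shared base vs. one base copy per adapter."""
        d_out, d_in = len(self.base), len(self.base[0])
        rank, n = len(self.A[0]), len(self.A)
        base = d_out * d_in
        per_adapter = rank * (d_in + d_out)
        return {"shared": base + n * per_adapter,
                "replicated": n * (base + per_adapter)}

# Toy usage: two identical tokens routed to different adapters.
layer = MultiLoRALinear(d_in=8, d_out=8, rank=2, num_adapters=4)
layer.B[1][0][0] = 1.0  # pretend adapter 1 has received a training update
x = [1.0] * 8
out = layer.forward([x, x], adapter_ids=[0, 1])
counts = layer.param_counts()
```

In this toy setting the token routed to the untouched adapter gets exactly the base output, while the token routed to adapter 1 differs only in the one output coordinate its delta touches. The gap between the two parameter counts widens as the base layer grows relative to the adapter rank, which is the duplication the summary describes.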

Implementation details include online adapter loading and unloading without trainer restarts, multi-LoRA rollouts via SGLang's native interface, unified FP8 training support, and memory optimization through adapter-free expert design. This architectural approach transforms LoRA from a single-policy fine-tuning technique into a platform for large-scale parallel policy exploration.
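Online adapter lifecycle management can be pictured as a registry that adds and removes adapters while the process keeps running. The sketch below is a hypothetical illustration, not the Miles or SGLang interface: `AdapterRegistry`, `max_loaded`, `group_requests`, and the adapter names are invented for the example. It also shows rollout requests being grouped by adapter name, one simple way to batch work that targets the same adapter.

```python
from collections import defaultdict

class AdapterRegistry:
    """Hypothetical in-memory registry of loaded adapters (illustration only)."""

    def __init__(self, max_loaded):
        self.max_loaded = max_loaded
        self._weights = {}  # adapter name -> adapter weights (opaque here)

    def load(self, name, weights):
        """Register an adapter without restarting anything."""
        if name in self._weights:
            raise ValueError(f"adapter {name!r} is already loaded")
        if len(self._weights) >= self.max_loaded:
            raise RuntimeError("no free adapter slot; unload one first")
        self._weights[name] = weights

    def unload(self, name):
        """Remove an adapter and return its weights; KeyError if it is not loaded."""
        return self._weights.pop(name)

    def loaded(self):
        return sorted(self._weights)

    def group_requests(self, requests):
        """Group (prompt, adapter_name) pairs by adapter name."""
        groups = defaultdict(list)
        for prompt, name in requests:
            if name not in self._weights:
                raise KeyError(f"adapter {name!r} is not loaded")
            groups[name].append(prompt)
        return dict(groups)

# Toy usage: swap one adapter for another while the registry stays alive.
registry = AdapterRegistry(max_loaded=2)
registry.load("reward-a", weights={"rank": 8})
registry.load("reward-b", weights={"rank": 8})
registry.unload("reward-a")
registry.load("curriculum-c", weights={"rank": 8})
groups = registry.group_requests(
    [("q1", "reward-b"), ("q2", "curriculum-c"), ("q3", "reward-b")]
)
```

A real system would also have to move weights on and off the accelerator and keep the trainer and the rollout engine in agreement about which adapters are live; the sketch only captures the bookkeeping.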

The system is built on Megatron-Bridge and SGLang, with online adapter lifecycle management and memory optimization for expert layers.

Editorial Opinion

This is a meaningful systems contribution that makes large-scale RL policy exploration more accessible. Sharing a frozen base model while routing through lightweight task-specific adapters is an elegant design that should become standard practice for RL infrastructure. For teams exploring extensive hyperparameter and design spaces, which is common in frontier model training, it significantly reduces compute waste. However, adoption and real-world impact depend on community uptake of Miles and on validation with production RL workloads beyond the GSM8K stress test.
