BotBeat

RESEARCH | RadixArk | 2026-06-02

RadixArk Achieves Thousand-Scale LoRA Adapter Training with Extended Miles Framework

Key Takeaways

  • Trains 1,536 LoRA adapters concurrently on a single base model with training steps under 3 minutes
  • Eliminates VRAM duplication by sharing a frozen base model and routing tokens to different lightweight task-specific adapters
  • Enables efficient large-scale RL experimentation: thousands of policy variants can now be tested and compared within the same training loop
Source: Hacker News, https://osmosis.ai/blogs/training-thousands-of-lora-adapters-at-once

Summary

RadixArk has extended Miles, its open-source RL post-training framework, with a multi-adapter LoRA training system that trains thousands of LoRA adapters concurrently on a single shared base model. By modifying Megatron-Bridge and implementing multi-LoRA routing through SGLang, the team demonstrates the ability to train 1,536 LoRA adapter instances simultaneously with step times under 3 minutes on a Qwen3.6-35B model, and validates the approach on the GSM8K benchmark.

The breakthrough addresses a critical infrastructure bottleneck in scaling RL experiments: traditionally, training multiple LoRA adapters requires replicating the entire base model for each concurrent run, wasting substantial VRAM. The new approach shares a single base model across all adapters while routing tokens to different task-specific LoRA deltas, enabling researchers to explore thousands of policy variations (prompt design, reward signals, curriculum ablations) within a single training step.
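The shared-base idea can be illustrated with a minimal NumPy sketch. This is not the Miles or SGLang implementation; the function name `multi_lora_forward`, the tensor shapes, and the `scale` parameter are assumptions made for illustration. It computes the frozen base projection once for all tokens, then adds a per-token low-rank delta selected by an adapter index.

```python
import numpy as np

def multi_lora_forward(x, adapter_ids, W, A, B, scale=1.0):
    """Hypothetical single-layer multi-LoRA forward pass.

    x:           (tokens, d_in)  input activations
    adapter_ids: (tokens,)       index of the adapter each token is routed to
    W:           (d_in, d_out)   frozen base weight, shared by all adapters
    A:           (n_adapters, d_in, r)   per-adapter down-projection
    B:           (n_adapters, r, d_out)  per-adapter up-projection
    """
    base = x @ W                      # base model work is done once, not per adapter
    A_tok = A[adapter_ids]            # gather each token's adapter: (tokens, d_in, r)
    B_tok = B[adapter_ids]            # (tokens, r, d_out)
    low_rank = np.einsum("ti,tir->tr", x, A_tok)
    delta = np.einsum("tr,tro->to", low_rank, B_tok)
    return base + scale * delta

# Toy usage with made-up sizes: 3 adapters, 5 tokens.
rng = np.random.default_rng(0)
d_in, d_out, r, n_adapters, tokens = 8, 6, 2, 3, 5
W = rng.normal(size=(d_in, d_out))
A = rng.normal(size=(n_adapters, d_in, r))
B = np.zeros((n_adapters, r, d_out))  # zero-initialised B: adapters start as a no-op
x = rng.normal(size=(tokens, d_in))
ids = np.array([0, 2, 1, 2, 0])
out = multi_lora_forward(x, ids, W, A, B)
```

Only `A` and `B` grow with the number of adapters; `W` is stored once. The per-token gather here is chosen for readability, and a real system would likely use a batched or grouped kernel instead.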

Implementation details include online adapter loading and unloading without trainer restarts, multi-LoRA rollouts via SGLang's native interface, unified FP8 training support, and memory optimization through adapter-free expert design. This architectural approach transforms LoRA from a single-policy fine-tuning technique into a platform for large-scale parallel policy exploration.

  • Built on Megatron-Bridge and SGLang with online adapter lifecycle management and memory optimization for expert layers
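Online adapter loading and unloading can be pictured as a slot registry that hands out and reclaims adapter slots while the trainer keeps running. The sketch below is a hypothetical illustration, not the Miles API: the class `AdapterRegistry`, its methods, and the fixed slot count are all assumptions.

```python
class AdapterRegistry:
    """Hypothetical registry mapping adapter names to a fixed pool of slots."""

    def __init__(self, max_slots):
        # Reversed so that pop() hands out the lowest free slot first.
        self._free = list(range(max_slots - 1, -1, -1))
        self._slot_of = {}   # adapter name -> slot index
        self._weights = {}   # slot index -> adapter weights

    def load(self, name, weights):
        """Attach an adapter to a free slot without touching the others."""
        if name in self._slot_of:
            raise ValueError(f"adapter {name!r} is already loaded")
        if not self._free:
            raise RuntimeError("no free adapter slots")
        slot = self._free.pop()
        self._slot_of[name] = slot
        self._weights[slot] = weights
        return slot

    def unload(self, name):
        """Detach an adapter and return its slot to the free pool."""
        slot = self._slot_of.pop(name)
        del self._weights[slot]
        self._free.append(slot)
        return slot

    def active(self):
        """Names of currently loaded adapters, sorted for stable output."""
        return sorted(self._slot_of)

# Toy usage: swap one adapter for another while "b" stays loaded.
reg = AdapterRegistry(max_slots=2)
slot_a = reg.load("a", weights={})
slot_b = reg.load("b", weights={})
reg.unload("a")
slot_c = reg.load("c", weights={})
```

In this sketch, the slot index returned by `load` is the kind of value a router could use as the per-token adapter id.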

Editorial Opinion

This is a meaningful systems contribution that makes large-scale RL policy exploration more accessible. Sharing a frozen base model while routing through lightweight task-specific adapters is an elegant design that should become standard practice for RL infrastructure. For teams exploring extensive hyperparameter and design spaces, which is common in frontier model training, this reduces compute waste significantly. However, adoption and real-world impact depend on community uptake of Miles and on validation with production RL workloads beyond the GSM8K stress test.

Large Language Models (LLMs), Reinforcement Learning, Machine Learning, MLOps & Infrastructure, Open Source
