AI2 Introduces BAR: Modular Post-Training Framework for Efficient Model Updates Using Mixture-of-Experts
Key Takeaways
- BAR enables independent development and upgrading of domain experts without affecting other capabilities, reducing post-training iteration costs
- A progressive unfreezing schedule for shared parameters across training stages proves critical, as post-training requires behavioral shifts beyond what frozen parameters can support
- The framework maintains general capabilities by training each expert on a mixture of domain-specific and general data, preventing domain-only specialization from degrading baseline performance
Summary
Allen Institute for AI (AI2) has unveiled BAR (Branch-Adapt-Route), a framework for modular post-training that addresses a critical challenge in language model development: efficiently updating or extending models after pretraining without losing existing capabilities. Traditional approaches either require expensive full retraining from scratch or risk catastrophic forgetting when new skills are added through continued training. BAR sidesteps these issues by training independent domain experts through separate pipelines, each customized for a specific capability such as math, code, tool use, or safety, then composing them into a unified model via a mixture-of-experts (MoE) architecture.
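The composition step can be pictured as a standard MoE layer whose experts were trained separately. The sketch below is a toy illustration of that idea; the names (`DomainExpert`, `moe_forward`) and the scalar "weights" are assumptions for illustration, not AI2's actual BAR code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class DomainExpert:
    """Stand-in for one independently post-trained expert (math, code, ...)."""
    def __init__(self, name, scale):
        self.name = name
        self.scale = scale  # toy parameter standing in for the expert's weights

    def forward(self, x):
        return [v * self.scale for v in x]

def moe_forward(x, experts, router_logits):
    """Combine expert outputs weighted by the router, as in a standard MoE layer."""
    weights = softmax(router_logits)
    out = [0.0] * len(x)
    for w, expert in zip(weights, experts):
        for i, v in enumerate(expert.forward(x)):
            out[i] += w * v
    return out
```

Because each `DomainExpert` is trained in its own pipeline, one expert can be retrained or swapped without touching the others; only the router and shared layers bind them together.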
The framework builds on AI2's earlier FlexOlmo work, which demonstrated modular MoE training during pretraining but broke down in post-training, where the required behavioral shifts also reach shared parameters such as attention layers. BAR introduces a progressive unfreezing schedule that thaws shared layers stage by stage: all shared parameters stay frozen during mid-training, the embeddings and language modeling head are unfrozen during supervised fine-tuning, and all parameters are unfrozen during reinforcement learning. The payoff can be substantial: unfreezing the embeddings lifted the tool-use expert from 20.3 to 46.4 on the Berkeley Function Calling Leaderboard.
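The stage-by-stage policy above can be written down as a small schedule table. This is a minimal sketch assuming three shared parameter groups; the group names and the `UNFREEZE_SCHEDULE` mapping are illustrative, while the per-stage policy itself comes from the article.

```python
# Shared parameter groups (illustrative names, not AI2's actual module names).
SHARED_GROUPS = ["embeddings", "attention", "lm_head"]

# Progressive unfreezing schedule as described in the article:
# mid-training freezes everything shared, SFT unfreezes embeddings and the
# LM head, and RL unfreezes all shared parameters.
UNFREEZE_SCHEDULE = {
    "mid_training": set(),
    "sft": {"embeddings", "lm_head"},
    "rl": set(SHARED_GROUPS),
}

def trainable_groups(stage):
    """Return which shared parameter groups receive gradients at `stage`."""
    allowed = UNFREEZE_SCHEDULE[stage]
    return {group: (group in allowed) for group in SHARED_GROUPS}
```

In a real training loop, the boolean for each group would be applied to the corresponding parameters' `requires_grad` flags at the start of each stage.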
AI2 is releasing the complete recipe, technical report, and model checkpoints to enable broader adoption of modular post-training approaches.
Editorial Opinion
BAR represents a meaningful step toward making language model development more practical and efficient by decoupling specialized capability training from the fragile full-retraining requirement. The progressive unfreezing insight is particularly valuable: it elegantly captures the intuition that different training stages require different degrees of flexibility in shared parameters. However, two open questions remain for the community: whether the approach scales well beyond four concurrent experts, and how much inference-time overhead the composed MoE model incurs.