Baidu Open-Sources LoongForge, High-Performance Training Framework with Up to 5× Speedup

Key Takeaways

▸ Up to 5× training speedup over mainstream open-source baselines, with production deployments achieving 30–50% improvements
▸ Unified framework supporting the complete training pipeline (pre-training, continued pre-training, SFT) for LLMs, VLMs, VLAs, and diffusion models
▸ Native dual-hardware support for NVIDIA GPUs and Kunlun XPUs with advanced parallelism strategies and load balancing

Source: https://github.com/baidu-baige/LoongForge (via Hacker News)

Summary

Baidu Baige has open-sourced LoongForge, a modular and scalable training framework for large language models (LLMs), vision-language models (VLMs), vision-language-action models (VLAs), and diffusion models. Built on Megatron-LM with system-level enhancements, LoongForge delivers up to 5× training speedup over mainstream open-source baselines and natively supports both NVIDIA GPUs and Baidu's proprietary Kunlun XPUs.

The framework introduces several advanced optimization techniques including adaptive FP8 training for mixed-precision efficiency, decoupled encoder-decoder training to eliminate pipeline bottlenecks, MoE-native optimizations for large sparse models, and flexible checkpointing with seamless Megatron-HuggingFace format conversion. LoongForge's heterogeneous parallelism design allows independent tensor/data parallelism and recomputation strategies per model component, enabling optimal throughput and memory efficiency for complex multimodal architectures.
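To make the idea of per-component parallelism concrete, here is a minimal Python sketch. It is not LoongForge's API: the `ComponentPlan` class, the `validate` helper, the component names, and the 16-device world size are all hypothetical. It also assumes, for simplicity, that every component is spread across the same set of devices and that pipeline parallelism is not used.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentPlan:
    """Hypothetical per-component parallelism settings (not LoongForge's API)."""
    name: str
    tensor_parallel: int   # devices that split each layer's weights
    data_parallel: int     # replicas that each see a different data shard
    recompute: bool        # trade extra compute for lower activation memory

    def devices(self) -> int:
        # Devices occupied by this component's layout.
        return self.tensor_parallel * self.data_parallel


def validate(plans: list[ComponentPlan], world_size: int) -> None:
    """Check that every component's layout tiles the same set of devices."""
    for p in plans:
        if p.tensor_parallel < 1 or p.data_parallel < 1:
            raise ValueError(f"{p.name}: parallel degrees must be >= 1")
        if p.devices() != world_size:
            raise ValueError(
                f"{p.name}: TP {p.tensor_parallel} x DP {p.data_parallel} "
                f"= {p.devices()}, expected {world_size}"
            )


# Assumed example: a small vision encoder favors data parallelism, while the
# larger language decoder shards its weights and recomputes activations.
plans = [
    ComponentPlan("vision_encoder", tensor_parallel=1, data_parallel=16, recompute=False),
    ComponentPlan("llm_decoder", tensor_parallel=8, data_parallel=2, recompute=True),
]
validate(plans, world_size=16)
for p in plans:
    print(f"{p.name}: TP={p.tensor_parallel} DP={p.data_parallel} recompute={p.recompute}")
```

The point of the sketch is only that the encoder and decoder can carry different tensor-parallel, data-parallel, and recomputation choices while still mapping onto one cluster. How LoongForge actually expresses and schedules these layouts is described in its own documentation.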

LoongForge builds on years of production refinement as Baidu's internal AIAK-Training-LLM stack, which has powered enterprise customers in education, computer vision, and embodied AI with typical 30–50% speedups and production deployments scaling to 5,000+ Kunlun XPUs. The v0.1.0 open-source release already supports recent high-impact model releases, including LLaVA-OneVision-2.0 and expanded VLA support for GR00T N1.6 with 60%+ training speedups.

LoongForge is already integrated into production models and is publicly available on GitHub with comprehensive documentation and tutorials.

Editorial Opinion

LoongForge's open-source release democratizes access to production-grade training infrastructure that Baidu has refined in deployments scaling to thousands of Kunlun XPUs. The framework's heterogeneous parallelism design and native support for both NVIDIA GPUs and Kunlun XPUs make it a significant contribution to the open-source training ecosystem, particularly for teams scaling multimodal models. With support for recent state-of-the-art model releases and reported efficiency gains over mainstream open-source baselines, LoongForge could become essential infrastructure for next-generation model development.
