BotBeat

Baidu | Open Source | 2026-05-21

Baidu Open-Sources LoongForge, High-Performance Training Framework with Up to 5× Speedup

Key Takeaways

  • Up to 5× training speedup over mainstream open-source baselines, with production deployments achieving 30–50% improvements
  • Unified framework supporting the complete training pipeline (pre-training, continued pre-training, SFT) for LLMs, VLMs, VLAs, and diffusion models
  • Native dual-hardware support for NVIDIA GPUs and Kunlun XPUs with advanced parallelism strategies and load balancing
Source: Hacker News, https://github.com/baidu-baige/LoongForge

Summary

Baidu Baige has open-sourced LoongForge, a modular and scalable training framework for large language models (LLMs), vision-language models (VLMs), vision-language-action models (VLAs), and diffusion models. Built on Megatron-LM with systematic enhancements, LoongForge is reported to deliver up to 5× training speedup over mainstream open-source baselines, and it natively supports both NVIDIA GPUs and Baidu's proprietary Kunlun XPUs.

The framework introduces several advanced optimization techniques including adaptive FP8 training for mixed-precision efficiency, decoupled encoder-decoder training to eliminate pipeline bottlenecks, MoE-native optimizations for large sparse models, and flexible checkpointing with seamless Megatron-HuggingFace format conversion. LoongForge's heterogeneous parallelism design allows independent tensor/data parallelism and recomputation strategies per model component, enabling optimal throughput and memory efficiency for complex multimodal architectures.
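
To make the idea of per-component parallelism concrete, here is a minimal Python sketch. It is illustrative only and is not LoongForge's configuration schema: the names `ComponentParallelism`, `validate`, `vision_encoder`, and `language_decoder` are hypothetical, and the sketch assumes every component spans the same set of devices with no pipeline parallelism, so that tensor-parallel size times data-parallel size must equal the device count.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ComponentParallelism:
    """Hypothetical per-component settings (not LoongForge's real schema)."""
    tensor_parallel: int   # devices that split each weight matrix
    data_parallel: int     # replicas that each process a different batch shard
    recompute: bool        # whether activations are recomputed to save memory

    def devices_needed(self) -> int:
        # Assumption: no pipeline parallelism, so devices = TP * DP.
        return self.tensor_parallel * self.data_parallel


def validate(plan: Dict[str, ComponentParallelism], world_size: int) -> None:
    """Check that every component's TP * DP layout covers exactly world_size devices."""
    for name, cfg in plan.items():
        if cfg.devices_needed() != world_size:
            raise ValueError(
                f"{name}: TP*DP = {cfg.devices_needed()}, expected {world_size}"
            )


# A small encoder can favor data parallelism without recomputation, while a
# large decoder on the same 8 devices uses tensor parallelism and recomputation.
plan = {
    "vision_encoder": ComponentParallelism(tensor_parallel=1, data_parallel=8, recompute=False),
    "language_decoder": ComponentParallelism(tensor_parallel=4, data_parallel=2, recompute=True),
}
validate(plan, world_size=8)
```

The point of the sketch is only that each component carries its own settings rather than inheriting one global layout; how LoongForge actually expresses and schedules this is documented in its repository.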

LoongForge builds on years of production refinement as Baidu's internal AIAK-Training-LLM stack, which has powered enterprise customers in education, computer vision, and embodied AI with typical 30–50% speedups and production deployments scaling to 5,000+ Kunlun XPUs. The v0.1.0 open-source release already supports recent high-impact model releases, including LLaVA-OneVision-2.0 and expanded VLA support for GR00T N1.6 with 60%+ training speedups.

LoongForge is already integrated into production models and is publicly available on GitHub with comprehensive documentation and tutorials.

Editorial Opinion

LoongForge's open-source release widens access to production-grade training infrastructure that Baidu has refined in deployments scaling to thousands of Kunlun XPUs. The framework's heterogeneous parallelism design and native support for both NVIDIA GPUs and Kunlun XPUs make it a significant contribution to the open-source training ecosystem, particularly for teams scaling multimodal models. With support for recent model releases and reported efficiency gains over mainstream open-source baselines, LoongForge could become important infrastructure for next-generation model development.

Large Language Models (LLMs) | Generative AI | Multimodal AI | MLOps & Infrastructure | Open Source
