BotBeat

OPEN SOURCE · Baidu · 2026-05-21

Baidu Open-Sources LoongForge, High-Performance Training Framework with Up to 5× Speedup

Key Takeaways

  • Up to 5× training speedup over mainstream open-source baselines, with production deployments achieving 30–50% improvements
  • Unified framework supporting the complete training pipeline (pre-training, continued pre-training, SFT) for LLMs, VLMs, VLAs, and diffusion models
  • Native dual-hardware support for NVIDIA GPUs and Kunlun XPUs with advanced parallelism strategies and load balancing

Source: Hacker News, https://github.com/baidu-baige/LoongForge

Summary

Baidu Baige has open-sourced LoongForge, a modular and scalable training framework for large language models (LLMs), vision-language models (VLMs), vision-language-action models (VLAs), and diffusion models. Built on Megatron-LM with system-level enhancements, LoongForge delivers up to 5× training speedup over mainstream open-source baselines and natively supports both NVIDIA GPUs and Baidu's proprietary Kunlun XPUs.

The framework introduces several optimization techniques: adaptive FP8 training for mixed-precision efficiency, decoupled encoder-decoder training to eliminate pipeline bottlenecks, MoE-native optimizations for large sparse models, and flexible checkpointing with conversion between Megatron and HuggingFace formats. Its heterogeneous parallelism design lets each model component use its own tensor/data parallelism and recomputation strategy, which is intended to improve throughput and memory efficiency for complex multimodal architectures.
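
To make the idea of per-component parallelism concrete, here is a minimal Python sketch. It is not LoongForge's API: the `ComponentPlan` class, its field names, the `validate` helper, and the example degrees are all hypothetical, and the rule that every component's layout must cover the same set of devices is a simplifying assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentPlan:
    """Hypothetical per-component settings (illustrative, not LoongForge's API)."""
    name: str
    tensor_parallel: int   # devices that split each layer's weights
    data_parallel: int     # replicas that each process a different batch shard
    recompute: bool        # trade extra compute for lower activation memory

    def devices_required(self) -> int:
        # One tensor-parallel group per data-parallel replica.
        return self.tensor_parallel * self.data_parallel


def validate(plans, world_size):
    """Assumption: each component's TP x DP layout must use exactly world_size devices."""
    for plan in plans:
        if plan.devices_required() != world_size:
            raise ValueError(
                f"{plan.name}: TP {plan.tensor_parallel} x DP {plan.data_parallel} "
                f"does not match world size {world_size}"
            )
    return True


# A small vision encoder favors data parallelism; a large decoder favors
# tensor parallelism plus recomputation. The degrees are made-up examples.
plans = [
    ComponentPlan("vision_encoder", tensor_parallel=1, data_parallel=8, recompute=False),
    ComponentPlan("llm_decoder", tensor_parallel=4, data_parallel=2, recompute=True),
]
validate(plans, world_size=8)
```

In this sketch both components occupy the same eight devices but divide them differently, which is the kind of independence the paragraph above describes.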

LoongForge builds on years of production refinement as Baidu's internal AIAK-Training-LLM stack, which has served enterprise customers in education, computer vision, and embodied AI with typical speedups of 30–50% and production deployments scaling to more than 5,000 Kunlun XPUs. The v0.1.0 open-source release already supports recent models, including LLaVA-OneVision-2.0, and expands VLA support to GR00T N1.6 with training speedups of more than 60%.
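
The quoted speedups are easier to compare when converted to wall-clock time. The short Python sketch below assumes that "speedup" means a throughput ratio against the baseline (so a 30% speedup is a factor of 1.3); the article does not define the term, and the `relative_training_time` helper is introduced here only for illustration.

```python
def relative_training_time(speedup: float) -> float:
    """Fraction of baseline wall-clock time remaining for a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Assumption: percentages are throughput gains, so 30% maps to a factor of 1.3.
for label, factor in [("30% speedup", 1.3), ("50% speedup", 1.5), ("5x speedup", 5.0)]:
    print(f"{label}: {relative_training_time(factor):.0%} of baseline time")
```

Under that assumption, a 30–50% speedup leaves roughly 77% to 67% of the baseline training time, while the headline 5× figure leaves 20%.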

  • Already integrated into production models and publicly available on GitHub with comprehensive documentation and tutorials

Editorial Opinion

LoongForge's open-source release broadens access to production-grade training infrastructure that Baidu has refined in deployments spanning thousands of Kunlun XPUs. The framework's heterogeneous parallelism design and native support for both NVIDIA GPUs and Kunlun XPUs make it a significant contribution to the open-source training ecosystem, particularly for teams scaling multimodal models. With support for recent model releases and reported efficiency gains over mainstream open-source baselines, LoongForge could become essential infrastructure for next-generation model development.

Large Language Models (LLMs) · Generative AI · Multimodal AI · MLOps & Infrastructure · Open Source
