Meta Reveals Backend Aggregation Technology Powering Gigawatt-Scale AI Clusters Like Prometheus
Key Takeaways
- Backend Aggregation enables Meta to connect tens of thousands of GPUs across multiple data centers with petabit-range bandwidth capacity
- The Prometheus cluster will deliver 1 gigawatt of AI computing capacity spanning multiple data center buildings in a single region
- BAG uses modular Jericho3 ASIC line cards, eBGP routing with UCMP, and MACsec security for scalable, performant, and resilient interconnection
Summary
Meta has disclosed technical details about Backend Aggregation (BAG), a networking technology that lets the company seamlessly connect tens of thousands of GPUs across multiple data centers and regions. BAG functions as a centralized Ethernet-based super-spine layer that interconnects multiple fabric layers, with inter-BAG capacity reaching the petabit range (16-48 Pbps per region pair). The technology is central to Meta's Prometheus AI cluster project, which will deliver 1 gigawatt of computational capacity spanning several data center buildings and interconnecting tens of thousands of GPUs to power new and existing AI experiences across Meta's product ecosystem.
Meta's BAG implementation connects two different network fabrics—Disaggregated Scheduled Fabric (DSF) and Non-Scheduled Fabric (NSF)—using modular hardware, advanced routing, and resilient topologies to ensure both performance and reliability at unprecedented scale. The system employs Jericho3 ASIC line cards with up to 432x800G ports, eBGP routing with Unequal Cost Multipath (UCMP) for load balancing, and MACsec encryption for security. As Meta's AI clusters continue to grow, the company expects BAG to play an increasingly important role in meeting future computational demands and driving innovation across its global network infrastructure.
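To make the UCMP idea concrete, here is a minimal sketch of weighted next-hop selection by flow hash. The path names, weights, and the string flow key are hypothetical stand-ins (Meta's actual implementation runs in router hardware and is not described at this level of detail); the sketch only illustrates how unequal weights steer proportionally more flows onto higher-capacity paths while keeping each flow pinned to one path.

```python
import hashlib

def ucmp_next_hop(flow_key: str, next_hops: dict) -> str:
    """Pick a next hop for a flow using Unequal Cost Multipath (UCMP).

    next_hops maps a (hypothetical) path name to an integer weight,
    e.g. proportional to that path's capacity. Hashing the flow key
    keeps every packet of a flow on the same path (no reordering),
    while weights skew the distribution of flows across paths.
    """
    # Expand each path into a number of table slots equal to its weight.
    table = [hop
             for hop, weight in sorted(next_hops.items())
             for _ in range(weight)]
    # Hash the flow identifier (a 5-tuple in practice) to pick a slot.
    digest = int(hashlib.sha256(flow_key.encode()).hexdigest(), 16)
    return table[digest % len(table)]
```

With weights 3:1, roughly three quarters of flows land on the heavier path, which is the behavior UCMP provides over plain ECMP's equal split.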
Meta strategically distributes BAG layers regionally, with oversubscription ratios of roughly 4.5:1 (L2 to BAG) to balance scale against performance.
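An oversubscription ratio is simply downstream-facing capacity divided by upstream-facing capacity at a network layer. The port counts below are hypothetical, chosen only to reproduce the 4.5:1 figure; the source does not state the actual per-switch breakdown.

```python
def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Aggregate downstream capacity over upstream capacity at a layer.

    A ratio above 1.0 means the layer is oversubscribed: hosts below it
    can collectively offer more traffic than the uplinks can carry.
    """
    return downlink_gbps / uplink_gbps

# Hypothetical example: 36 x 800G ports facing the L2 layer,
# 8 x 800G ports facing BAG, yields the 4.5:1 ratio cited above.
ratio = oversubscription_ratio(36 * 800, 8 * 800)
```

Oversubscribing the L2-to-BAG boundary trades some worst-case cross-fabric bandwidth for fewer long-haul ports, which is why the ratio is an explicit design knob rather than a fixed constant.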
Editorial Opinion
Meta's disclosure of Backend Aggregation technology demonstrates the critical importance of networking infrastructure in supporting next-generation AI systems at scale. As AI clusters grow to gigawatt-scale capacity, traditional networking approaches become insufficient, and specialized solutions like BAG become essential differentiators. This technical innovation highlights how hardware-software co-design and careful engineering of interconnect topologies are as crucial to AI infrastructure as the compute itself, and may inform how other hyperscalers architect their own future AI clusters.