Meta Reveals Backend Aggregation Technology Powering Gigawatt-Scale AI Clusters Like Prometheus
Key Takeaways
- Backend Aggregation enables Meta to connect tens of thousands of GPUs across multiple data centers with petabit-range bandwidth capacity
- The Prometheus cluster will deliver 1 gigawatt of AI computing capacity spanning multiple data center buildings in a single region
- BAG uses modular Jericho3 ASIC line cards, eBGP routing with UCMP, and MACsec security for scalable, performant, and resilient interconnection
Summary
Meta has disclosed technical details about Backend Aggregation (BAG), a networking technology that lets the company seamlessly connect tens of thousands of GPUs across multiple data centers and regions. BAG functions as a centralized Ethernet-based super-spine layer that interconnects multiple fabric layers, with inter-BAG capacity reaching the petabit range (16-48 Pbps per region pair). The technology is central to Meta's Prometheus AI cluster project, which will deliver 1 gigawatt of computational capacity spanning several data center buildings and interconnecting tens of thousands of GPUs to power new and existing AI experiences across Meta's product ecosystem.
Meta's BAG implementation connects two different network fabrics—Disaggregated Scheduled Fabric (DSF) and Non-Scheduled Fabric (NSF)—using modular hardware, advanced routing, and resilient topologies to ensure both performance and reliability at unprecedented scale. The system employs Jericho3 ASIC line cards with up to 432x800G ports, eBGP routing with Unequal Cost Multipath (UCMP) for load balancing, and MACsec encryption for security. As Meta's AI clusters continue to grow, the company expects BAG to play an increasingly important role in meeting future computational demands and driving innovation across its global network infrastructure.
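To make the UCMP idea concrete, here is a minimal sketch of weighted next-hop selection by flow hash. The path names, weights, and the string flow key are hypothetical stand-ins (Meta's actual implementation runs in router hardware and is not described at this level of detail); the sketch only illustrates how unequal weights steer proportionally more flows onto higher-capacity paths while keeping each flow pinned to one path.

```python
import hashlib

def ucmp_next_hop(flow_key: str, next_hops: dict) -> str:
    """Pick a next hop for a flow using Unequal Cost Multipath (UCMP).

    next_hops maps a (hypothetical) path name to an integer weight,
    e.g. proportional to that path's capacity. Hashing the flow key
    keeps every packet of a flow on the same path (no reordering),
    while weights skew the distribution of flows across paths.
    """
    # Expand each path into a number of table slots equal to its weight.
    table = [hop
             for hop, weight in sorted(next_hops.items())
             for _ in range(weight)]
    # Hash the flow identifier (a 5-tuple in practice) to pick a slot.
    digest = int(hashlib.sha256(flow_key.encode()).hexdigest(), 16)
    return table[digest % len(table)]
```

With weights 3:1, roughly three quarters of flows land on the heavier path, which is the behavior UCMP provides over plain ECMP's equal split.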
Meta strategically distributes BAG layers regionally, with oversubscription ratios of roughly 4.5:1 (L2 to BAG) to balance scale against performance.
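An oversubscription ratio is simply downstream-facing capacity divided by upstream-facing capacity at a network layer. The port counts below are hypothetical, chosen only to reproduce the 4.5:1 figure; the source does not state the actual per-switch breakdown.

```python
def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Aggregate downstream capacity over upstream capacity at a layer.

    A ratio above 1.0 means the layer is oversubscribed: hosts below it
    can collectively offer more traffic than the uplinks can carry.
    """
    return downlink_gbps / uplink_gbps

# Hypothetical example: 36 x 800G ports facing the L2 layer,
# 8 x 800G ports facing BAG, yields the 4.5:1 ratio cited above.
ratio = oversubscription_ratio(36 * 800, 8 * 800)
```

Oversubscribing the L2-to-BAG boundary trades some worst-case cross-fabric bandwidth for fewer long-haul ports, which is why the ratio is an explicit design knob rather than a fixed constant.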
Editorial Opinion
Meta's disclosure of Backend Aggregation technology demonstrates the critical importance of networking infrastructure in supporting next-generation AI systems at scale. As AI clusters grow to gigawatt-scale capacity, traditional networking approaches become insufficient, and specialized solutions like BAG become essential differentiators. This technical innovation highlights how hardware-software co-design and careful engineering of interconnect topologies are as crucial to AI infrastructure as the compute itself, and may inform how other hyperscalers architect their own future AI clusters.