BotBeat
NVIDIA
PRODUCT LAUNCH | NVIDIA | 2026-03-17

Nvidia Integrates Groq 3 LPU into Rubin Platform to Boost AI Inference Performance

Key Takeaways

  • Groq 3 LPU features 500 MB of SRAM with 150 TB/s bandwidth, optimized for low-latency inference operations
  • Groq LPX racks will contain 256 Groq 3 LPUs, delivering 40 PB/s aggregate bandwidth for inference acceleration
  • The integration targets multi-agent AI systems requiring sub-second responsiveness and high-throughput AI-to-AI communication
Source: Hacker News, https://www.tomshardware.com/pc-components/gpus/nvidia-groq-3-lpu-and-groq-lpx-racks-join-rubin-platform-at-gtc-sram-packed-accelerator-boosts-every-layer-of-the-ai-model-on-every-token

Summary

Nvidia has integrated the Groq 3 LPU inference accelerator into its Vera Rubin platform, expanding the system's capabilities for next-generation AI data centers. Each Groq 3 LPU carries 500 MB of SRAM with 150 TB/s of bandwidth, significantly more memory bandwidth than traditional HBM-based accelerators offer, which suits it to low-latency inference workloads. Nvidia will build Groq LPX racks containing 256 Groq 3 LPUs each, providing 128 GB of total SRAM, 40 PB/s of aggregate bandwidth, and 640 TB/s of dedicated scale-up connectivity.
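The rack-level figures follow from the per-chip numbers. The short sketch below reproduces that arithmetic, assuming decimal units (1 GB = 1,000 MB, 1 PB/s = 1,000 TB/s) and simple linear scaling across the 256 LPUs; the function and constant names are illustrative and do not come from Nvidia or Groq.

```python
# Per-chip and per-rack figures as quoted in the article.
LPUS_PER_RACK = 256
SRAM_PER_LPU_MB = 500
BANDWIDTH_PER_LPU_TBPS = 150


def rack_totals(lpus=LPUS_PER_RACK,
                sram_mb=SRAM_PER_LPU_MB,
                bw_tbps=BANDWIDTH_PER_LPU_TBPS):
    """Return (total SRAM in GB, aggregate SRAM bandwidth in PB/s).

    Assumes decimal units and that capacity and bandwidth simply add up
    across chips, which is an idealization.
    """
    total_sram_gb = lpus * sram_mb / 1000
    total_bw_pbps = lpus * bw_tbps / 1000
    return total_sram_gb, total_bw_pbps


sram_gb, bw_pbps = rack_totals()
print(f"Total SRAM per rack: {sram_gb:.1f} GB")
print(f"Aggregate SRAM bandwidth per rack: {bw_pbps:.1f} PB/s")
```

Under these assumptions the SRAM total comes out at exactly 128 GB, and the bandwidth at 38.4 PB/s, which is consistent with the quoted 40 PB/s being a rounded figure.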

The addition positions Rubin to handle the emerging frontier of multi-agent AI systems, which require high-speed communication between AI agents. According to Ian Buck, Nvidia's hyperscale VP, the combination of Rubin GPUs and Groq LPUs will enable throughput of 1,500 tokens per second or higher for AI agent interactions, at least 15 times the 100 tokens per second typical of human-facing applications. The move directly addresses competition from Cerebras and other low-latency inference specialists, strengthening Nvidia's position across the expanding AI infrastructure market.
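To put the two throughput figures in perspective, the sketch below converts them into per-token time and into the time needed to stream a full response. The 3,000-token response length is a hypothetical value chosen for illustration, and the helper names are not from the source.

```python
# Throughput figures quoted in the article (tokens per second).
HUMAN_FACING_TPS = 100
AGENT_TPS = 1500

# Hypothetical response length, chosen only for illustration.
EXAMPLE_RESPONSE_TOKENS = 3000


def ms_per_token(tokens_per_second):
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_to_generate(num_tokens, tokens_per_second):
    """Time to stream num_tokens at a steady rate, ignoring startup latency."""
    return num_tokens / tokens_per_second


speedup = AGENT_TPS / HUMAN_FACING_TPS
print(f"Speedup: {speedup:.0f}x")
for label, tps in [("human-facing", HUMAN_FACING_TPS), ("agent-to-agent", AGENT_TPS)]:
    total_s = seconds_to_generate(EXAMPLE_RESPONSE_TOKENS, tps)
    print(f"{label}: {ms_per_token(tps):.2f} ms/token, "
          f"{total_s:.0f} s for {EXAMPLE_RESPONSE_TOKENS} tokens")
```

At a steady rate, the hypothetical 3,000-token exchange drops from 30 seconds to 2 seconds, which illustrates why chains of agents that wait on one another's output benefit from the higher rate.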

  • The Rubin platform now includes seven major components spanning compute, networking, and inference acceleration across CPU, GPU, and LPU architectures

Editorial Opinion

Nvidia's acquisition and integration of Groq's SRAM-centric LPU technology into Rubin is a strategic response to growing specialization in AI inference markets. By combining high-bandwidth SRAM accelerators with its GPU-centric platform, Nvidia is positioning itself to serve both training and specialized inference workloads within a unified ecosystem. More broadly, the shift toward multi-agent systems and the emphasis on AI-to-AI communication represent a fundamental change in how enterprise AI infrastructure will be designed and optimized.

Large Language Models (LLMs), Generative AI, AI Hardware, Mergers & Acquisitions
