BotBeat

NVIDIA
PRODUCT LAUNCH | 2026-03-17

Nvidia Integrates Groq 3 LPU into Rubin Platform to Boost AI Inference Performance

Key Takeaways

  • The Groq 3 LPU pairs 500 MB of SRAM with 150 TB/s of bandwidth and is optimized for low-latency inference
  • Groq LPX racks will hold 256 Groq 3 LPUs, delivering 40 PB/s of aggregate SRAM bandwidth for inference acceleration
  • The integration targets multi-agent AI systems that need sub-second responsiveness and high-throughput AI-to-AI communication
Source: Hacker News
https://www.tomshardware.com/pc-components/gpus/nvidia-groq-3-lpu-and-groq-lpx-racks-join-rubin-platform-at-gtc-sram-packed-accelerator-boosts-every-layer-of-the-ai-model-on-every-token

Summary

Nvidia has integrated the Groq 3 LPU inference accelerator into its Vera Rubin platform, expanding what the platform offers next-generation AI data centers. The Groq 3 LPU stands out for its 500 MB of SRAM, which delivers 150 TB/s of bandwidth, significantly more than traditional HBM-based accelerators, making it well suited to low-latency inference workloads. Nvidia will build Groq LPX racks containing 256 Groq 3 LPUs, for a total of 128 GB of SRAM with 40 PB/s of bandwidth and 640 TB/s of dedicated scale-up connectivity.
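As a rough sanity check, the rack totals follow from multiplying the per-chip figures by 256. The Python sketch below assumes decimal units (1 GB = 1,000 MB, 1 PB/s = 1,000 TB/s) and that rack totals are a simple sum of per-chip specs; Nvidia has not confirmed either assumption. Under them, the per-chip product comes to about 38.4 PB/s, so the quoted 40 PB/s appears to be a rounded figure.

```python
# Back-of-envelope check of the Groq LPX rack figures.
# Assumptions (not confirmed by Nvidia): decimal units, and rack totals
# are a simple sum of the per-chip specs.

LPUS_PER_RACK = 256          # Groq 3 LPUs per LPX rack (from the article)
SRAM_PER_LPU_MB = 500        # SRAM per LPU, in MB (from the article)
SRAM_BW_PER_LPU_TBPS = 150   # SRAM bandwidth per LPU, in TB/s (from the article)


def rack_totals(n_lpus: int, sram_mb: float, bw_tbps: float) -> tuple[float, float]:
    """Return (total SRAM in GB, aggregate SRAM bandwidth in PB/s)."""
    total_sram_gb = n_lpus * sram_mb / 1_000
    total_bw_pbps = n_lpus * bw_tbps / 1_000
    return total_sram_gb, total_bw_pbps


sram_gb, bw_pbps = rack_totals(LPUS_PER_RACK, SRAM_PER_LPU_MB, SRAM_BW_PER_LPU_TBPS)
print(f"Total SRAM: {sram_gb:.0f} GB")             # 128 GB
print(f"Aggregate bandwidth: {bw_pbps:.1f} PB/s")  # 38.4 PB/s, quoted as ~40 PB/s
```

The SRAM total matches the article's 128 GB exactly, which suggests the capacity figure is a straight sum while the bandwidth figure was rounded up.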

The addition positions Rubin for the emerging frontier of multi-agent AI systems, which depend on high-speed communication between AI agents. According to Ian Buck, Nvidia's hyperscale VP, pairing Rubin GPUs with Groq LPUs will enable throughput of 1,500 tokens per second or more for agent-to-agent interactions, 15 times the roughly 100 tokens per second typical of human-facing applications. The move also responds directly to competition from Cerebras and other low-latency inference specialists, strengthening Nvidia's position in the expanding AI infrastructure market.
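To put the quoted rates in concrete terms, the Python sketch below converts tokens per second into the average time per token and the time to stream one reply. It is a simplified illustration: it ignores prefill, batching, and network overhead, and the 1,000-token reply length is a hypothetical value chosen only for the example.

```python
# Convert the token rates quoted by Ian Buck into time per token and per reply.
# Simplification: steady-state decode only; prefill, batching, and network
# overhead are ignored. REPLY_TOKENS is a hypothetical illustration value.

HUMAN_FACING_TPS = 100   # typical rate for human-facing apps (from the article)
AGENT_TPS = 1_500        # target rate for agent-to-agent traffic (from the article)
REPLY_TOKENS = 1_000     # hypothetical reply length


def ms_per_token(tokens_per_second: float) -> float:
    """Average interval between generated tokens, in milliseconds."""
    return 1_000 / tokens_per_second


def seconds_per_reply(n_tokens: int, tokens_per_second: float) -> float:
    """Time to stream n_tokens at a steady rate, in seconds."""
    return n_tokens / tokens_per_second


for label, tps in [("human-facing", HUMAN_FACING_TPS), ("agent-to-agent", AGENT_TPS)]:
    print(f"{label}: {ms_per_token(tps):.2f} ms/token, "
          f"{seconds_per_reply(REPLY_TOKENS, tps):.2f} s per {REPLY_TOKENS}-token reply")
```

Under these assumptions, a 1,000-token reply takes about 10 s at 100 tokens per second but roughly 0.67 s at 1,500 tokens per second, which illustrates why sub-second agent exchanges become plausible at the higher rate.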

  • The Rubin platform now comprises seven major components spanning compute, networking, and inference acceleration across CPU, GPU, and LPU architectures

Editorial Opinion

Nvidia's integration of Groq's SRAM-centric LPU technology into Rubin is a strategic response to growing specialization in the AI inference market. By combining high-bandwidth SRAM accelerators with its GPU-centric platform, Nvidia is positioning itself to serve both training and specialized inference workloads within a single ecosystem. More broadly, the shift toward multi-agent systems and the emphasis on AI-to-AI communication mark a fundamental change in how enterprise AI infrastructure will be designed and optimized.

Large Language Models (LLMs) · Generative AI · AI Hardware · Mergers & Acquisitions

© 2026 BotBeat