SRAM-Centric Chips Gain Momentum in AI Inference as NVIDIA Licenses Groq IP for $20B
Key Takeaways
- NVIDIA's $20B licensing of Groq IP and Cerebras' 750 MW OpenAI deal signal mainstream adoption of SRAM-centric accelerators for AI inference
- SRAM offers 375-1000x faster memory access than HBM but is significantly less dense, creating a fundamental tradeoff between speed and capacity
- Arithmetic intensity and working set size determine whether SRAM-centric or GPU architectures perform better for specific inference workloads
Summary
SRAM-centric AI accelerators are emerging as serious competitors to traditional GPUs for inference workloads, following major developments including NVIDIA's $20 billion licensing deal for Groq's IP in December 2025 and Cerebras securing a 750 MW contract to run OpenAI inference workloads. Companies like Cerebras, Groq, and d-Matrix are pioneering architectures that prioritize on-chip SRAM memory over traditional off-chip HBM (High Bandwidth Memory), offering significant latency and throughput advantages for certain AI inference tasks.
The fundamental difference lies in memory architecture: SRAM provides sub-nanosecond access times and sits directly on-chip with compute cores, while HBM (a form of DRAM) requires 375-500 nanoseconds to access and lives off-chip. SRAM uses six transistors per bit versus DRAM's one transistor and capacitor, making it faster but less dense and more expensive. This creates a critical tradeoff between "near-compute" memory (SRAM) and "far-compute" memory (HBM) that determines optimal hardware for different workloads.
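The latency gap described above can be made concrete with a back-of-envelope sketch. The snippet below uses the access latencies cited in the article (sub-nanosecond SRAM, 375-500 ns HBM); the access count and the 1 ns SRAM figure are illustrative assumptions, not datasheet values.

```python
# Rough comparison of "near-compute" SRAM vs "far-compute" HBM access time.
# Latency figures follow the article; everything else is an assumption
# chosen for intuition, not a measurement of any specific chip.

SRAM_LATENCY_NS = 1.0    # on-chip SRAM: roughly sub-nanosecond access
HBM_LATENCY_NS = 450.0   # off-chip HBM (DRAM): 375-500 ns per access

def dependent_access_time_us(n_accesses: int, latency_ns: float) -> float:
    """Total time for n dependent (serialized, non-pipelined) accesses."""
    return n_accesses * latency_ns / 1000.0

# A latency-sensitive inference step doing 10,000 dependent lookups:
n = 10_000
print(f"SRAM: {dependent_access_time_us(n, SRAM_LATENCY_NS):.0f} us")
print(f"HBM:  {dependent_access_time_us(n, HBM_LATENCY_NS):.0f} us")
```

The ~450x gap in this toy model sits inside the article's 375-1000x range; the flip side is that SRAM's six-transistor cell stores far fewer bits per square millimeter than DRAM's one-transistor-plus-capacitor cell, which is exactly the capacity penalty the article describes.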
According to Gimlet Labs, which operates a multi-silicon inference cloud deploying both GPU and SRAM-centric architectures, the key determining factor is arithmetic intensity—the ratio of compute operations to memory accesses. Workloads with smaller working sets and higher arithmetic intensity benefit from SRAM-centric designs, while larger working sets requiring massive memory capacity still favor GPU architectures with abundant HBM. The industry is expected to see continued specialization, with new memory technologies emerging to bridge the gap between SRAM and HBM approaches as AI inference demands continue to evolve.
- The industry is moving toward specialized hardware, with SRAM-centric chips capturing inference tasks requiring low latency and high throughput
- New memory technologies are expected to emerge that bridge the performance gap between on-chip SRAM and off-chip HBM
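The workload-routing logic described above can be sketched as a crude roofline-style heuristic. Both thresholds below (on-chip capacity, ridge arithmetic intensity) are hypothetical placeholders, not figures from Gimlet Labs or any vendor.

```python
# A minimal sketch of the decision rule: arithmetic intensity (FLOPs per
# byte moved) and working-set size together pick the better architecture.
# sram_capacity_gb and ridge_intensity are illustrative assumptions.

def better_fit(flops: float, bytes_moved: float, working_set_gb: float,
               sram_capacity_gb: float = 0.2,   # ~hundreds of MB on-chip
               ridge_intensity: float = 100.0) -> str:
    """Heuristic: SRAM-centric wins when the working set fits on-chip;
    otherwise HBM capacity dominates. Intensity below the 'ridge'
    means the workload is memory-bound either way."""
    intensity = flops / bytes_moved
    if working_set_gb > sram_capacity_gb:
        return "GPU + HBM (working set exceeds on-chip SRAM)"
    if intensity < ridge_intensity:
        return "SRAM-centric (memory-bound, fits on-chip)"
    return "either (compute-bound once resident)"

# Low-intensity decode step with a small working set -> SRAM-centric:
print(better_fit(flops=1e9, bytes_moved=1e9, working_set_gb=0.05))
# Large model whose weights far exceed on-chip SRAM -> GPU with HBM:
print(better_fit(flops=1e12, bytes_moved=1e10, working_set_gb=80.0))
```

This mirrors the article's point: the hardware question is not "which chip is faster" in the abstract, but where a given workload sits on the intensity/capacity plane.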
Editorial Opinion
The $20 billion Groq licensing deal represents a watershed moment for alternative AI chip architectures, validating that the GPU monopoly on AI compute is genuinely vulnerable. What's particularly noteworthy is how this shift is being driven by inference economics rather than training—as models grow and inference costs dominate, the latency advantages of near-compute memory become financially compelling. The emergence of workload-specific optimization, where different silicon handles different tasks, suggests we're entering an era of heterogeneous AI infrastructure rather than one-size-fits-all solutions.


