BotBeat

NVIDIA
INDUSTRY REPORT · 2026-03-06

SRAM-Centric Chips Gain Momentum in AI Inference as NVIDIA Licenses Groq IP for $20B

Key Takeaways

  • NVIDIA's $20B licensing of Groq IP and Cerebras' 750 MW OpenAI deal signal mainstream adoption of SRAM-centric accelerators for AI inference
  • SRAM offers 375-1000x faster memory access than HBM but is significantly less dense, creating a fundamental tradeoff between speed and capacity
  • Arithmetic intensity and working set size determine whether SRAM-centric or GPU architectures perform better for specific inference workloads
Source: Hacker News (https://gimletlabs.ai/blog/sram-centric-chips)

Summary

SRAM-centric AI accelerators are emerging as serious competitors to traditional GPUs for inference workloads, following major developments including NVIDIA's $20 billion licensing deal for Groq's IP in December 2025 and Cerebras securing a 750 MW contract to run inference for OpenAI. Companies like Cerebras, Groq, and d-Matrix are pioneering architectures that prioritize on-chip SRAM over traditional off-chip HBM (High Bandwidth Memory), offering significant latency and throughput advantages for certain AI inference tasks.

The fundamental difference lies in memory architecture: SRAM provides sub-nanosecond access times and sits directly on-chip with compute cores, while HBM (a form of DRAM) requires 375-500 nanoseconds to access and lives off-chip. SRAM uses six transistors per bit versus DRAM's one transistor and capacitor, making it faster but less dense and more expensive. This creates a critical tradeoff between "near-compute" memory (SRAM) and "far-compute" memory (HBM) that determines optimal hardware for different workloads.
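To put those figures side by side, here is a minimal back-of-envelope sketch in Python. The access latencies are the ones quoted above; the dependent-access count is purely an illustrative assumption, not a measurement of any real model or chip.

```python
# Back-of-envelope comparison of near-compute (SRAM) vs far-compute (HBM) access
# latency. The access times are the figures quoted in this report; the
# dependent-access count further down is purely an illustrative assumption.

SRAM_NS = (0.5, 1.0)   # "sub-nanosecond" on-chip SRAM access
HBM_NS = (375, 500)    # quoted off-chip HBM (DRAM) access range

low = HBM_NS[0] / SRAM_NS[1]    # 375x
high = HBM_NS[1] / SRAM_NS[0]   # 1000x
print(f"On-chip SRAM access is roughly {low:.0f}x-{high:.0f}x faster than HBM")

# Hypothetical critical path: 50,000 memory accesses that must complete one
# after another (no overlap hiding the latency) during a single decode step.
deps = 50_000
print(f"SRAM critical path: {deps * SRAM_NS[1] / 1e3:,.0f} us")   # ~50 us
print(f"HBM critical path:  {deps * HBM_NS[1] / 1e3:,.0f} us")    # ~25,000 us
```

The exact access counts vary by model and scheduler, but the ratio is what makes latency-sensitive serving attractive on near-compute memory, provided the working set fits on chip.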

According to Gimlet Labs, which operates a multi-silicon inference cloud deploying both GPU and SRAM-centric architectures, the key determining factor is arithmetic intensity—the ratio of compute operations to memory accesses. Workloads with smaller working sets and higher arithmetic intensity benefit from SRAM-centric designs, while larger working sets requiring massive memory capacity still favor GPU architectures with abundant HBM. The industry is expected to see continued specialization, with new memory technologies emerging to bridge the gap between SRAM and HBM approaches as AI inference demands continue to evolve.
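To make the notion concrete, below is a rough roofline-style sketch in Python, with placeholder hardware numbers rather than vendor specs. It computes arithmetic intensity for two transformer-shaped matrix multiplies and compares it to a chip's balance point, the intensity below which a kernel is limited by memory rather than compute.

```python
# Roofline-style sketch: arithmetic intensity (FLOPs per byte moved) vs the
# chip's balance point (peak FLOP/s divided by memory bandwidth). All hardware
# numbers below are illustrative placeholders, not measured or vendor figures.

def arithmetic_intensity_matmul(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for an (m x k) @ (k x n) matmul with fp16 operands, ignoring cache reuse."""
    flops = 2 * m * k * n                                    # each multiply-accumulate = 2 FLOPs
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

# Illustrative machine balance points (FLOPs/byte needed to stay compute-bound).
PEAK_TFLOPS = 1000                   # assumed fp16 peak
balance_hbm = PEAK_TFLOPS / 3        # assumed ~3 TB/s off-chip HBM bandwidth   -> ~333 FLOPs/byte
balance_sram = PEAK_TFLOPS / 100     # assumed ~100 TB/s aggregate on-chip SRAM -> ~10 FLOPs/byte

decode = arithmetic_intensity_matmul(1, 8192, 8192)       # batch-1 decode: ~1 FLOP/byte
prefill = arithmetic_intensity_matmul(4096, 8192, 8192)   # long prefill / big batch: ~2000 FLOPs/byte

for name, ai in [("batch-1 decode", decode), ("prefill/large batch", prefill)]:
    print(f"{name}: {ai:7.1f} FLOPs/byte | "
          f"memory-bound with HBM: {ai < balance_hbm} | with on-chip SRAM: {ai < balance_sram}")
```

Where a workload lands relative to that balance point, combined with whether its working set fits in on-chip SRAM at all, is what drives the architecture choice described above.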

  • The industry is moving toward specialized hardware, with SRAM-centric chips capturing inference tasks requiring low latency and high throughput
  • New memory technologies are expected to emerge that bridge the performance gap between on-chip SRAM and off-chip HBM

Editorial Opinion

The $20 billion Groq licensing deal represents a watershed moment for alternative AI chip architectures, validating that the GPU monopoly on AI compute is genuinely vulnerable. What's particularly noteworthy is how this shift is being driven by inference economics rather than training—as models grow and inference costs dominate, the latency advantages of near-compute memory become financially compelling. The emergence of workload-specific optimization, where different silicon handles different tasks, suggests we're entering an era of heterogeneous AI infrastructure rather than one-size-fits-all solutions.

Large Language Models (LLMs) · MLOps & Infrastructure · AI Hardware · Partnerships · Market Trends

