NVIDIA · PRODUCT LAUNCH · 2026-02-23

NVIDIA Blackwell Ultra GB300 Racks Deliver Up to 1.87X Higher Throughput in Long-Context AI Inference

Key Takeaways

  • NVIDIA's Blackwell Ultra GB300 racks deliver up to 1.5X lower latency and 1.87X higher user throughput than the GB200 in long-context inference
  • Performance gains are driven by advanced parallelism, precision optimization, and intelligent resource management
  • Benchmarks were conducted by LMSys, an independent organization known for evaluating LLM performance
Source: X (Twitter), https://twitter.com/wccftech/status/2025299233524600992

Summary

NVIDIA has announced new benchmark results from LMSys showing that its Blackwell Ultra GB300 racks significantly outperform the previous-generation GB200 in long-context inference workloads on open-source AI models. According to the benchmarks, the GB300 achieves up to 1.5X lower latency and 1.87X higher user throughput than the GB200 architecture. These performance improvements are attributed to advanced parallelism techniques, precision optimization, and intelligent resource management.

The benchmarks were conducted by LMSys, an independent organization known for maintaining the Chatbot Arena leaderboard and evaluating large language model performance. Long-context inference has become increasingly critical as AI models handle larger input sequences for applications like document analysis, extended conversations, and complex reasoning tasks. The ability to process these workloads efficiently directly impacts user experience and operational costs for AI deployments.
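For intuition on why long-context inference is so demanding, the sketch below estimates how the attention KV cache grows with context length. This is a back-of-the-envelope illustration: the model dimensions (80 layers, 8 KV heads, 128-dim heads, FP16) are assumptions for a roughly 70B-class open model with grouped-query attention, not figures from the LMSys benchmark.

    # Back-of-the-envelope KV-cache sizing. All model dimensions are
    # illustrative assumptions, not figures from the LMSys benchmark.
    def kv_cache_gb(seq_len, n_layers=80, n_kv_heads=8, head_dim=128,
                    bytes_per_elem=2, batch_size=1):
        """Approximate KV-cache size in GB for one request: one key and
        one value vector per token, per layer, per KV head."""
        elems = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size
        return elems * bytes_per_elem / 1e9

    for ctx in (8_192, 128_000, 1_000_000):
        print(f"{ctx:>9,} tokens -> ~{kv_cache_gb(ctx):.1f} GB per request")

Under these assumed dimensions, the cache grows from under 3 GB at 8K tokens to over 300 GB at 1M tokens for a single request, which is why memory management and parallelism dominate long-context serving performance.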

The Blackwell Ultra GB300 represents NVIDIA's latest advancement in AI infrastructure, building on the Blackwell architecture announced earlier. The nearly 2X improvement in user throughput suggests that organizations can serve substantially more concurrent users with the same hardware footprint, or alternatively achieve the same performance with fewer racks. This efficiency gain is particularly important for enterprises and cloud providers deploying open-source models at scale, where infrastructure costs represent a significant portion of total operating expenses.

  • Improved throughput allows organizations to serve more concurrent users with the same hardware or to reduce infrastructure requirements (a rough capacity sketch follows below)
  • The advancements specifically target long-context inference on open-source models, an increasingly critical workload
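
To make that capacity arithmetic concrete, here is a hypothetical planning sketch. The only figure below taken from the article is the 1.87X throughput multiplier; the baseline users-per-rack and target load are placeholders, since the article publishes no absolute throughput numbers.

    import math

    BASELINE_USERS_PER_RACK = 1_000   # assumed GB200 concurrent users per rack
    SPEEDUP = 1.87                    # GB300 vs GB200 (from the benchmark)
    TARGET_USERS = 50_000             # assumed deployment target

    gb200_racks = math.ceil(TARGET_USERS / BASELINE_USERS_PER_RACK)
    gb300_racks = math.ceil(TARGET_USERS / (BASELINE_USERS_PER_RACK * SPEEDUP))

    print(f"GB200 racks needed: {gb200_racks}")                    # 50
    print(f"GB300 racks needed: {gb300_racks}")                    # 27
    print(f"Rack reduction: {1 - gb300_racks / gb200_racks:.0%}")  # 46%

Under these placeholder assumptions, a 1.87X throughput gain cuts the rack count for a fixed user load nearly in half, which is the mechanism behind the cost argument above.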

Editorial Opinion

These benchmark results represent a significant leap in AI inference efficiency, particularly for the long-context workloads that are becoming standard in production deployments. Nearly doubling user throughput while reducing latency addresses two of the most critical pain points for organizations deploying open-source models at scale. If these gains hold across diverse workloads, the GB300 could substantially lower the total cost of ownership for AI infrastructure while improving user experience—a rare combination that should accelerate enterprise AI adoption.

Tags: Large Language Models (LLMs) · MLOps & Infrastructure · AI Hardware · Market Trends · Open Source

