BotBeat

NVIDIA
RESEARCH
2026-05-10

VDCores: Open-Source GPU Programming Model Boosts LLM Inference Throughput by Up to 77%

Key Takeaways

  • VDCores introduces a decoupled programming model abstracting asynchronous GPU hardware units as resource-isolated virtual cores
  • Demonstrates 24% average and up to 77% peak throughput improvements on LLM inference workloads across NVIDIA's latest GPUs
  • Reduces kernel programming effort by 90%, lowering barriers to GPU optimization for developers
Source: Hacker News, https://arxiv.org/abs/2605.03190

Summary

Researchers have introduced VDCores (Virtual Decoupled Engines), a programming and execution model designed to make better use of the specialized asynchronous hardware units in modern GPUs. Current GPU software stacks organize workloads as monolithic kernels, which leaves those asynchronous units underused. VDCores abstracts the units as resource-isolated virtual cores and represents workloads as dependency-connected micro-operations, so that memory and compute operations can overlap automatically.
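
To make the overlap idea concrete, here is a minimal Python sketch. It is a toy model and not the VDCores API: the names `MicroOp`, `tiled_pipeline`, `serial_time`, and `overlapped_time`, the two unit labels, and the costs are all assumptions chosen for illustration. It models a tiled workload as load and math micro-operations linked by dependencies, then compares running them one after another with running them on two independent units.

```python
from dataclasses import dataclass, field

@dataclass
class MicroOp:
    name: str
    unit: str                     # hypothetical unit label: "mem" or "compute"
    cost: int                     # abstract time units, not measured values
    deps: list = field(default_factory=list)

def tiled_pipeline(n_tiles, load_cost=2, math_cost=3):
    """Build load -> math pairs; each math op depends on its own tile's load."""
    ops = []
    for i in range(n_tiles):
        ops.append(MicroOp(f"load{i}", "mem", load_cost))
        ops.append(MicroOp(f"math{i}", "compute", math_cost, deps=[f"load{i}"]))
    return ops

def serial_time(ops):
    """Monolithic view: every op runs after the previous one finishes."""
    return sum(op.cost for op in ops)

def overlapped_time(ops):
    """Each op starts once its deps are done and its own unit is free.

    Assumes ops are listed in a valid dependency order and that buffering
    between the two units is unlimited (a simplification).
    """
    finish = {}
    unit_free = {}
    for op in ops:
        ready = max((finish[d] for d in op.deps), default=0)
        start = max(ready, unit_free.get(op.unit, 0))
        finish[op.name] = start + op.cost
        unit_free[op.unit] = finish[op.name]
    return max(finish.values())

ops = tiled_pipeline(4)
print(serial_time(ops), overlapped_time(ops))  # prints "20 14"
```

In this toy case the loads for later tiles hide behind the math for earlier ones, so only the first load is exposed. Real hardware adds buffer limits and synchronization costs that the sketch ignores.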

Tests on four LLM inference workloads across NVIDIA's GH200, H100, and RTX 6000 Pro GPUs showed a 24% average improvement in decoding throughput, with a peak of 77% under dynamic input conditions. The model also reduces kernel programming and specialization effort by 90%, which could make GPU optimization accessible to developers who are not kernel experts. The implementation has been open-sourced, so the broader community can use and evaluate it directly.
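
For readers who think in latency rather than throughput, the reported gains can be converted with simple arithmetic. The sketch below assumes throughput means tokens generated per unit time on an otherwise identical workload; the function name `time_per_token_reduction` is hypothetical and the conversion is not taken from the paper.

```python
def time_per_token_reduction(throughput_gain):
    """Fractional drop in time per token implied by a fractional throughput gain.

    If throughput rises from T to T * (1 + g), time per token falls from
    1/T to 1/(T * (1 + g)), a relative reduction of 1 - 1/(1 + g).
    """
    return 1 - 1 / (1 + throughput_gain)

for gain in (0.24, 0.77):
    reduction = time_per_token_reduction(gain)
    print(f"{gain:.0%} throughput gain -> {reduction:.1%} less time per token")
# prints:
# 24% throughput gain -> 19.4% less time per token
# 77% throughput gain -> 43.5% less time per token
```

Under that assumption, the 24% average corresponds to roughly a fifth less time per token, and the 77% peak to a little over two fifths.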

The work addresses an increasingly important problem as LLM inference workloads scale across data centers worldwide. By decoupling resource management from application logic, VDCores eliminates static orchestration bottlenecks and improves hardware utilization through dynamic, dependency-driven scheduling. This approach is particularly relevant as GPU hardware continues to incorporate more specialized execution units that remain underutilized by traditional software patterns.
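
One way to picture why dependency-driven scheduling might help most when inputs vary is the toy comparison below. It is an illustration under stated assumptions, not the scheduler described in the paper: `lockstep_time` stands in for a static plan that prefetches the next tile while computing the current one and synchronizes after every step, while `dependency_driven_time` lets each operation start as soon as its input is ready and its unit is free. Both function names and all costs are hypothetical, and buffering is assumed to be unlimited.

```python
def lockstep_time(load_costs, math_costs):
    """Static plan: overlap load i+1 with math i, with a barrier after each step."""
    n = len(load_costs)
    total = load_costs[0]                       # first load has nothing to overlap with
    for i in range(n):
        next_load = load_costs[i + 1] if i + 1 < n else 0
        total += max(next_load, math_costs[i])  # the barrier waits for the slower side
    return total

def dependency_driven_time(load_costs, math_costs):
    """Dynamic plan: loads run back to back; math i waits only for load i and math i-1."""
    mem_free = 0
    compute_free = 0
    for load, math in zip(load_costs, math_costs):
        mem_free += load
        compute_free = max(compute_free, mem_free) + math
    return compute_free

# Uniform costs: both plans finish at the same time.
print(lockstep_time([2, 2, 2, 2], [3, 3, 3, 3]),
      dependency_driven_time([2, 2, 2, 2], [3, 3, 3, 3]))   # prints "14 14"

# Uneven costs: the per-step barrier exposes the slow final load.
print(lockstep_time([1, 1, 1, 5], [3, 3, 3, 1]),
      dependency_driven_time([1, 1, 1, 5], [3, 3, 3, 1]))   # prints "13 11"
```

With uniform costs the static plan loses nothing, which is why the gap only opens in the second case, where the memory unit is allowed to run ahead and absorb the long load early.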

  • Open-sourced implementation enables industry-wide adoption and validation
  • Addresses fundamental mismatch between modern GPU hardware capabilities and traditional software organization patterns

Editorial Opinion

VDCores represents a meaningful advance in bridging the gap between GPU hardware potential and software reality. The combination of substantial throughput gains on LLM inference workloads (24% on average, up to 77% at peak) with a 90% reduction in programming effort suggests potential for real-world impact across data centers. The open-source release is particularly valuable: it allows the broader research and engineering communities to validate the approach, identify further optimizations, and potentially influence how future GPU software stacks are designed. This work shows how rethinking fundamental abstractions can unlock significant efficiency gains in increasingly important computational workloads.

Machine Learning, Deep Learning, AI Hardware, Science & Research, Open Source
