VDCores: Open-Source GPU Programming Model Boosts LLM Inference Throughput by Up to 77%
Key Takeaways
- VDCores introduces a decoupled programming model that abstracts asynchronous GPU hardware units as resource-isolated virtual cores
- Demonstrates a 24% average and up to 77% peak throughput improvement on LLM inference workloads across NVIDIA's latest GPUs
- Reduces kernel programming effort by 90%, lowering the barrier to GPU optimization for developers
Summary
Researchers have introduced VDCores (Virtual Decoupled Cores), a novel programming and execution model designed to better utilize the specialized asynchronous hardware units in modern GPUs. The work addresses a key inefficiency in current GPU software stacks, which organize workloads around monolithic kernels that leave asynchronous execution units underexploited. VDCores abstracts these units as resource-isolated virtual cores and represents workloads as dependency-connected micro-operations, enabling memory and compute operations to overlap automatically.
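To make the idea concrete, here is a minimal sketch of dependency-driven scheduling onto resource-isolated virtual cores. Everything in it is invented for illustration: the `MicroOp` type, the two-core ("copy"/"compute") setup, and the durations are hypothetical and do not reflect VDCores' actual API or scheduler, only the principle that ops on different virtual cores overlap whenever their dependencies allow.

```python
from dataclasses import dataclass, field

@dataclass
class MicroOp:
    name: str
    core: str                        # virtual core executing the op ("copy" or "compute")
    duration: int                    # abstract time units
    deps: list = field(default_factory=list)

def schedule(ops):
    """Dependency-driven scheduler sketch: each virtual core keeps its own
    timeline, so micro-ops on different cores run concurrently whenever
    dependencies allow. Assumes `ops` is listed in a valid dependency order."""
    finish, core_free = {}, {}
    for op in ops:
        ready = max((finish[d] for d in op.deps), default=0)   # wait for deps
        start = max(ready, core_free.get(op.core, 0))          # wait for the core
        finish[op.name] = start + op.duration
        core_free[op.core] = finish[op.name]
    return finish

# Three tiles of a blocked matmul: each tile's data is copied (2 units)
# before its compute (3 units). Later copies need not wait for earlier
# computes, which is exactly where the overlap comes from.
ops = []
for i in range(3):
    ops.append(MicroOp(f"load{i}", "copy", 2))
    ops.append(MicroOp(f"mm{i}", "compute", 3, deps=[f"load{i}"]))

finish = schedule(ops)
makespan = max(finish.values())
serial = sum(op.duration for op in ops)   # a monolithic kernel with no overlap
print(makespan, serial)                   # → 11 15
```

With the invented numbers, the overlapped schedule finishes in 11 time units versus 15 for fully serial execution; a real dependency graph with more virtual cores (TMA, tensor cores, vector units) gives the scheduler correspondingly more overlap to exploit.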
Testing across four LLM inference workloads on NVIDIA's GH200, H100, and RTX 6000 Pro GPUs demonstrated significant practical impact: VDCores achieved a 24% average improvement in decoding throughput, with peaks reaching 77% under dynamic input conditions. Just as notable, the model reduces kernel programming and specialization effort by 90%, opening GPU optimization to developers beyond kernel experts. The research has been open-sourced, giving the broader community immediate access to these optimizations.
The work addresses an increasingly important problem as LLM inference workloads scale across data centers worldwide. By decoupling resource management from application logic, VDCores eliminates static orchestration bottlenecks and improves hardware utilization through dynamic, dependency-driven scheduling. This approach is particularly relevant as GPU hardware continues to incorporate more specialized execution units that remain underutilized by traditional software patterns.
- Open-sourced implementation enables industry-wide adoption and validation
- Addresses fundamental mismatch between modern GPU hardware capabilities and traditional software organization patterns
Editorial Opinion
VDCores represents a meaningful advance in bridging the gap between GPU hardware potential and software reality. The combination of substantial performance gains (24-77%) on critical LLM inference workloads with a 90% reduction in programming complexity suggests real-world impact potential across data centers. The open-source release is particularly valuable—it allows the broader research and engineering communities to validate the approach, identify further optimizations, and potentially influence how future GPU software stacks are designed. This work exemplifies how rethinking fundamental abstractions can unlock significant efficiency gains in increasingly important computational workloads.