VDCores: Open-Source GPU Programming Model Boosts LLM Inference Throughput by Up to 77%
Key Takeaways
- VDCores introduces a decoupled programming model that abstracts asynchronous GPU hardware units as resource-isolated virtual cores
- Demonstrates a 24% average and up to 77% peak throughput improvement on LLM inference workloads across NVIDIA's latest GPUs
- Reduces kernel programming effort by 90%, lowering the barrier to GPU optimization for developers
Summary
Researchers have introduced VDCores (Virtual Decoupled Cores), a novel programming and execution model designed to better utilize the specialized asynchronous hardware units in modern GPUs. The work addresses a key inefficiency in current GPU software stacks, which organize workloads around monolithic kernels that leave asynchronous execution units underexploited. VDCores abstracts these units as resource-isolated virtual cores and represents workloads as dependency-connected micro-operations, enabling memory and compute operations to overlap automatically.
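To make the idea concrete, here is a minimal sketch of dependency-driven scheduling onto resource-isolated virtual cores. Everything in it is invented for illustration: the `MicroOp` type, the two-core ("copy"/"compute") setup, and the durations are hypothetical and do not reflect VDCores' actual API or scheduler, only the principle that ops on different virtual cores overlap whenever their dependencies allow.

```python
from dataclasses import dataclass, field

@dataclass
class MicroOp:
    name: str
    core: str                        # virtual core executing the op ("copy" or "compute")
    duration: int                    # abstract time units
    deps: list = field(default_factory=list)

def schedule(ops):
    """Dependency-driven scheduler sketch: each virtual core keeps its own
    timeline, so micro-ops on different cores run concurrently whenever
    dependencies allow. Assumes `ops` is listed in a valid dependency order."""
    finish, core_free = {}, {}
    for op in ops:
        ready = max((finish[d] for d in op.deps), default=0)   # wait for deps
        start = max(ready, core_free.get(op.core, 0))          # wait for the core
        finish[op.name] = start + op.duration
        core_free[op.core] = finish[op.name]
    return finish

# Three tiles of a blocked matmul: each tile's data is copied (2 units)
# before its compute (3 units). Later copies need not wait for earlier
# computes, which is exactly where the overlap comes from.
ops = []
for i in range(3):
    ops.append(MicroOp(f"load{i}", "copy", 2))
    ops.append(MicroOp(f"mm{i}", "compute", 3, deps=[f"load{i}"]))

finish = schedule(ops)
makespan = max(finish.values())
serial = sum(op.duration for op in ops)   # a monolithic kernel with no overlap
print(makespan, serial)                   # → 11 15
```

With the invented numbers, the overlapped schedule finishes in 11 time units versus 15 for fully serial execution; a real dependency graph with more virtual cores (TMA, tensor cores, vector units) gives the scheduler correspondingly more overlap to exploit.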
Testing across four LLM inference workloads on NVIDIA's GH200, H100, and RTX 6000 Pro GPUs demonstrated significant practical impact: VDCores achieved a 24% average improvement in decoding throughput, with peaks reaching 77% under dynamic input conditions. Just as notable, the model reduces kernel programming and specialization effort by 90%, opening GPU optimization to developers beyond kernel experts. The research has been open-sourced, giving the broader community immediate access to these optimizations.
The work addresses an increasingly important problem as LLM inference workloads scale across data centers worldwide. By decoupling resource management from application logic, VDCores eliminates static orchestration bottlenecks and improves hardware utilization through dynamic, dependency-driven scheduling. This approach is particularly relevant as GPU hardware continues to incorporate more specialized execution units that remain underutilized by traditional software patterns.
- Open-sourced implementation enables industry-wide adoption and validation
- Addresses fundamental mismatch between modern GPU hardware capabilities and traditional software organization patterns
Editorial Opinion
VDCores represents a meaningful advance in bridging the gap between GPU hardware potential and software reality. The combination of substantial performance gains (24-77%) on critical LLM inference workloads with a 90% reduction in programming complexity suggests real-world impact potential across data centers. The open-source release is particularly valuable—it allows the broader research and engineering communities to validate the approach, identify further optimizations, and potentially influence how future GPU software stacks are designed. This work exemplifies how rethinking fundamental abstractions can unlock significant efficiency gains in increasingly important computational workloads.