GreenBoost: Open-Source Driver Extends NVIDIA GPU Memory by Leveraging System RAM and NVMe Storage
Key Takeaways
- GreenBoost uses the kernel's DMA-BUF mechanism to make system RAM and NVMe storage appear as CUDA device memory to GPU applications
- The driver achieves ~32 GB/s throughput over PCIe 4.0 x16, making extended memory practical for LLM inference despite the bandwidth penalty
- Symbol-interception techniques let GreenBoost work with frameworks such as Ollama that resolve GPU libraries internally and therefore bypass standard LD_PRELOAD hooks
Summary
A new open-source driver called GreenBoost has been developed to augment NVIDIA GPU VRAM by dynamically allocating system RAM and NVMe storage, enabling the execution of larger language models on GPUs with limited on-device memory. The solution consists of two main components: a kernel module that allocates pinned DDR4 pages and exports them as DMA-BUF file descriptors for GPU access via PCIe 4.0 (~32 GB/s throughput), and a CUDA shim library that intercepts memory allocation calls to transparently redirect large allocations to the extended memory pool.
The kernel module uses the buddy allocator to manage 2 MB compound pages efficiently and runs a watchdog thread that monitors system RAM and NVMe pressure so allocations can be throttled before resources are exhausted. The CUDA shim intercepts cudaMalloc, cudaFree, and related functions, passing small allocations straight to the GPU while redirecting larger ones (KV cache, model weights) through the kernel module. The implementation also includes more sophisticated symbol-interception techniques for frameworks such as Ollama that resolve GPU libraries internally and therefore sidestep standard library preloading.
This approach lets users overcome GPU memory bottlenecks without buying additional hardware: only large allocations (above 256 MB), such as model weights and the KV cache, are redirected to extended storage, while small allocations stay in VRAM. The result is that larger models run on mid-range GPUs, trading compute speed for extended memory capacity through PCIe bandwidth.
Editorial Opinion
GreenBoost represents a pragmatic solution to the GPU memory constraints that have limited LLM deployment on consumer and mid-range hardware. By leveraging existing PCIe infrastructure and system memory, the project democratizes access to larger models without requiring expensive hardware upgrades, though users should be aware that PCIe's lower bandwidth and higher latency relative to on-board VRAM mean inference will be noticeably slower. If the implementation proves stable and widely compatible, it could significantly improve the accessibility of open-source LLM inference across the ecosystem.