GreenBoost: Open-Source Driver Extends NVIDIA GPU Memory by Leveraging System RAM and NVMe Storage
Key Takeaways
- GreenBoost uses the kernel's DMA-BUF mechanism to make system RAM and NVMe storage appear as CUDA device memory to GPU applications
- The driver achieves ~32 GB/s throughput over PCIe 4.0 x16, making extended memory practical for LLM inference despite the bandwidth penalty
- Symbol-interception techniques let GreenBoost work with frameworks such as Ollama that resolve GPU libraries internally and therefore bypass standard LD_PRELOAD hooks
Summary
A new open-source driver called GreenBoost has been developed to augment NVIDIA GPU VRAM by dynamically allocating system RAM and NVMe storage, enabling the execution of larger language models on GPUs with limited on-device memory. The solution consists of two main components: a kernel module that allocates pinned DDR4 pages and exports them as DMA-BUF file descriptors for GPU access via PCIe 4.0 (~32 GB/s throughput), and a CUDA shim library that intercepts memory allocation calls to transparently redirect large allocations to the extended memory pool.
The kernel module uses the buddy allocator to manage 2 MB compound pages efficiently and runs a watchdog thread that monitors system RAM and NVMe pressure so allocations can be throttled before resources are exhausted. The CUDA shim intercepts cudaMalloc, cudaFree, and related functions, passing small allocations straight to the GPU while redirecting larger ones (KV cache, model weights) through the kernel module. The implementation also includes more sophisticated symbol-interception techniques for frameworks such as Ollama that resolve GPU libraries internally and therefore sidestep standard library preloading.
This approach lets users overcome GPU memory bottlenecks without buying additional hardware: only large allocations (above 256 MB), such as model weights and the KV cache, are redirected to extended storage, while small allocations stay in VRAM. The result is that larger models run on mid-range GPUs, trading compute speed for extended memory capacity through PCIe bandwidth.
Editorial Opinion
GreenBoost represents a pragmatic solution to the GPU memory constraints that have limited LLM deployment on consumer and mid-range hardware. By leveraging existing PCIe infrastructure and system memory, the project democratizes access to larger models without requiring expensive hardware upgrades, though users should be aware that PCIe's lower bandwidth and higher latency relative to on-board VRAM mean inference will be noticeably slower. If the implementation proves stable and widely compatible, it could significantly improve the accessibility of open-source LLM inference across the ecosystem.