LocalLLaMA Pushes GreenBoost, a Linux Driver That Extends NVIDIA GPU Memory with RAM and NVMe

Original: Open-Source "GreenBoost" Driver Aims To Augment NVIDIA GPUs vRAM With System RAM & NVMe To Handle Larger LLMs

LLM · Mar 16, 2026 · By Insights AI (Reddit) · 2 min read

GreenBoost is exactly the kind of infrastructure idea that LocalLLaMA notices quickly because it targets one of the ecosystem’s hardest practical limits: not enough GPU memory. At crawl time the Reddit thread had 141 upvotes and 38 comments. The linked Phoronix report was published on March 14, 2026, and described GreenBoost as an independently developed open-source Linux kernel module meant to augment NVIDIA GPU memory with system RAM and NVMe storage for larger LLM workloads.

According to Phoronix, GreenBoost does not replace NVIDIA’s official Linux driver stack. Instead it works alongside it through a dedicated kernel module, greenboost.ko, plus a CUDA user-space shim. The kernel side allocates pinned DDR4 pages with the buddy allocator, exports them as DMA-BUF file descriptors, and lets the GPU import them as CUDA external memory. The article says PCIe 4.0 x16 handles the actual data movement, while a sysfs interface and watchdog thread monitor RAM and NVMe pressure.
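The article mentions the watchdog and sysfs interface only in passing. As a rough illustration of what such a pressure monitor could look like from user space, here is a minimal Python sketch; the sysfs path, attribute names, file format, and thresholds are all invented for this example and are not GreenBoost's actual interface.

```python
# Hypothetical sketch of a pressure monitor like the watchdog the article
# describes. Every sysfs name and threshold here is an assumption.
from pathlib import Path

SYSFS_ROOT = Path("/sys/kernel/greenboost")  # hypothetical attribute directory

def parse_pressure(text: str) -> float:
    """Parse a hypothetical 'used/total' byte pair (e.g. '10/40')
    into a utilization fraction."""
    used, total = (int(x) for x in text.strip().split("/"))
    return used / total if total else 0.0

def tier_for(ram_pressure: float, nvme_pressure: float,
             ram_limit: float = 0.85, nvme_limit: float = 0.95) -> str:
    """Decide which tier the next overflow allocation should land in."""
    if ram_pressure < ram_limit:
        return "ram"    # pinned system RAM still has headroom
    if nvme_pressure < nvme_limit:
        return "nvme"   # spill to the NVMe-backed tier
    return "fail"       # both tiers exhausted; let the allocation error out

def read_tier() -> str:
    """Read both pressure attributes and pick a tier."""
    ram = parse_pressure((SYSFS_ROOT / "ram_pressure").read_text())
    nvme = parse_pressure((SYSFS_ROOT / "nvme_pressure").read_text())
    return tier_for(ram, nvme)
```

The point of the sketch is the ordering: RAM is preferred while it has headroom, and NVMe is only a fallback, which matches the RAM-then-NVMe hierarchy the article describes.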

Why the design caught the community’s attention

  • The CUDA shim can let small allocations pass through normally while redirecting larger ones, such as overflowing model weights or KV cache, into the expanded memory path.
  • The user-space layer hooks allocation calls and even symbol lookups, so applications like Ollama see a larger usable memory pool without any code changes.
  • The developer’s motivating example was trying to run a 31.8 GB model on a GeForce RTX 5070 with 12 GB of dedicated vRAM.
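The bullets above describe a size-based split. A toy Python model of that policy follows; the 256 MiB threshold, the pool sizes, and the class names are all invented for illustration, since the real shim intercepts CUDA allocation calls in C rather than working like this.

```python
# Toy model of a pass-through/redirect allocation policy: small requests go
# to native GPU memory, large ones (weights, KV cache) to an expanded tier.
from dataclasses import dataclass

GiB = 1 << 30
REDIRECT_THRESHOLD = 256 << 20   # assumed cutoff: redirect allocations > 256 MiB

@dataclass
class MemoryPool:
    capacity: int
    used: int = 0

    def alloc(self, size: int) -> bool:
        """Reserve `size` bytes if the pool has room."""
        if self.used + size > self.capacity:
            return False
        self.used += size
        return True

@dataclass
class Shim:
    vram: MemoryPool       # native GPU memory
    expanded: MemoryPool   # RAM/NVMe-backed external memory

    def malloc(self, size: int) -> str:
        """Return which tier served the request."""
        if size <= REDIRECT_THRESHOLD and self.vram.alloc(size):
            return "vram"      # small allocation passes through normally
        if self.expanded.alloc(size):
            return "expanded"  # large or overflowing allocation is redirected
        raise MemoryError("out of memory in both tiers")

# The thread's motivating example: a 31.8 GB model on a 12 GB card.
shim = Shim(vram=MemoryPool(12 * GiB), expanded=MemoryPool(32 * GiB))
```

With these numbers, a 31.8 GB weights allocation cannot fit in the 12 GB VRAM pool but does fit in the expanded tier, which is the whole point of the redirect.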

That makes the project interesting for local LLM operators because the usual fallback options are all painful. Offloading layers to system memory can crush throughput, while heavier quantization can reduce quality. GreenBoost proposes a different tradeoff by treating the storage hierarchy more aggressively as part of the usable GPU memory surface. Whether that pays off in practice will depend on bandwidth, latency, and workload shape, and the code is clearly experimental. But the enthusiasm on LocalLLaMA is easy to understand. The memory ceiling on consumer GPUs is still one of the biggest reasons people cannot run the models they want at the precision they want.
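A back-of-envelope calculation shows why bandwidth is the deciding factor. The PCIe figure below is the well-known theoretical number for PCIe 4.0 x16; the effective-throughput value is an assumption, not a GreenBoost measurement, and the model/VRAM sizes come from the thread's example.

```python
# Rough arithmetic on the data-movement cost of the overflow path.
PCIE4_X16_GBS = 31.5   # theoretical PCIe 4.0 x16 throughput, GB/s
EFFECTIVE_GBS = 25.0   # assumed achievable rate after DMA/protocol overhead
MODEL_GB = 31.8        # model size from the thread's motivating example
VRAM_GB = 12.0         # RTX 5070 dedicated VRAM

overflow_gb = MODEL_GB - VRAM_GB          # weights that must live off-GPU
transfer_s = overflow_gb / EFFECTIVE_GBS  # time to stream them across PCIe once
print(f"~{overflow_gb:.1f} GB overflow, ~{transfer_s:.2f} s per full pass")
```

Under these assumptions, nearly 20 GB of weights would cross the PCIe link on every full pass through the model, taking the better part of a second, versus the hundreds of GB/s a GPU's local memory provides. That gap is exactly the "bandwidth, latency, and workload shape" question the community raised.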

Source: Phoronix · Code: GitLab · Community discussion: r/LocalLLaMA




© 2026 Insights. All rights reserved.