LocalLLaMA Pushes GreenBoost, a Linux Driver That Extends NVIDIA GPU Memory with RAM and NVMe

Original: Open-Source "GreenBoost" Driver Aims To Augment NVIDIA GPUs vRAM With System RAM & NVMe To Handle Larger LLMs

LLM · Mar 16, 2026 · By Insights AI (Reddit) · 2 min read

GreenBoost is exactly the kind of infrastructure idea that LocalLLaMA notices quickly because it targets one of the ecosystem’s hardest practical limits: not enough GPU memory. At crawl time the Reddit thread had 141 upvotes and 38 comments. The linked Phoronix report was published on March 14, 2026, and described GreenBoost as an independently developed open-source Linux kernel module meant to augment NVIDIA GPU memory with system RAM and NVMe storage for larger LLM workloads.

According to Phoronix, GreenBoost does not replace NVIDIA’s official Linux driver stack. Instead it works alongside it through a dedicated kernel module, greenboost.ko, plus a CUDA user-space shim. The kernel side allocates pinned DDR4 pages with the buddy allocator, exports them as DMA-BUF file descriptors, and lets the GPU import them as CUDA external memory. The article says PCIe 4.0 x16 handles the actual data movement, while a sysfs interface and watchdog thread monitor RAM and NVMe pressure.
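The article mentions the watchdog and sysfs interface only in passing. As a rough illustration of what such a pressure monitor could look like from user space, here is a minimal Python sketch; the sysfs path, attribute names, file format, and thresholds are all invented for this example and are not GreenBoost's actual interface.

```python
# Hypothetical sketch of a pressure monitor like the watchdog the article
# describes. Every sysfs name and threshold here is an assumption.
from pathlib import Path

SYSFS_ROOT = Path("/sys/kernel/greenboost")  # hypothetical attribute directory

def parse_pressure(text: str) -> float:
    """Parse a hypothetical 'used/total' byte pair (e.g. '10/40')
    into a utilization fraction."""
    used, total = (int(x) for x in text.strip().split("/"))
    return used / total if total else 0.0

def tier_for(ram_pressure: float, nvme_pressure: float,
             ram_limit: float = 0.85, nvme_limit: float = 0.95) -> str:
    """Decide which tier the next overflow allocation should land in."""
    if ram_pressure < ram_limit:
        return "ram"    # pinned system RAM still has headroom
    if nvme_pressure < nvme_limit:
        return "nvme"   # spill to the NVMe-backed tier
    return "fail"       # both tiers exhausted; let the allocation error out

def read_tier() -> str:
    """Read both pressure attributes and pick a tier."""
    ram = parse_pressure((SYSFS_ROOT / "ram_pressure").read_text())
    nvme = parse_pressure((SYSFS_ROOT / "nvme_pressure").read_text())
    return tier_for(ram, nvme)
```

The point of the sketch is the ordering: RAM is preferred while it has headroom, and NVMe is only a fallback, which matches the RAM-then-NVMe hierarchy the article describes.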

Why the design caught the community’s attention

  • The CUDA shim can let small allocations pass through normally while redirecting larger ones, such as overflowing model weights or KV cache, into the expanded memory path.
  • The user-space layer hooks allocation calls and even symbol lookups, so applications like Ollama see a larger usable memory pool without any code changes.
  • The developer’s motivating example was trying to run a 31.8 GB model on a GeForce RTX 5070 with 12 GB of dedicated vRAM.
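The bullets above describe a size-based split. A toy Python model of that policy follows; the 256 MiB threshold, the pool sizes, and the class names are all invented for illustration, since the real shim intercepts CUDA allocation calls in C rather than working like this.

```python
# Toy model of a pass-through/redirect allocation policy: small requests go
# to native GPU memory, large ones (weights, KV cache) to an expanded tier.
from dataclasses import dataclass

GiB = 1 << 30
REDIRECT_THRESHOLD = 256 << 20   # assumed cutoff: redirect allocations > 256 MiB

@dataclass
class MemoryPool:
    capacity: int
    used: int = 0

    def alloc(self, size: int) -> bool:
        """Reserve `size` bytes if the pool has room."""
        if self.used + size > self.capacity:
            return False
        self.used += size
        return True

@dataclass
class Shim:
    vram: MemoryPool       # native GPU memory
    expanded: MemoryPool   # RAM/NVMe-backed external memory

    def malloc(self, size: int) -> str:
        """Return which tier served the request."""
        if size <= REDIRECT_THRESHOLD and self.vram.alloc(size):
            return "vram"      # small allocation passes through normally
        if self.expanded.alloc(size):
            return "expanded"  # large or overflowing allocation is redirected
        raise MemoryError("out of memory in both tiers")

# The thread's motivating example: a 31.8 GB model on a 12 GB card.
shim = Shim(vram=MemoryPool(12 * GiB), expanded=MemoryPool(32 * GiB))
```

With these numbers, a 31.8 GB weights allocation cannot fit in the 12 GB VRAM pool but does fit in the expanded tier, which is the whole point of the redirect.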

That makes the project interesting for local LLM operators because the usual fallback options are all painful. Offloading layers to system memory can crush throughput, while heavier quantization can reduce quality. GreenBoost proposes a different tradeoff by treating the storage hierarchy more aggressively as part of the usable GPU memory surface. Whether that pays off in practice will depend on bandwidth, latency, and workload shape, and the code is clearly experimental. But the enthusiasm on LocalLLaMA is easy to understand. The memory ceiling on consumer GPUs is still one of the biggest reasons people cannot run the models they want at the precision they want.
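A back-of-envelope calculation shows why bandwidth is the deciding factor. The PCIe figure below is the well-known theoretical number for PCIe 4.0 x16; the effective-throughput value is an assumption, not a GreenBoost measurement, and the model/VRAM sizes come from the thread's example.

```python
# Rough arithmetic on the data-movement cost of the overflow path.
PCIE4_X16_GBS = 31.5   # theoretical PCIe 4.0 x16 throughput, GB/s
EFFECTIVE_GBS = 25.0   # assumed achievable rate after DMA/protocol overhead
MODEL_GB = 31.8        # model size from the thread's motivating example
VRAM_GB = 12.0         # RTX 5070 dedicated VRAM

overflow_gb = MODEL_GB - VRAM_GB          # weights that must live off-GPU
transfer_s = overflow_gb / EFFECTIVE_GBS  # time to stream them across PCIe once
print(f"~{overflow_gb:.1f} GB overflow, ~{transfer_s:.2f} s per full pass")
```

Under these assumptions, nearly 20 GB of weights would cross the PCIe link on every full pass through the model, taking the better part of a second, versus the hundreds of GB/s a GPU's local memory provides. That gap is exactly the "bandwidth, latency, and workload shape" question the community raised.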

Source: Phoronix · Code: GitLab · Community discussion: r/LocalLLaMA




© 2026 Insights. All rights reserved.