#multi-gpu

LLM Reddit Apr 28, 2026 3 min read

LocalLLaMA’s Budget VRAM Trick: Add an Old GPU to Keep 27B Models Off the CPU

LocalLLaMA latched onto a concrete claim: if a 27B model fits entirely in VRAM across two mismatched cards, even a weak second GPU can beat spilling into system RAM for long-context decoding.

#local-llms #vram #multi-gpu

LLM Reddit Apr 10, 2026 2 min read

Reddit Welcomes llama.cpp Tensor Parallelism, With an Experimental Warning Label

A high-scoring LocalLLaMA thread treated merged PR #19378 as a meaningful step toward more practical multi-GPU inference in llama.cpp. The catch is that the new <code>--split-mode tensor</code> path is explicitly experimental: it is strongest today on CUDA and remains rough on ROCm and Vulkan.

#llama-cpp #tensor-parallelism #multi-gpu