A Linux VRAM optimization from Valve engineer Natalie Vlock prioritizes foreground games when memory is tight. TweakTown cites testing on an RX 6500 XT where Alan Wake II rose from 14 FPS to 41 FPS at 1080p low with FSR Quality.
r/LocalLLaMA upvoted the merge because it is immediately testable, but the useful caveat was clear: speedups depend heavily on prompt repetition and draft acceptance.
A r/MachineLearning post and linked benchmark writeup argue that batched FP32 SGEMM on RTX 5090 is hitting an inefficient cuBLAS path, leaving much of the GPU idle.
TechSpot reported on April 4, 2026 that newly uncovered Steam client code points to a feature that could estimate game frame rates from other users' real-world data. Paired with Valve's March 9 rollout of optional anonymized framerate collection, the move could make Steam's store pages much more useful for performance-conscious buyers.
Tom’s Hardware says RPCS3 developers found new SPU usage patterns and added more efficient recompilation paths for the PlayStation 3’s Cell processor. The project says the change benefits every game, with Twisted Metal showing a 5% to 7% average FPS uplift between recent builds.
A March 25, 2026 Hacker News post about Reco's `gnata` rewrite had reached 256 points and 237 comments at crawl time. Reco says AI-assisted porting of JSONata 2.x to Go took about 7 hours and roughly $400 in tokens; the rewrite let the company retire an RPC-heavy Node fleet and eventually cut roughly $500,000 per year in infrastructure costs.
Gearbox and 2K say Borderlands 4's March 26 update should lift average PC frame rates by roughly 20% while reducing the crashes and stutter players have reported since launch.
A LocalLLaMA thread reported a large prompt-processing speedup on Qwen3.5-27B by lowering llama.cpp `--ubatch-size` to 64 on an RX 9070 XT. The interesting part is not a universal magic number, but the reminder that prompt ingestion and token generation can respond very differently to `n_ubatch` tuning.
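An easy way to run this kind of sweep on your own hardware is llama.cpp's `llama-bench`, which accepts comma-separated value lists and reports prompt-processing and token-generation throughput separately. A minimal sketch (the model path, prompt length, and micro-batch values below are placeholders, not the thread's exact setup):

```shell
# Sweep micro-batch sizes; pp (prompt processing) and tg (token
# generation) rows are reported separately, so you can see where
# the two workloads diverge as n_ubatch changes.
./llama-bench -m qwen3.5-27b-q4_k_m.gguf \
  -p 2048 -n 128 \
  -ub 64,128,256,512
```

Comparing the pp and tg rows across `-ub` values on your own GPU is more informative than copying the thread's 64, since the sweet spot depends on VRAM, driver, and backend.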
A community developer achieved 100+ t/s decode speed and 585 t/s aggregate throughput across 8 simultaneous requests running Qwen3.5 27B on a dual RTX 3090 setup with NVLink, using vLLM with tensor parallelism and multi-token prediction (MTP).
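The tensor-parallel part of that setup is straightforward to reproduce with vLLM's CLI; a sketch, assuming a hypothetical `Qwen/Qwen3.5-27B` checkpoint name (substitute the model you actually use, and note the post's exact MTP configuration is not given here):

```shell
# Shard the model across both 3090s with tensor parallelism.
# NCCL will use the NVLink bridge automatically when present.
vllm serve Qwen/Qwen3.5-27B \
  --tensor-parallel-size 2
```

With two 24 GB cards, tensor parallelism splits each layer's weights across both GPUs, which is what makes a 27B model plus batch-of-8 KV cache fit at all.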
Ollama 0.17, released February 22, introduces a new native inference engine replacing llama.cpp server mode, delivering up to 40% faster prompt processing and 18% faster token generation on NVIDIA GPUs, plus improved multi-GPU tensor parallelism and AMD RDNA 4 support.
A researcher dramatically improved the coding performance of 15 LLMs with a single change: redesigning the edit tool rather than the model. With the new tool, Grok Code Fast's success rate jumped roughly 10x, from 6.7% to 68.3%.