A llama.cpp comparison on r/LocalLLaMA reached 55 upvotes and 81 comments. By benchmarking an RTX 5090, a DGX Spark, an AMD AI395, and single- and dual-R9700 setups under identical parameters, the post offers a practical view of local-inference trade-offs that vendor slides usually hide.
#gpu
A LocalLLaMA thread about Intel’s Arc Pro B70 and B65 reached 213 upvotes and 133 comments. Intel says the B70 is available from March 25, 2026 with a suggested starting price of $949, while the B65 follows in mid-April.
A technical LocalLLaMA thread translated the FlashAttention-4 paper into practical deployment guidance, emphasizing large gains on Blackwell, faster Python-based kernel development, and the fact that most A100 and consumer-GPU users cannot realize the full benefits yet.
Flash-KMeans is an arXiv paper submitted on 10 Mar 2026 that targets two concrete GPU bottlenecks in exact K-Means: materializing the N × K distance matrix in HBM and atomic contention during centroid updates. The Hacker News thread reached 180 points and 14 comments because systems-minded readers immediately connected the work to FlashAttention-style dataflow optimization, practical deployment questions, and the broader shift of K-Means from offline preprocessing to an online AI primitive.
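The two bottlenecks the paper names can be illustrated without any GPU code. The NumPy sketch below (my own illustration, not the paper's CUDA kernels) processes points in row tiles so the full N × K distance matrix is never materialized, and accumulates centroid sums per tile rather than scattering per-point updates, which is the CPU analogue of avoiding atomic contention:

```python
import numpy as np

def kmeans_step_tiled(X, centroids, tile=4096):
    """One Lloyd's iteration that never materializes the full N x K
    distance matrix: points are handled in row tiles of size `tile`,
    and centroid sums are reduced per tile (the analogue of avoiding
    per-point atomic updates on the GPU)."""
    N, D = X.shape
    K = centroids.shape[0]
    sums = np.zeros((K, D))
    counts = np.zeros(K, dtype=np.int64)
    labels = np.empty(N, dtype=np.int64)
    c_sq = (centroids ** 2).sum(axis=1)  # ||c||^2, reused for every tile
    for start in range(0, N, tile):
        Xt = X[start:start + tile]       # (tile, D) slice of the data
        # squared distances via ||x||^2 - 2 x.c + ||c||^2, a tile x K block
        d = (Xt ** 2).sum(axis=1, keepdims=True) - 2 * Xt @ centroids.T + c_sq
        lt = d.argmin(axis=1)
        labels[start:start + tile] = lt
        np.add.at(sums, lt, Xt)          # per-tile reduction into centroid sums
        np.add.at(counts, lt, 1)
    nonempty = counts > 0
    new_centroids = centroids.copy()
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return labels, new_centroids
```

Peak memory for the distance computation is tile × K instead of N × K, which is the same dataflow trade the paper makes at HBM scale.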
A LocalLLaMA thread spotlights FlashAttention-4, which reports up to 1,605 TFLOPS on B200 in BF16 and introduces pipeline and memory-layout changes tuned for Blackwell constraints.
A popular r/pcgaming thread spotlights PCWorld’s report citing Jon Peddie Research data: Nvidia reportedly controls over 90% of discrete PC graphics cards, while AMD falls below 10%.
Microsoft's Shader Execution Reordering (SER) technology is delivering dramatic performance gains on modern GPUs, achieving up to 90% improvement on Intel Arc B-Series and 80% on NVIDIA Blackwell GPUs, according to TechPowerUp.
A community developer achieved a 100+ t/s decode speed and 585 t/s aggregate throughput across 8 simultaneous requests running Qwen3.5 27B on a dual RTX 3090 setup with NVLink, using vLLM with tensor parallelism and MTP (multi-token prediction) optimization.
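A setup along those lines can be sketched with vLLM's documented CLI; the model id below is an assumption (the post does not give the exact checkpoint), and the MTP/speculative settings are omitted because they depend on the checkpoint:

```shell
# Hypothetical launch command: shard the model across both NVLinked 3090s
# and cap concurrency at the 8 parallel requests the post benchmarked.
vllm serve Qwen/Qwen3.5-27B \
  --tensor-parallel-size 2 \
  --max-num-seqs 8
```

Tensor parallelism splits each weight matrix across the two GPUs, which is why the NVLink bridge matters: every layer incurs an all-reduce between the cards.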
NVIDIA revealed detailed specs for Vera Rubin NVL72. Each Rubin GPU delivers 50 PFLOPS inference (5x Blackwell GB200), 22 TB/s HBM4 bandwidth (2.8x Blackwell), and cuts inference cost per million tokens by 10x. Ships H2 2026.
A high-signal r/pcgaming thread tracks PC Gamer coverage of Nvidia earnings: $193.7B annual data center revenue (+75% YoY) versus $16B from gaming, reframing how players read product-priority decisions.
NVIDIA CEO Jensen Huang announced on February 19 that GTC 2026 (March 16–19, San Jose) will feature a surprise chip reveal, fueling speculation about new hardware beyond the Rubin platform.
A new open-source project called ntransformer enables running the 140GB Llama 3.1 70B model on a single consumer RTX 3090 by streaming weights directly from NVMe storage to GPU, completely bypassing CPU RAM.
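The access pattern behind that approach can be shown in a small CPU-only sketch (my illustration, not ntransformer's code, which does the NVMe-to-GPU copy directly): map the checkpoint file and materialize one layer at a time, so resident memory stays near one layer's size instead of the full 140 GB.

```python
import numpy as np
import os
import tempfile

def stream_layers(path, n_layers, layer_shape, dtype=np.float32, upload=None):
    """Illustrative weight streaming: memory-map the checkpoint and copy one
    layer at a time, keeping residency at ~one layer rather than the whole
    model. `upload` stands in for the host->GPU transfer; ntransformer
    streams NVMe->GPU directly, this sketch only shows the access pattern."""
    per_layer = int(np.prod(layer_shape))
    mm = np.memmap(path, dtype=dtype, mode="r", shape=(n_layers, per_layer))
    results = []
    for i in range(n_layers):
        layer = np.array(mm[i]).reshape(layer_shape)  # copies only this layer
        # stand-in "compute": sum the layer if no upload hook is given
        results.append(upload(layer) if upload else layer.sum())
    return results
```

The key point is that `np.memmap` never reads the whole file; each `mm[i]` copy touches only that layer's bytes, which is what lets a 140 GB model run through a 24 GB GPU one layer at a time.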