A LocalLLaMA thread spotlights FlashAttention-4, which reports up to 1,605 TFLOPS in BF16 on B200 and introduces pipeline and memory-layout changes tuned for Blackwell constraints.
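The core trick the FlashAttention family builds on can be shown in a few lines. This is a minimal NumPy sketch of the streaming "online softmax" that avoids materializing the full N×N attention matrix; it is purely illustrative and says nothing about FA4's actual Blackwell kernel, tile sizes, or pipelining.

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference: materializes the full score matrix."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def streaming_attention(q, k, v, block=4):
    """Processes K/V in blocks, keeping only running statistics per row."""
    n, d = k.shape
    m = np.full(q.shape[0], -np.inf)          # running row max
    l = np.zeros(q.shape[0])                  # running softmax denominator
    acc = np.zeros((q.shape[0], v.shape[1]))  # unnormalized output
    for start in range(0, n, block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale old partial sums
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 16, 8))  # three small (16, 8) matrices
```

Both functions produce identical outputs; the streaming version just never holds more than one block of scores at a time, which is what lets the real kernels stay in on-chip SRAM.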
A popular r/pcgaming thread spotlights PCWorld’s report citing Jon Peddie Research data: Nvidia reportedly controls over 90% of discrete PC graphics cards, while AMD falls below 10%.
Shader Execution Reordering (SER), now part of Microsoft's DirectX Raytracing API, is delivering dramatic ray-traced performance gains on modern GPUs: up to 90% on Intel Arc B-Series and 80% on NVIDIA Blackwell, according to TechPowerUp.
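The idea behind SER can be illustrated without any graphics API: group ray hits by shader/material ID before shading so each SIMD group (warp) executes fewer distinct shaders. This toy Python sketch only models the scheduling effect; real SER is done by the driver and hardware, not application code, and the warp size and material counts here are arbitrary.

```python
import random

WARP = 32  # threads per SIMD group on NVIDIA hardware

def shaders_per_warp(hits):
    """Average number of distinct shader IDs per warp-sized group.
    More distinct shaders per warp means more execution divergence."""
    groups = [hits[i:i + WARP] for i in range(0, len(hits), WARP)]
    return sum(len(set(g)) for g in groups) / len(groups)

random.seed(0)
# 1024 ray hits landing on 8 materials in arbitrary order
hits = [random.randrange(8) for _ in range(1024)]

before = shaders_per_warp(hits)          # unsorted: warps mix many shaders
after = shaders_per_warp(sorted(hits))   # SER-style reorder by shader ID
```

Unsorted warps contain nearly all 8 shaders each; after reordering, each warp spans at most two, which is the coherence win SER exploits.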
A community developer achieved 100+ t/s decode speed and 585 t/s aggregate throughput for 8 simultaneous requests running Qwen3.5 27B on a dual RTX 3090 setup with NVLink, using vLLM with tensor parallelism and MTP optimization.
NVIDIA revealed detailed specs for Vera Rubin NVL72. Each Rubin GPU delivers 50 PFLOPS inference (5x Blackwell GB200), 22 TB/s HBM4 bandwidth (2.8x Blackwell), and cuts inference cost per million tokens by 10x. Ships H2 2026.
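Dividing Rubin's headline figures by the claimed multiples gives the implied per-GPU Blackwell baselines; these baselines are inferred from the quoted ratios, not separately sourced.

```python
# Back-of-envelope check of the per-GPU multiples quoted above.
rubin_pflops = 50    # inference PFLOPS per Rubin GPU
rubin_tbs = 22       # HBM4 bandwidth, TB/s

implied_blackwell_pflops = rubin_pflops / 5    # -> 10 PFLOPS per GB200 GPU
implied_blackwell_tbs = rubin_tbs / 2.8        # -> ~7.9 TB/s per GB200 GPU
```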
A high-signal r/pcgaming thread tracks PC Gamer coverage of Nvidia earnings: $193.7B annual data center revenue (+75% YoY) versus $16B from gaming, reframing how players read product-priority decisions.
NVIDIA CEO Jensen Huang announced on February 19 that GTC 2026 (March 16–19, San Jose) will feature a surprise chip reveal, fueling speculation about new hardware beyond the Rubin platform.
A new open-source project called ntransformer enables running the 140GB Llama 3.1 70B model on a single consumer RTX 3090 by streaming weights directly from NVMe storage to GPU, completely bypassing CPU RAM.
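The streaming idea is simple to sketch: keep all layer weights in one file on disk and map only the layer currently executing, instead of loading the whole model. This is a minimal NumPy illustration with a hypothetical checkpoint layout, not ntransformer's actual format; a real implementation would copy each slice straight to the GPU (e.g. via GPUDirect Storage) rather than into a host array.

```python
import numpy as np
import tempfile, os

LAYERS, ROWS, COLS = 4, 64, 64  # toy model dimensions (illustrative)

# Write a fake checkpoint: LAYERS contiguous float32 matrices.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
np.arange(LAYERS * ROWS * COLS, dtype=np.float32).tofile(path)

def load_layer(i):
    """Map only layer i's byte range of the file; nothing else is read."""
    return np.memmap(path, dtype=np.float32, mode="r",
                     offset=i * ROWS * COLS * 4, shape=(ROWS, COLS))

# Forward pass streams one layer at a time; peak residency is one layer.
x = np.ones(COLS, dtype=np.float32)
for i in range(LAYERS):
    w = load_layer(i)       # streamed in on demand
    x = np.tanh(w @ x)      # use it, then let the mapping be dropped
```

Peak memory is one layer's weights instead of the whole model, which is the property that lets a 140GB checkpoint run on a 24GB card, at the cost of NVMe read latency per layer.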
NVIDIA unveiled its next-generation AI platform Vera Rubin at CES 2026, reducing GPUs needed for MoE model training by 4x and slashing inference token costs by 10x, with availability in H2 2026.
NVIDIA announced the Rubin platform at CES 2026 in January. The platform comprises six new chips, with the Vera Rubin superchip delivering 5x the inference performance of GB200. Major AI companies including OpenAI, Meta, and Microsoft plan to adopt it.