#gpu

AI 5d ago 1 min read

Google rents 110,000 GPUs from SpaceX as Gemini demand strains capacity

Google will pay SpaceX $920M per month from October 2026 through June 2029 for access to about 110,000 NVIDIA GPUs and related compute. The deal shows how fast AI demand can pressure even one of the world’s largest infrastructure operators.
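For scale, the quoted figures can be turned into rough totals. This is a back-of-the-envelope sketch using only the numbers in the teaser. The 730-hour month is an assumption added here, and because the payment also covers "related compute", the per-GPU figures are upper bounds rather than a real GPU rental rate.

```python
# Rough arithmetic from the teaser's figures; not a statement of contract terms.
MONTHLY_PAYMENT_USD = 920_000_000   # "$920M per month"
GPU_COUNT = 110_000                 # "about 110,000" GPUs
HOURS_PER_MONTH = 730               # assumption: 8,760 hours per year / 12


def months_inclusive(start_year, start_month, end_year, end_month):
    """Count calendar months from the start month through the end month."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# October 2026 through June 2029, both months included.
term_months = months_inclusive(2026, 10, 2029, 6)
total_usd = MONTHLY_PAYMENT_USD * term_months

# Upper bounds: the payment also covers non-GPU "related compute".
per_gpu_month_usd = MONTHLY_PAYMENT_USD / GPU_COUNT
per_gpu_hour_usd = per_gpu_month_usd / HOURS_PER_MONTH

print(f"term: {term_months} months")
print(f"total: ${total_usd / 1e9:.2f}B")
print(f"per GPU per month: ${per_gpu_month_usd:,.0f}")
print(f"per GPU per hour: ${per_gpu_hour_usd:.2f}")
```

Under these assumptions the term is 33 months and the total commitment is about $30.36B, or roughly $8,400 per GPU per month.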

#google #spacex #ai-compute

LLM Reddit May 28, 2026 1 min read

GLM-5.1 inference gains came from network topology, not new GPUs

r/LocalLLaMA readers picked up on the infrastructure lesson: Zai claimed 15% higher GPU inference throughput and 40.6% lower first-token P99 latency with the same GPUs, model, and software stack.
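For readers less familiar with the metrics, here is a minimal sketch of how a P99 latency and a relative change are commonly computed. The nearest-rank percentile method, the helper names, and the sample numbers are all illustrative assumptions; none of this is Zai's measurement code or data.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def relative_change(before, after):
    """Signed fractional change from before to after (negative means lower)."""
    return (after - before) / before


# Hypothetical first-token latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p99_ms = percentile_nearest_rank(latencies_ms, 99)

# Hypothetical baseline, chosen only to illustrate what "40.6% lower" means.
baseline_p99_ms = 1000.0
improved_p99_ms = baseline_p99_ms * (1 - 0.406)

print(p99_ms)
print(f"{relative_change(baseline_p99_ms, improved_p99_ms):.1%}")
```

P99 tracks the slowest 1% of requests, so it can move sharply when tail delays shrink even if average latency barely changes.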

#inference #networking #gpu

Sciences X/Twitter May 25, 2026 1 min read

ZOZO open-sources GPU contact solver for 180M-contact simulation

ZOZO’s ppf-contact-solver brings a production-grade cloth and soft-body contact engine into the open. The headline number is more than 180 million contacts in one scene, plus Blender support and an Apache 2.0 license.

#simulation #opensource #zozo

LLM Reddit May 22, 2026 1 min read

110 tok/s on a 35B Model with 12GB VRAM Using ik_llama.cpp

A community user reported 110 tokens/second running Qwen3.6 35B A3B on a 12GB RTX 4070 Super with ik_llama.cpp, a llama.cpp fork with stronger CPU offload optimization that the poster says significantly outperforms upstream llama.cpp's Multi-Token Prediction implementation.

#llama-cpp #qwen #local-llm

Gaming Reddit May 18, 2026 1 min read

60% of PC Gamers Have No Plans to Build a New PC as AI Drives Component Prices Sky-High

A Tom's Hardware survey reveals 60% of PC gamers won't build a new system in the next two years, as AI infrastructure demand has caused RAM prices to triple and GPU costs to surge significantly.

#pc-gaming #hardware #ram

Gaming Reddit May 14, 2026 1 min read

AMD FSR 4.1 Coming to RX 7000 GPUs in July, RX 6000 in 2027

AMD has officially confirmed FSR Upscaling 4.1 will arrive on Radeon RX 7000 GPUs in July 2026, with RX 6000 series support following in 2027. The news extends AI-enhanced upscaling to a broader range of AMD hardware.

#amd #fsr #upscaling

Gaming Reddit Apr 29, 2026 1 min read

GALAX hands operations and RMA support to Palit as its old structure closes

A joint statement dated April 29 says Palit now handles GALAX operations and customer support. Existing owners are being directed to Palit’s RMA channels while the previous GALAX structure has been shut down.

#galax #palit #gpu

AI Hacker News Apr 20, 2026 2 min read

Zero-copy Wasm-to-GPU inference made HN ask where the speedup really is

HN found this interesting because it tests a real boundary: whether Apple Silicon unified memory can make a Wasm sandbox and a GPU buffer operate on the same bytes.

#wasm #gpu #inference

AI X/Twitter Apr 18, 2026 2 min read

Cloudflare Unweight cuts Llama bundles up to 22% with lossless GPU kernels

Why it matters: Cloudflare is attacking the memory-bandwidth bottleneck in LLM serving rather than only buying more GPUs. Its post reports 15-22% model-size reduction, about 3 GB VRAM saved on Llama 3.1 8B, and open-sourced GPU kernels.
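The quoted numbers can be checked against each other with simple arithmetic. This sketch assumes Llama 3.1 8B has about 8.03 billion parameters stored as 16-bit weights (2 bytes each). That baseline is an assumption made here and is not stated in the teaser.

```python
# Consistency check on the teaser's figures under a stated baseline assumption.
PARAM_COUNT = 8.03e9      # assumption: approximate Llama 3.1 8B parameter count
BYTES_PER_PARAM = 2       # assumption: 16-bit (BF16/FP16) weights
REPORTED_SAVING_GB = 3.0  # "about 3 GB VRAM saved"

baseline_gb = PARAM_COUNT * BYTES_PER_PARAM / 1e9

# Savings implied by the reported 15-22% size reduction.
low_saving_gb = 0.15 * baseline_gb
high_saving_gb = 0.22 * baseline_gb

# Fraction of the baseline that a 3 GB saving would represent.
implied_reduction = REPORTED_SAVING_GB / baseline_gb

print(f"baseline weights: {baseline_gb:.2f} GB")
print(f"15-22% of baseline: {low_saving_gb:.2f} to {high_saving_gb:.2f} GB")
print(f"3 GB saved is {implied_reduction:.1%} of baseline")
```

Under these assumptions a 3 GB saving falls inside the reported 15-22% range, so the figures are mutually consistent.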

#cloudflare #llm-inference #gpu

AI Apr 14, 2026 2 min read

Hugging Face turns Hub kernels into drop-in binaries with 2.5x gains

Hugging Face is trying to turn optimized GPU code into a Hub-native artifact, removing one of the messier deployment steps for PyTorch users. Clement Delangue says the new Kernels flow ships precompiled binaries matched to a specific GPU, PyTorch build, and OS, with claimed 1.7x to 2.5x speedups over PyTorch baselines.

#hugging-face #kernels #pytorch

AI Hacker News Apr 13, 2026 2 min read

Hacker News spotlights AMD's step-by-step ROCm strategy against CUDA's moat

A front-page Hacker News discussion resurfaced an EE Times interview outlining how AMD wants ROCm, Triton, OneROCm, and an open-source release model to chip away at CUDA dependence. The real test is not a headline compatibility claim, but whether stacks like vLLM and SGLang work in a boring, dependable way.

#rocm #cuda #amd

AI Reddit Apr 11, 2026 2 min read

Reddit Flags a Possible cuBLAS Regression on RTX 5090 Batched FP32 Workloads

An r/MachineLearning thread argues that cuBLAS may be choosing an inefficient kernel for batched FP32 matrix multiplication on RTX 5090. The significance is not just the claimed slowdown, but the fact that the post includes reproducible benchmark tables, profiling notes, and linked repro material.

#cublas #rtx-5090 #cuda