A DGX Spark owner on LocalLLaMA argues that NVFP4 remains far from production-ready, prompting a broader debate about whether NVIDIA's premium local AI box still justifies its price.
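For context on what the debate is about: NVFP4 is NVIDIA's 4-bit floating-point format, which stores values on a tiny E2M1 grid and recovers dynamic range through per-block scale factors. The sketch below illustrates the idea in simplified form; the 16-element block size and the E2M1 value grid match NVIDIA's published format, but real NVFP4 encodes block scales in FP8 (E4M3) with an additional tensor-level FP32 scale, which is simplified here to plain floats.

```python
# Simplified NVFP4-style block quantization sketch.
# Assumptions: 16-element blocks, signed E2M1 element values, one float
# scale per block (real NVFP4 uses FP8 E4M3 block scales + an FP32 tensor scale).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable magnitudes

def quantize_block(block):
    """Scale a block so its max maps to 6.0, then snap each value to the grid."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # 6.0 is the largest E2M1 magnitude
    q = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        q.append(nearest if x >= 0 else -nearest)
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

block = [0.02, -1.3, 0.7, 5.9, -0.45, 2.1, 0.0, -3.3,
         1.1, 0.9, -0.1, 4.4, -2.2, 0.3, 0.6, -5.0]
scale, q = quantize_block(block)
recon = dequantize_block(scale, q)
err = max(abs(a - b) for a, b in zip(block, recon))
print(f"max abs reconstruction error: {err:.3f}")
```

The coarse grid spacing near the top of the range (the gap from 4.0 to 6.0) is exactly the kind of precision loss that the production-readiness debate centers on.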
#ai-hardware
A Hacker News thread pushed fresh attention to tinygrad's tinybox hardware line. The product page now spells out specs, pricing, and shipping status for the red v2 and green v2 Blackwell systems aimed at deep-learning workloads.
r/LocalLLaMA highlighted Tenstorrent's desk-side TT-QuietBox 2, a liquid-cooled RISC-V inference workstation aimed at 120B-scale local AI workloads. The launch combines open tooling, a standard 120V power target, and ambitious performance claims that Reddit immediately debated.
A researcher published a reverse-engineering analysis of the Apple M4 chip's Neural Engine, detailing its CoreML-based architecture, a reported 6.6 FLOPS/W energy efficiency, and its ability to shut down completely when idle.
Huawei spinoff Honor has unveiled its first humanoid robot, demonstrating it with a moonwalk-style "feet slide" dance. The launch marks Honor's entry into China's increasingly crowded humanoid robotics race.
Startup Taalas is taking a radical approach to AI inference: hard-wiring an LLM's weights and architecture directly into a silicon chip. Its Llama 3.1 8B demo achieves 16,000 tokens per second — but the approach bets that model architectures won't change.
A high-engagement Hacker News thread spotlights Taalas’ claim that model-specific silicon can cut inference latency and cost, including a hard-wired Llama 3.1 8B deployment reportedly reaching 17K tokens/sec per user.
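To put the per-user throughput figures from the two Taalas items in perspective, a quick back-of-envelope calculation shows what they imply for interactive latency (assuming the figures refer to decoded output tokens):

```python
# Back-of-envelope: wall-clock time to stream a reply at a given per-user
# token rate. Rates below are the two figures reported in the items above.
def stream_time_ms(tokens, tok_per_s):
    """Milliseconds to emit `tokens` output tokens at `tok_per_s` tokens/sec."""
    return 1000.0 * tokens / tok_per_s

for rate in (16_000, 17_000):
    print(f"{rate} tok/s -> 1,000-token reply in {stream_time_ms(1000, rate):.1f} ms")
# 16,000 tok/s streams a 1,000-token reply in 62.5 ms
```

At either rate, a full response appears in well under a tenth of a second, which is why model-specific silicon is pitched at latency-sensitive serving rather than flexibility.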
Microsoft announced Maia 200 (codenamed Braga) on January 26, 2026 as its second-generation in-house AI accelerator. The company says selected Copilot and Azure AI workloads show up to 1.7x performance versus Maia 100.
NVIDIA unveiled its next-gen AI platform Rubin, delivering a 10x reduction in inference token cost and requiring 4x fewer GPUs for MoE model training versus Blackwell. Launch is planned for H2 2026.
NVIDIA unveiled its next-generation AI platform Vera Rubin at CES 2026, reducing GPUs needed for MoE model training by 4x and slashing inference token costs by 10x, with availability in H2 2026.