r/LocalLLaMA flags Tenstorrent QuietBox 2 as a desk-side RISC-V box for local AI inference

Original: Tenstorrent QuietBox 2 Brings RISC-V AI Inference to the Desktop

AI · Mar 14, 2026 · By Insights AI (Reddit) · 2 min read

r/LocalLLaMA surfaced a piece on March 13, 2026, about Tenstorrent's TT-QuietBox 2, and the reaction was exactly what you would expect from a local-inference community: equal parts curiosity, benchmark skepticism, and price math. At crawl time on March 14, 2026, the Reddit post had 79 upvotes and 38 comments. The reason it drew attention is straightforward: QuietBox 2 is not another cloud appliance or datacenter rack announcement. It is framed as a desk-side system for running large AI workloads locally, with Tenstorrent leaning hard on open tooling and RISC-V branding.

According to StorageReview, the liquid-cooled workstation is designed to run models up to 120 billion parameters entirely on premises. Tenstorrent positions the machine as a private inference box for labs, offices, and small to medium businesses that want more control over the full stack. The article says it ships with Ubuntu 24.04, plugs into a standard 120V outlet, and avoids the normal rack, cooling, and dedicated-power assumptions that often push serious AI infrastructure out of reach for smaller teams. Pricing starts at $9,999, with shipping targeted for Q2 2026.

The hardware claims are ambitious. QuietBox 2 uses four Blackhole ASICs in a unified mesh, with 480 Tensix cores, 2,654 TFLOPS at BlockFP8 precision, 128 GB of GDDR6, and 256 GB of DDR5 system memory. StorageReview says Tenstorrent is preloading real workloads across language, multimodal, and scientific use cases: GPT-OSS 120B for local inference, Llama 3.1 70B at a reported 476.5 tokens per second, Qwen3-32B as a local coding agent, Flux for image generation, Wan 2.2 for video, and Boltz-2 for biomolecular ML. The company's pitch is that developers can compile models from major frameworks onto the hardware through TT-Forge rather than treat the appliance as a closed box.
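A quick sanity check on the headline capacity claim: at BlockFP8, weights take roughly one byte per parameter, so a 120B-parameter model should just about fit in the 128 GB of GDDR6 quoted above. The sketch below assumes ~1 byte per parameter and a rough overhead budget for KV cache and activations; the overhead figure is an illustrative assumption, not a number from the article.

```python
# Back-of-envelope check: can a 120B-parameter model fit in 128 GB of GDDR6
# at ~1 byte per parameter (BlockFP8)? The overhead budget for KV cache and
# activations is a rough assumption, not a figure from the article.

def fits_in_memory(params_billions: float, device_mem_gb: float,
                   bytes_per_param: float = 1.0, overhead_gb: float = 8.0) -> bool:
    """Return True if weights plus a rough overhead budget fit in device memory."""
    weights_gb = params_billions * bytes_per_param  # 1e9 params ~ 1 GB at FP8
    return weights_gb + overhead_gb <= device_mem_gb

print(fits_in_memory(120, 128))                      # FP8: fits, but barely
print(fits_in_memory(120, 128, bytes_per_param=2.0)) # FP16 would not fit
```

This is consistent with Tenstorrent's own framing: 120B is the ceiling for fully on-device inference, and anything above FP8 precision pushes the same model out of memory.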

The open-stack angle is important here. Tenstorrent says TT-Forge, TT-Metalium, and TT-LLK are all part of an inspectable software path from model graph down to kernel execution, which is unusual in a market dominated by opaque accelerator stacks. Reddit commenters liked the move to a standard wall outlet and noted that the system is reportedly $2,000 cheaper than the first QuietBox generation, but they did not give Tenstorrent a free pass. Some questioned whether the published token-per-second numbers will hold up on real models, others compared the price to Nvidia alternatives, and some dug straight into bandwidth and software-maintenance concerns.
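The bandwidth skepticism can be made concrete with simple arithmetic. At batch size 1, each generated token has to stream the full weight set from memory, so the reported 476.5 tokens per second on Llama 3.1 70B is only plausible as an aggregate, batched figure. The per-chip bandwidth assumption below is illustrative and not from the article.

```python
# Rough math behind the commenters' bandwidth skepticism. At batch size 1,
# decoding one token streams all weights from memory once, so the claimed
# throughput implies a minimum batch size. The ~2 TB/s aggregate bandwidth
# across four chips is an illustrative assumption, not a figure from the article.

def min_batch_for_throughput(params_billions: float, tokens_per_s: float,
                             aggregate_bw_gbps: float,
                             bytes_per_param: float = 1.0) -> float:
    """Smallest batch size at which the claimed throughput is bandwidth-feasible."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param  # one full weight pass
    required_bw = bytes_per_token * tokens_per_s               # batch-1 demand
    return required_bw / (aggregate_bw_gbps * 1e9)

# Llama 3.1 70B at the reported 476.5 tok/s, assuming ~2,000 GB/s aggregate:
print(round(min_batch_for_throughput(70, 476.5, 2000)))  # ~17 concurrent streams
```

In other words, the number is not implausible, but it almost certainly describes batched serving throughput rather than single-user latency, which is exactly the distinction commenters wanted Tenstorrent to publish.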

That tension is what makes the post worth watching. QuietBox 2 is not only a hardware launch; it is a test of whether a fully local, more open AI workstation can win mindshare among developers who care about sovereignty, inspectability, and offline deployment. If Tenstorrent can back the performance claims with reproducible workloads, LocalLLaMA's interest could translate into a real niche for desk-side inference outside the usual GPU ecosystem. Original source: StorageReview. Community discussion: r/LocalLLaMA.



© 2026 Insights. All rights reserved.