r/LocalLLaMA flags Tenstorrent QuietBox 2 as a desk-side RISC-V box for local AI inference
Original: Tenstorrent QuietBox 2 Brings RISC-V AI Inference to the Desktop
r/LocalLLaMA surfaced a piece on March 13, 2026 about Tenstorrent's TT-QuietBox 2, and the reaction was exactly what you would expect from a local-inference community: equal parts curiosity, benchmark skepticism, and price math. At crawl time on March 14, 2026, the Reddit post had 79 upvotes and 38 comments. The reason it drew attention is straightforward. QuietBox 2 is not another cloud appliance or datacenter rack announcement. It is framed as a desk-side system for running large AI workloads locally, with Tenstorrent leaning hard on open tooling and RISC-V branding.
According to StorageReview, the liquid-cooled workstation is designed to run models up to 120 billion parameters entirely on premises. Tenstorrent positions the machine as a private inference box for labs, offices, and small to medium businesses that want more control over the full stack. The article says it ships with Ubuntu 24.04, plugs into a standard 120V outlet, and avoids the normal rack, cooling, and dedicated-power assumptions that often push serious AI infrastructure out of reach for smaller teams. Pricing starts at $9,999, with shipping targeted for Q2 2026.
The hardware claims are ambitious. QuietBox 2 uses four Blackhole ASICs in a unified mesh, with 480 Tensix cores, 2,654 TFLOPS at BlockFP8 precision, 128 GB of GDDR6, and 256 GB of DDR5 system memory. StorageReview says Tenstorrent is preloading real workloads across language, multimodal, and scientific use cases: GPT-OSS 120B for local inference, Llama 3.1 70B at a reported 476.5 tokens per second, Qwen3-32B as a local coding agent, Flux for image generation, Wan 2.2 for video, and Boltz-2 for biomolecular ML. The company's pitch is that developers can compile models from major frameworks onto the hardware through TT-Forge rather than treat the appliance as a closed box.
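For readers doing the same back-of-envelope math as the Reddit thread, here is a minimal Python sketch that splits the quoted totals across the four ASICs and estimates whether a model's weights would fit in the 128 GB of GDDR6. The hardware figures come from the article. The function names, the bytes-per-parameter values, and the decision to count weights only (no KV cache, activations, or block-format overhead) are assumptions made for illustration, not Tenstorrent's method.

```python
# Figures quoted in the article (StorageReview via r/LocalLLaMA).
NUM_ASICS = 4
TENSIX_CORES = 480
BLOCKFP8_TFLOPS = 2654
GDDR6_GB = 128


def per_asic(total: float, num_asics: int = NUM_ASICS) -> float:
    """Split a system-wide total evenly across the ASICs (assumes a symmetric mesh)."""
    return total / num_asics


def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal GB; billions of params times bytes each.

    Counts weights only. KV cache, activations, and any shared-exponent
    overhead of block formats are ignored (assumption).
    """
    return params_billions * bytes_per_param


def fits_in_gddr6(params_billions: float, bytes_per_param: float,
                  capacity_gb: float = GDDR6_GB) -> bool:
    """True if the weights alone fit in the quoted GDDR6 capacity."""
    return weight_footprint_gb(params_billions, bytes_per_param) <= capacity_gb


if __name__ == "__main__":
    print("Tensix cores per ASIC:", per_asic(TENSIX_CORES))
    print("BlockFP8 TFLOPS per ASIC:", per_asic(BLOCKFP8_TFLOPS))
    print("GDDR6 GB per ASIC:", per_asic(GDDR6_GB))
    # Hypothetical precisions: ~1 byte/param (8-bit) and 2 bytes/param (16-bit).
    for name, size_b in [("120B model", 120), ("70B model", 70), ("32B model", 32)]:
        for bytes_per_param in (1.0, 2.0):
            gb = weight_footprint_gb(size_b, bytes_per_param)
            ok = fits_in_gddr6(size_b, bytes_per_param)
            print(f"{name} at {bytes_per_param} B/param: {gb:.0f} GB, fits={ok}")
```

Under these assumptions, a 120-billion-parameter model at roughly one byte per weight needs about 120 GB, which leaves little of the 128 GB for anything else, and the same model at two bytes per weight would not fit in GDDR6 at all. That is only a sanity check on the headline numbers; it says nothing about how Tenstorrent actually places or quantizes the preloaded models.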
The open-stack angle is important here. Tenstorrent says TT-Forge, TT-Metalium, and TT-LLK are all part of an inspectable software path from model graph down to kernel execution, which is unusual in a market dominated by opaque accelerator stacks. Reddit commenters liked the move to a standard wall outlet and noted that the system is reportedly $2,000 cheaper than the first QuietBox generation, but they did not give Tenstorrent a free pass. Some questioned whether the published token-per-second numbers will hold up on real models, others compared the price to Nvidia alternatives, and some dug straight into bandwidth and software-maintenance concerns.
That tension is what makes the post worth watching. QuietBox 2 is not only a hardware launch; it is a test of whether a fully local, more open AI workstation can win mindshare among developers who care about sovereignty, inspectability, and offline deployment. If Tenstorrent can back the performance claims with reproducible workloads, LocalLLaMA's interest could translate into a real niche for desk-side inference outside the usual GPU ecosystem.
Original source: StorageReview. Community discussion: r/LocalLLaMA.