Hacker News Tracks tinybox as Offline AI Hardware Moves Into 120B-Class Territory

Original: Tinybox – Offline AI device 120B parameters

LLM · Mar 22, 2026 · By Insights AI (HN) · 2 min read

The March 21, 2026 Hacker News submission titled "Tinybox – Offline AI device 120B parameters" had 279 points and 163 comments when checked on March 22, 2026. The post linked to tinygrad's tinybox page, which pitches a compact system for deep learning training and inference rather than yet another generic workstation-versus-cloud comparison. That distinction matters because the community is increasingly looking for practical on-prem boxes that can host larger LLM workloads without committing everything to remote infrastructure.

tinygrad currently highlights two shipping configurations. Tinybox Red V2 uses 4x 9070 XT GPUs, advertises 778 TFLOPS of FP16 throughput, and is listed at $12,000. Tinybox Green V2 moves to 4x RTX PRO 6000 Blackwell GPUs, 3,086 TFLOPS FP16 throughput, and a $65,000 price tag. The company also says the broader tinybox line was benchmarked in MLPerf Training 4.0 against systems costing roughly 10x more, framing the machine as a performance-per-dollar play rather than a boutique showcase.

  • Red V2: 4x 9070 XT, FP16 778 TFLOPS, $12,000
  • Green V2: 4x RTX PRO 6000 Blackwell, FP16 3,086 TFLOPS, $65,000
  • tinygrad's framing: a machine designed first for deep learning, then reused for inference
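Taking the product-page numbers at face value, the performance-per-dollar framing can be checked with a quick back-of-the-envelope calculation (the figures below are the advertised specs quoted above, not independent benchmarks):

```python
# FP16 throughput per dollar, using the advertised tinybox specs.
configs = {
    "Red V2 (4x 9070 XT)": {"fp16_tflops": 778, "price_usd": 12_000},
    "Green V2 (4x RTX PRO 6000 Blackwell)": {"fp16_tflops": 3_086, "price_usd": 65_000},
}

for name, c in configs.items():
    # Convert TFLOPS to GFLOPS so the per-dollar number is readable.
    gflops_per_dollar = c["fp16_tflops"] * 1_000 / c["price_usd"]
    print(f"{name}: ~{gflops_per_dollar:.1f} GFLOPS per dollar")
```

On paper, the cheaper Red V2 comes out ahead per dollar (~64.8 vs. ~47.5 GFLOPS/$), while the Green V2 buys absolute throughput; real workloads depend on memory bandwidth and software support as much as peak FLOPS.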

The community interest is easy to explain. Teams building local copilots, retrieval systems, and agent workflows want more control over privacy, bandwidth costs, and predictable capacity. A ready-to-buy appliance sits between DIY multi-GPU rigs and hyperscaler contracts, lowering the barrier for smaller companies that want enough VRAM and bandwidth to experiment with 70B- to 120B-class models on site.
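Why 120B-class models demand an appliance like this comes down to simple arithmetic on weight memory. A minimal sketch (the helper function is ours; it counts weights only, ignoring KV cache, activations, and runtime overhead, and uses the standard per-parameter sizes for each quantization level):

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Ignores KV cache, activations, and framework overhead, which add
    a meaningful margin on top in practice.
    """
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 120B-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"120B at {bits}-bit: ~{weight_memory_gb(120, bits):.0f} GB of weights")
```

At FP16 a 120B model needs roughly 240 GB for weights alone, so multi-GPU aggregate VRAM or aggressive quantization (8-bit ≈ 120 GB, 4-bit ≈ 60 GB) is what makes on-site experimentation plausible at all.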

The open question is whether the real user experience matches the product page. Thermals, software tooling, serviceability, and sustained inference behavior matter as much as raw FLOPS. Even so, the HN thread captured a real shift: local AI hardware is no longer a fringe hobby build category. It is becoming a defined product segment with serious buyers and clearer expectations.




© 2026 Insights. All rights reserved.