A March 21, 2026 Hacker News discussion sent tinygrad's tinybox page back to the front page, putting a shipping local AI workstation in front of builders looking beyond rented GPU time. The pitch is notable because it pairs concrete specs with pricing aimed at labs and startups trying to run bigger models on premises.
A March 17, 2026 r/LocalLLaMA post about Hugging Face hf-agents reached 624 points and 78 comments at crawl time. The extension uses llmfit to detect hardware, recommends a runnable model and quant, starts llama.cpp, and launches the Pi coding agent.
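The post doesn't reproduce the extension's code, but the flow it describes is easy to picture. Here is a minimal sketch of that detect-recommend-serve hand-off; the llmfit invocation and its output format are assumptions, while the `llama-server` flags are real llama.cpp options:

```python
import subprocess

# Hypothetical sketch of the flow described above: detect hardware, pick a
# model + quant, start a llama.cpp server, then hand off to a coding agent.
# The llmfit call and output parsing are assumptions, not the extension's code.

def recommend_model() -> str:
    # Assumption: llmfit prints a recommended GGUF path as its last line.
    out = subprocess.run(["llmfit"], capture_output=True, text=True, check=True)
    return out.stdout.strip().splitlines()[-1]

def serve(gguf_path: str) -> subprocess.Popen:
    # llama.cpp's llama-server exposes an OpenAI-compatible HTTP endpoint.
    return subprocess.Popen(["llama-server", "-m", gguf_path, "--port", "8080"])

if __name__ == "__main__":
    model = recommend_model()
    server = serve(model)
    print(f"Serving {model} on http://localhost:8080 for the agent to use.")
```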
On March 16, 2026, a r/LocalLLaMA post questioning OpenCode’s local behavior reached 389 points and 154 comments. The post argued that the `opencode serve` web UI path proxies to app.opencode.ai and backed that claim with a linked code path plus related GitHub issues and PRs.
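One way to sanity-check a claim like this on your own machine is to watch where a supposedly local process actually connects. A rough sketch using psutil follows; the PID lookup and reverse-DNS step are illustrative, not taken from the thread:

```python
import psutil
import socket

# Rough sketch: list the remote endpoints a running process is talking to,
# e.g. to check whether a "local" server opens connections to a hosted
# service. Pass the PID of the process under test.

def remote_endpoints(pid: int) -> set[str]:
    proc = psutil.Process(pid)
    endpoints = set()
    for conn in proc.connections(kind="inet"):
        if conn.raddr:  # established outbound connections only
            try:
                host = socket.gethostbyaddr(conn.raddr.ip)[0]
            except OSError:
                host = conn.raddr.ip  # reverse DNS may not resolve
            endpoints.add(f"{host}:{conn.raddr.port}")
    return endpoints

# Example: print(remote_endpoints(12345))
```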
A March 16, 2026 Hacker News thread (310 points, 92 comments) resurfaced a detailed Home Assistant community write-up showing how a local-first voice assistant stack can combine llama.cpp, Parakeet V2 STT, Kokoro TTS, and prompt tuning into a usable system.
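The shape of such a stack is a simple loop: audio in, text out, text back, audio out. A minimal sketch follows; the `stt()` and `speak()` helpers are placeholders for the Parakeet V2 and Kokoro integrations, and only the llama.cpp server endpoint is a real, OpenAI-compatible API:

```python
import requests

# Minimal sketch of the local voice loop: microphone audio -> STT -> local
# LLM -> TTS -> speaker. stt() and speak() are placeholders, not the
# write-up's actual integration code.

LLAMA_URL = "http://localhost:8080/v1/chat/completions"

def stt(audio: bytes) -> str:
    raise NotImplementedError("wrap Parakeet V2 here")

def speak(text: str) -> None:
    raise NotImplementedError("wrap Kokoro TTS here")

def answer(prompt: str) -> str:
    resp = requests.post(LLAMA_URL, json={
        "messages": [
            {"role": "system", "content": "You are a terse home assistant."},
            {"role": "user", "content": prompt},
        ],
    }, timeout=60)
    return resp.json()["choices"][0]["message"]["content"]

def handle_utterance(audio: bytes) -> None:
    speak(answer(stt(audio)))
```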
Community discussion in r/LocalLLaMA pointed to a March 11, 2026 FastFlowLM and Lemonade update that brings Linux support to AMD XDNA 2 NPUs, including setup guidance for Ubuntu and Arch systems.
CanIRun.ai runs entirely in the browser, detects GPU, CPU, and RAM through WebGL, WebGPU, and navigator APIs, and estimates which quantized models fit your machine. HN readers liked the idea but immediately pushed for missing hardware entries, better calibration, and reverse-lookup features.
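The fit estimate itself is back-of-the-envelope arithmetic: weight bytes plus KV cache plus runtime overhead against available memory. A hedged sketch of that kind of calculation follows; the 1.2x overhead factor and the KV-cache sizing are assumptions, not CanIRun.ai's actual formula:

```python
# Rough check of whether a quantized model fits in memory. The 1.2x runtime
# overhead and fp16 KV-cache sizing are assumptions, not CanIRun.ai's formula.

def model_fits(params_b: float, bits_per_weight: float,
               ctx_tokens: int, n_layers: int, kv_dim: int,
               available_gb: float) -> bool:
    weights_gb = params_b * bits_per_weight / 8           # params in billions -> GB
    # KV cache: 2 tensors (K and V) per layer, 2 bytes (fp16) per element.
    kv_gb = 2 * n_layers * ctx_tokens * kv_dim * 2 / 1e9
    return (weights_gb + kv_gb) * 1.2 <= available_gb     # 1.2x overhead guess

# e.g. an 8B model at ~4.5 bits/weight, 8k context, on a 16 GB machine:
print(model_fits(8, 4.5, 8192, 32, 4096, 16.0))  # -> True (~10.5 GB needed)
```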
Perplexity has introduced Personal Computer, a local agent system that runs on an always-on Mac mini and exposes files, apps, and sessions to Perplexity Computer and the Comet Assistant. The company pitches it as a persistent AI operating system with human approval, logging, and a kill switch for sensitive actions.
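Perplexity hasn't published the internals, but the control layer it describes (approval gate, audit log, kill switch) is a familiar pattern. A minimal sketch of what it could look like; none of this is Perplexity's code:

```python
import json
import time
from pathlib import Path

# Hypothetical sketch of the control layer described above: sensitive actions
# need explicit human approval, everything is logged, and a kill-switch file
# halts the agent. File paths and structure are illustrative only.

KILL_SWITCH = Path("/tmp/agent.kill")
AUDIT_LOG = Path("/tmp/agent-audit.jsonl")

def log(name: str, status: str) -> None:
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps({"t": time.time(), "action": name,
                            "status": status}) + "\n")

def run_action(name: str, action, sensitive: bool = True):
    if KILL_SWITCH.exists():
        raise RuntimeError("kill switch engaged; refusing all actions")
    if sensitive:
        ok = input(f"Approve action '{name}'? [y/N] ").strip().lower() == "y"
        if not ok:
            log(name, "denied")
            return None
    result = action()
    log(name, "executed")
    return result

# run_action("delete_file", lambda: Path("draft.txt").unlink())
```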
A Launch HN thread pushed RunAnywhere's RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.
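The thread doesn't show RCLI's internals, but the "local RAG" piece of such a stack typically reduces to embedding documents once and ranking chunks by cosine similarity at query time. A minimal retrieval sketch, where `embed()` is a placeholder for whatever local embedding model the stack ships:

```python
import numpy as np

# Tiny local-RAG retrieval sketch: embed chunks once, then return the top-k
# nearest chunks for a query by cosine similarity. embed() is a placeholder
# for a local embedding model; nothing here is RCLI's code.

def embed(texts: list[str]) -> np.ndarray:
    raise NotImplementedError("call a local embedding model here")

class LocalIndex:
    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        vecs = embed(chunks)
        self.vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed([query])[0]
        q = q / np.linalg.norm(q)
        scores = self.vecs @ q
        top = np.argsort(scores)[::-1][:k]
        return [self.chunks[i] for i in top]
```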
llmfit is an open-source CLI tool that automatically detects your system's RAM, CPU, and GPU specs to recommend the optimal LLM model and quantization level, dramatically lowering the barrier to running local AI.
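llmfit's actual heuristics aren't quoted in the post; as a flavor of the detection half, here is a hedged sketch using psutil (GPU/VRAM detection is platform-specific and omitted, and the quant ladder below is an illustrative guess, not llmfit's rules):

```python
import psutil

# Sketch of the detection step a tool like llmfit performs: read total RAM
# and CPU count, then map the memory budget to a GGUF quantization level.
# Thresholds are illustrative guesses, not llmfit's actual rules.

def detect() -> dict:
    return {
        "ram_gb": psutil.virtual_memory().total / 2**30,
        "cpu_cores": psutil.cpu_count(logical=False),
    }

def suggest_quant(ram_gb: float) -> str:
    if ram_gb >= 64:
        return "Q8_0"    # plenty of headroom
    if ram_gb >= 32:
        return "Q6_K"
    if ram_gb >= 16:
        return "Q4_K_M"  # common sweet spot on 16 GB machines
    return "Q3_K_S"      # tight budgets

hw = detect()
print(hw, "->", suggest_quant(hw["ram_gb"]))
```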
A developer has implemented a UEFI application that runs LLM inference directly from boot without any operating system or kernel, using zero-dependency C code for the entire stack from tokenizer to inference engine.
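The post's C code isn't reproduced here, but to give a sense of what "the entire stack with zero dependencies" means, here is a toy greedy-decoding loop in dependency-free Python; `model()` and `detokenize()` stand in for the author's hand-written inference engine and tokenizer:

```python
# Toy illustration of the core loop such a from-scratch stack implements:
# feed tokens in, pick the highest-scoring next token, repeat. model() and
# detokenize() stand in for the post's C engine; this is not the author's code.

def argmax(logits: list[float]) -> int:
    best = 0
    for i, v in enumerate(logits):
        if v > logits[best]:
            best = i
    return best

def generate(model, detokenize, prompt_tokens: list[int],
             max_new: int, eos_id: int) -> str:
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        logits = model(tokens)   # forward pass over the whole sequence
        nxt = argmax(logits)     # greedy: no sampling, no temperature
        if nxt == eos_id:
            break
        tokens.append(nxt)
    return detokenize(tokens)
```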
Ollama 0.17, released February 22, introduces a new native inference engine replacing llama.cpp server mode, delivering up to 40% faster prompt processing and 18% faster token generation on NVIDIA GPUs, plus improved multi-GPU tensor parallelism and AMD RDNA 4 support.
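The engine change is internal, so the API surface stays the same; a quick way to check the speedup on your own hardware is to time a non-streaming generation against Ollama's local REST endpoint before and after upgrading. The model name below is just an example:

```python
import time
import requests

# Time a single non-streaming generation against a local Ollama server.
# Ollama reports eval_count (tokens generated) in the response body.

def timed_generate(model: str, prompt: str) -> None:
    t0 = time.time()
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": model, "prompt": prompt,
                               "stream": False}, timeout=300)
    resp.raise_for_status()
    toks = resp.json().get("eval_count", 0)
    dt = time.time() - t0
    print(f"{toks} tokens in {dt:.1f}s ({toks / dt:.1f} tok/s)")

timed_generate("llama3.1:8b", "Explain tensor parallelism in one paragraph.")
```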