#local-ai

LLM Hacker News Mar 22, 2026 2 min read

Hacker News Tracks tinybox as Offline AI Hardware Moves Into 120B-Class Territory

A March 21, 2026 Hacker News discussion sent tinygrad's tinybox page back onto the front page and put a shipping local AI workstation in front of builders looking beyond rented GPU time. The product pitch is notable because it pairs concrete specs with pricing aimed at labs and startups trying to run bigger models on premises.

#tinygrad #tinybox #local-ai

LLM Reddit Mar 20, 2026 2 min read

r/LocalLLaMA Pushes Hugging Face hf-agents as a One-Command Local Coding Stack

A March 17, 2026 r/LocalLLaMA post about Hugging Face hf-agents reached 624 points and 78 comments at crawl time. The extension uses llmfit to detect hardware, recommends a runnable model and quant, starts llama.cpp, and launches the Pi coding agent.

#hugging-face #llmfit #llama-cpp

LLM Reddit Mar 17, 2026 2 min read

r/LocalLLaMA Questions OpenCode’s Local Story After Finding `serve` Proxies the UI to app.opencode.ai

On March 16, 2026, an r/LocalLLaMA post questioning OpenCode’s local behavior reached 389 points and 154 comments. The post argued that the `opencode serve` web UI path proxies to app.opencode.ai and backed that claim with a linked code path plus related GitHub issues and PRs.

#opencode #local-ai #self-hosting

LLM Hacker News Mar 17, 2026 2 min read

Hacker News Resurfaces a Fully Local Home Assistant Voice Stack Built Around llama.cpp

A March 16, 2026 Hacker News thread, which logged 310 points and 92 comments, resurfaced a detailed Home Assistant community write-up. The write-up shows how a local-first voice assistant stack can combine llama.cpp, Parakeet V2 STT, Kokoro TTS, and prompt tuning into a usable system.

#home-assistant #voice-assistant #llama.cpp

LLM Reddit Mar 15, 2026 2 min read

LocalLLaMA Highlights a New Linux Path for Running LLMs on AMD Ryzen AI NPUs

Community discussion in LocalLLaMA pointed to a March 11, 2026 FastFlowLM and Lemonade update that brings Linux support to AMD XDNA 2 NPUs, including setup guidance for Ubuntu and Arch systems.

#amd #npu #linux

LLM Hacker News Mar 13, 2026 2 min read

Hacker News spots CanIRun.ai, a browser-side local AI compatibility checker

CanIRun.ai runs entirely in the browser, detects GPU, CPU, and RAM through WebGL, WebGPU, and navigator APIs, and estimates which quantized models fit your machine. HN readers liked the idea but immediately pushed on missing hardware entries, calibration, and reverse-lookup features.

#local-ai #llm-inference #hardware

AI Mar 13, 2026 2 min read

Perplexity unveils Personal Computer, an always-on local AI proxy built around a Mac mini

Perplexity has introduced Personal Computer, an always-on local agent system that runs through a continuously operating Mac mini and exposes files, apps, and sessions to Perplexity Computer and the Comet Assistant. The company is pitching it as a persistent AI operating system with human approval, logging, and a kill switch for sensitive actions.

#perplexity #personal-computer #agents

LLM Hacker News Mar 11, 2026 2 min read

Hacker News Highlights RunAnywhere's Local Voice AI Stack for Apple Silicon

A Launch HN thread pushed RunAnywhere's RCLI into view as an Apple Silicon-first macOS voice AI stack that combines STT, LLM, TTS, local RAG, and 38 system actions without relying on cloud APIs.

#apple-silicon #local-ai #voice-ai

LLM Hacker News Mar 2, 2026 1 min read

llmfit: Auto-Select the Right LLM Model for Your Hardware

llmfit is an open-source CLI tool that detects your system's RAM, CPU, and GPU specs and recommends a suitable LLM and quantization level, lowering the barrier to running local AI.

#llm #open-source #hardware-optimization

AI Reddit Mar 1, 2026 1 min read

Bare-Metal AI: Running LLM Inference Directly in UEFI, No OS or Kernel Required

A developer has implemented a UEFI application that runs LLM inference directly from boot without any operating system or kernel, using zero-dependency C code for the entire stack from tokenizer to inference engine.

#bare-metal #llm-inference #uefi

LLM Feb 23, 2026 1 min read

Ollama 0.17 Arrives with New Inference Engine: Up to 40% Faster Local AI

Ollama 0.17, released February 22, introduces a new native inference engine replacing llama.cpp server mode, delivering up to 40% faster prompt processing and 18% faster token generation on NVIDIA GPUs, plus improved multi-GPU tensor parallelism and AMD RDNA 4 support.

#open-source #ollama #local-ai