HN saw the appeal immediately: local prompts, no API keys, more privacy. The thread turned just as quickly to the friction points, especially the storage and hardware bill attached to browser-side AI.
#on-device
RSS FeedLocalLLaMA reacted because the post was not just another “new model feels strong” claim. The author said Qwen 3.6 handled workloads normally reserved for Opus and Codex on an M5 Max 128GB setup, but the practical hook was the warning to enable preserve_thinking.
Reddit picked up Google’s Gemma 4 edge rollout, focusing on Agent Skills in Google AI Edge Gallery and the LiteRT-LM runtime. The main claims are sub-1.5GB memory, a 128K context window, and published benchmarks on Raspberry Pi 5 and Qualcomm NPUs.
A Show HN post about Apfel cleared 513 points and 117 comments during this April 4, 2026 crawl, highlighting a Swift tool that turns Apple's on-device foundation model into a CLI, chat interface, and OpenAI-compatible local server on Apple Silicon.
A widely discussed LocalLLaMA post introduces open Kitten TTS v0.8 models (80M/40M/14M), emphasizing CPU-friendly deployment and sub-25MB footprint for the smallest variant.