LocalLLaMA reacted because the post went beyond the usual "new model feels strong" claim: the author said Qwen 3.6 handled workloads normally reserved for Opus and Codex on an M5 Max 128GB setup, and the practical hook was the warning to enable preserve_thinking.
#on-device
Google's AI Edge team said on April 2, 2026 that Gemma 4 is bringing multi-step agentic workflows to phones, desktops, and edge hardware under an Apache 2.0 license. The launch combines open models, Agent Skills, and LiteRT-LM deployment tooling.
Reddit picked up Google’s Gemma 4 edge rollout, focusing on Agent Skills in Google AI Edge Gallery and the LiteRT-LM runtime. The main claims are sub-1.5GB memory, a 128K context window, and published benchmarks on Raspberry Pi 5 and Qualcomm NPUs.
A Show HN post about Apfel stood at 513 points and 117 comments as of this April 4, 2026 crawl, highlighting a Swift tool that turns Apple's on-device foundation model into a CLI, a chat interface, and an OpenAI-compatible local server on Apple Silicon.
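"OpenAI-compatible" matters because existing clients can point at the local server unchanged. As a minimal sketch, a client would POST a standard chat-completions payload to the server's /v1/chat/completions route; the port and model name below are assumptions for illustration, not details from the post:

```python
import json

# Hypothetical local endpoint; Apfel's actual host/port may differ.
URL = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-style chat payload; "apple-fm" is a placeholder model name.
payload = {
    "model": "apple-fm",
    "messages": [
        {"role": "user", "content": "Summarize this file in one line."}
    ],
    "stream": False,
}

# Any OpenAI SDK or plain HTTP client can POST this JSON body to URL.
body = json.dumps(payload)
print(body)
```

Because the wire format is the OpenAI spec rather than a bespoke API, tools like the official SDKs only need their base URL swapped to the local address.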
A widely discussed LocalLLaMA post introduces open Kitten TTS v0.8 models (80M/40M/14M), emphasizing CPU-friendly deployment and a sub-25MB footprint for the smallest variant.