A smaller release drew outsized attention on LocalLLaMA because LFM2.5-350M is not trying to be a general-purpose chatbot. Liquid AI is pitching it as a compact model for tool use, structured outputs, and data-heavy edge workflows.
#small-models
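To make the structured-output pitch concrete, here is a minimal extraction sketch; the checkpoint id and prompt format are assumptions, not Liquid AI's documented usage:

```python
# Minimal structured-extraction sketch with a compact model via transformers.
# MODEL_ID and the prompt format are assumptions, not Liquid AI's documented usage.
import json
from transformers import pipeline

MODEL_ID = "LiquidAI/LFM2.5-350M"  # assumed checkpoint id

generator = pipeline("text-generation", model=MODEL_ID)

prompt = (
    "Extract the fields as JSON with keys vendor, date, and amount.\n"
    "Text: Invoice from Acme Corp dated 2026-03-01 for $1,250.00\n"
    "JSON:"
)

raw = generator(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"]
completion = raw[len(prompt):].strip()  # the pipeline echoes the prompt by default
record = json.loads(completion)         # a 350M model may need retries or grammar constraints here
print(record)
```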
A March 17, 2026 Hacker News post about GPT-5.4 mini and nano reached 236 points and 143 comments. OpenAI is positioning mini as a fast coding and tool-use model for Codex, the API, and ChatGPT, while nano targets cheaper classification, extraction, and subagent workloads.
OpenAI Developers said on X that GPT-5.4 mini and nano are now part of the GPT-5.4 family for developer workflows. OpenAI positions mini as a faster coding and tool-use model for the API, Codex, and ChatGPT, while nano is the lowest-cost option for lighter API workloads.
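A rough sketch of how that split might look in practice, routing cheap classification to nano and a coding task to mini; the model id strings are assumptions based on the announcement naming, not confirmed API identifiers:

```python
# Sketch of the mini/nano split described above: nano for cheap classification,
# mini for a coding task. Model id strings are assumptions based on the
# announcement naming, not confirmed API identifiers.
from openai import OpenAI

client = OpenAI()

def classify_ticket(text: str) -> str:
    """Cheap single-label classification: routed to the nano-sized model."""
    resp = client.chat.completions.create(
        model="gpt-5.4-nano",  # assumed id
        messages=[
            {"role": "system", "content": "Reply with exactly one word: bug, feature, or question."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip()

def draft_patch(issue: str) -> str:
    """Coding and tool-use work: routed to the mini-sized model."""
    resp = client.chat.completions.create(
        model="gpt-5.4-mini",  # assumed id
        messages=[{"role": "user", "content": f"Propose a patch for this issue:\n{issue}"}],
    )
    return resp.choices[0].message.content

print(classify_ticket("App crashes when I upload a PNG larger than 10 MB."))
```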
A LocalLLaMA release post presents OmniCoder-9B as a Qwen3.5-9B-based coding agent fine-tuned on 425,000-plus agentic trajectories, with commenters focusing on its read-before-write behavior and its usefulness at such a small size.
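The read-before-write idea is easy to illustrate outside the model itself. The sketch below is a hypothetical tool wrapper, not OmniCoder's actual interface: the write tool refuses to overwrite any file the agent has not read in the current session.

```python
# Illustration of a read-before-write gate: the write tool refuses to overwrite a
# file the agent has not read in this session. Hypothetical wrapper, not
# OmniCoder-9B's actual tool interface.
from pathlib import Path

class FileTools:
    def __init__(self) -> None:
        self._read_paths: set[Path] = set()

    def read_file(self, path: str) -> str:
        p = Path(path).resolve()
        text = p.read_text()
        self._read_paths.add(p)
        return text

    def write_file(self, path: str, content: str) -> str:
        p = Path(path).resolve()
        if p.exists() and p not in self._read_paths:
            # Make the model look at the current contents before replacing them.
            return f"refused: read {path} before writing to it"
        p.write_text(content)
        return f"wrote {len(content)} bytes to {path}"

tools = FileTools()
Path("demo.py").write_text("print('old')\n")               # pre-existing file
print(tools.write_file("demo.py", "print('new')\n"))       # refused: not read yet
tools.read_file("demo.py")
print(tools.write_file("demo.py", "print('new')\n"))       # allowed after reading
```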
A well-received r/LocalLLaMA experiment described tinyforge: Qwen 3.5 0.8B running on a MacBook Air, trained on 13 self-generated repair pairs from a test-feedback loop, with a reported jump on a held-out set from 16/50 to 28/50.
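A rough reconstruction of that kind of loop, assuming a propose_fix callable that wraps the 0.8B model; the function names and pytest harness are illustrative, not tinyforge's actual code:

```python
# Rough reconstruction of a self-generated repair-pair loop: run the tests, ask the
# model to fix failures, and keep only (broken, fixed) pairs whose fix passes.
# `propose_fix` stands in for a call to the 0.8B model; names are illustrative.
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

def passes_tests(code: str, test_code: str) -> bool:
    """Write the candidate and its test to a temp dir and run pytest there."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(code)
        Path(tmp, "test_solution.py").write_text(test_code)
        result = subprocess.run(["pytest", "-q", tmp], capture_output=True)
        return result.returncode == 0

def collect_repair_pairs(
    tasks: list[tuple[str, str]],            # (broken_code, test_code)
    propose_fix: Callable[[str, str], str],  # model call: (broken, test) -> fixed
) -> list[dict]:
    pairs = []
    for broken, test in tasks:
        if passes_tests(broken, test):
            continue                         # nothing to learn from already-passing code
        fixed = propose_fix(broken, test)
        if passes_tests(fixed, test):
            pairs.append({"prompt": broken, "completion": fixed})
    return pairs                             # dump to JSONL and fine-tune on these pairs
```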
A widely shared r/LocalLLaMA comparison of Qwen's smallest models across three generations (score: 681) highlights striking efficiency gains: the Qwen 3.5 9B now outperforms the previous-generation 80B on several benchmarks, while the 2B handles video understanding better than many 7B models.
The Alibaba Qwen team released the Qwen 3.5 small model series (0.8B to 9B). The models run in-browser via WebGPU and show dramatic benchmark improvements over previous generations.
Researchers have demonstrated that transformer models with fewer than 100 parameters can add two 10-digit numbers with 100% accuracy. The key ingredient is digit tokenization rather than treating numbers as opaque strings — a finding with implications for mathematical reasoning in larger LLMs.
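The digit-tokenization contrast is simple enough to sketch; the vocabulary below is illustrative, not the researchers' exact setup:

```python
# Digit-level tokenization: every digit gets its own token id, so the model sees
# aligned digit positions instead of one opaque number string. The vocabulary is
# illustrative, not the researchers' exact setup.
VOCAB = {ch: i for i, ch in enumerate("0123456789+=")}
INV = {i: ch for ch, i in VOCAB.items()}

def tokenize(expr: str) -> list[int]:
    """Map an addition expression to one token id per character."""
    return [VOCAB[ch] for ch in expr]

def detokenize(ids: list[int]) -> str:
    return "".join(INV[i] for i in ids)

expr = "4738291056+9182736450="
ids = tokenize(expr)
print(ids)                       # one id per digit, so carries line up position by position
print(detokenize(ids) == expr)   # True
```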