Local multimodal AI is moving into the 12B class. Google Gemma introduced Gemma 4 12B under Apache 2.0, describing a unified, encoder-free design for image, audio, and text inputs.
NVIDIA released Cosmos 3 as an open physical AI omnimodel with Super and Nano variants. Its technical post points to six synthetic datasets, Hugging Face checkpoints, and GitHub recipes for domain adaptation.
DeepSeek turned a temporary V4-Pro API discount into standard pricing, intensifying the cost race around frontier-class LLM access. The posted table cuts output pricing from $3.48 to $0.87 per million tokens.
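The size of that cut is easy to check by hand. The short sketch below uses only the two output prices quoted above; the 50-million-token workload is a made-up figure for illustration, not something from DeepSeek's table.

```python
# Output prices per million tokens, as quoted in the item above.
OLD_PRICE_PER_M = 3.48
NEW_PRICE_PER_M = 0.87


def relative_cut(old: float, new: float) -> float:
    """Fraction by which the price dropped (0.0 means no change)."""
    return (old - new) / old


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload size, chosen only for illustration.
EXAMPLE_TOKENS = 50_000_000

cut = relative_cut(OLD_PRICE_PER_M, NEW_PRICE_PER_M)
before = output_cost(EXAMPLE_TOKENS, OLD_PRICE_PER_M)
after = output_cost(EXAMPLE_TOKENS, NEW_PRICE_PER_M)

print(f"price cut: {cut:.0%}")
print(f"example workload: ${before:.2f} before, ${after:.2f} after")
```

In other words, the new output price is one quarter of the old one, so any output-heavy workload costs a quarter of what it did.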
LocalLLaMA reacted hard because DeepSeek's visual-primitives idea makes points and boxes part of reasoning itself, and the repo going private only made the thread hotter.
LocalLLaMA paid attention to Granite 4.1 because IBM went in the opposite direction from giant reasoning hype: a broad release built around dense 3B, 8B, and 30B language models tuned for instruction following and tool calling. Comments welcomed the extra competition, but also pushed back on how strong the benchmarks really are.
Why it matters: Moonshot is turning “agent swarm” from a demo phrase into an execution claim with real scale numbers. The Kimi post says one run can coordinate 300 sub-agents across 4,000 steps and return 100-plus files instead of chat transcripts.
PrismML is testing whether smaller open models can stay useful by changing the weight format, not only the architecture. Ternary Bonsai ships 8B, 4B, and 1.7B models at 1.58 bits per weight, with the 8B variant listed at 1.75 GB.
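For a sense of where those numbers come from: a ternary weight takes one of three values, so it carries log2(3) ≈ 1.585 bits, which is the usual origin of the "1.58 bits" label. The sketch below estimates the ideal packed size from a parameter count and a bit width. It assumes the nominal parameter counts (8e9, 4e9, 1.7e9) and decimal gigabytes, neither of which is confirmed by PrismML, and `packed_size_gb` is a hypothetical helper, not part of any release.

```python
import math

# Information content of a three-valued (ternary) weight.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def packed_size_gb(params: float, bits_per_weight: float) -> float:
    """Ideal storage in decimal GB if every weight used exactly this many bits."""
    return params * bits_per_weight / 8 / 1e9


# Nominal parameter counts are assumptions read off the model names.
NOMINAL_PARAMS = {"8B": 8e9, "4B": 4e9, "1.7B": 1.7e9}

print(f"bits per ternary weight: {BITS_PER_TERNARY_WEIGHT:.3f}")
for name, params in NOMINAL_PARAMS.items():
    size = packed_size_gb(params, 1.58)
    print(f"{name}: about {size:.2f} GB of packed weights")
```

Under these assumptions the 8B model comes out near 1.58 GB, a little below the listed 1.75 GB. The gap would be consistent with some tensors or metadata being stored at higher precision, though the item above does not say what accounts for it.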
Why it matters: NVIDIA is turning quantum calibration and error correction into an open model-and-tooling stack instead of a lab-only workflow. The April 14 tweet framed Ising as an open suite, and NVIDIA’s technical post says Ising Calibration 1 scored 14.5% above GPT-5.4 and 3.27% above Gemini 3.1 Pro on QCalEval.
NVIDIA is turning quantum chip calibration and error correction into an open AI stack, with one model family that beats GPT-5.4 on QCalEval and another that speeds decoding by 2.25x. If those gains hold outside NVIDIA's own workflow, one of quantum computing's nastiest software bottlenecks has moved closer to something teams can actually deploy.
In a 1,247-point Hacker News thread, AISLE argued that small open-weight models can recover much of Mythos-style exploit analysis when the context is tightly scoped, and the comments pushed back hard on the methodology.
An AISLE post that surged on Hacker News argues that Anthropic’s Mythos launch proves the category, but not an exclusive moat. In AISLE’s tests, small and open models recovered major parts of the showcased vulnerability work once the right code path was isolated.
A high-scoring LocalLLaMA thread amplified AISLE's claim that smaller open or low-cost models reproduced much of the vulnerability analysis Anthropic highlighted for Mythos. The central Reddit pushback was that reasoning over an isolated vulnerable function is very different from autonomously finding that bug inside a large codebase.