HN latched onto the open-weight angle: a 35B MoE model with only 3B active parameters is interesting if it can actually carry coding-agent work. Qwen says Qwen3.6-35B-A3B improves sharply over Qwen3.5-35B-A3B, while commenters immediately moved to GGUF builds, Mac memory limits, and whether benchmark tables that compare only against other open models give enough context; the sizing sketch below makes the memory-versus-compute split concrete.
#open-weights
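A minimal back-of-the-envelope sketch of why total-versus-active parameters dominated the thread, assuming ~0.5 bytes/param at 4-bit quantization and ~2 FLOPs per active parameter per decoded token. The figures are illustrative assumptions, not measured numbers for Qwen3.6-35B-A3B.

```python
# Rough sizing sketch for a 35B-total / 3B-active MoE.
# All numbers are illustrative assumptions, not measurements.

TOTAL_PARAMS = 35e9    # every expert must sit in memory
ACTIVE_PARAMS = 3e9    # parameters actually touched per token

def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight memory, ignoring KV cache and runtime overhead."""
    return params * bytes_per_param / 1e9

# Memory is driven by TOTAL params: even a 4-bit quant (~0.5 B/param)
# needs the whole expert set resident, which is what hits Mac RAM limits.
print(f"Q4 weights:   ~{weight_footprint_gb(TOTAL_PARAMS, 0.5):.0f} GB")
print(f"FP16 weights: ~{weight_footprint_gb(TOTAL_PARAMS, 2.0):.0f} GB")

# Per-token decode compute is driven by ACTIVE params (~2 FLOPs per
# param per token), so a 3B-active model decodes like a small dense one.
print(f"approx decode FLOPs/token: ~{2 * ACTIVE_PARAMS:.1e}")
```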
LocalLLaMA paid attention because MiniMax tried to cool down the M2.7 license anxiety, but the thread still read the wording as muddy. What people wanted was not a softer tone; it was a clear answer on what the license actually permits for self-hosted commercial use.
A popular r/LocalLLaMA thread argues that MiniMax M2.7 should be treated as an open-weights release with a restricted license, not as open source, because commercial use requires prior written authorization.
Mistral AI said on March 26, 2026 that Voxtral TTS offers expressive speech, support for nine languages and dialects, low latency, and easy adaptation to new voices. Mistral’s March 23 launch post says the 4B-parameter model can adapt to a new voice from about three seconds of reference audio, reaches roughly 70 ms model latency, supports up to two minutes of native audio generation, and is available via API and as open weights.
A post in r/artificial pointed readers to Google DeepMind's Gemma 4 release, which packages advanced reasoning and agentic features under Apache 2.0. Google says the family spans four sizes, supports up to 256K context in larger models, and ships with day-one ecosystem support from Hugging Face to llama.cpp.
A March 2026 r/LocalLLaMA post with 123 points and 25 comments spotlighted `voxtral-voice-clone`, a project trying to train the missing codec encoder for Mistral’s Voxtral-4B-TTS-2603. The repo targets zero-shot cloning via `ref_audio`, which the original open-weight release could not support because the encoder weights were not included.
A March 26, 2026 r/LocalLLaMA post linking NVIDIA's `gpt-oss-puzzle-88B` model card reached 284 points and 105 comments at crawl time. NVIDIA says the 88B MoE model uses its Puzzle post-training NAS pipeline to cut parameters and KV-cache costs while keeping reasoning accuracy near or above the parent model.
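To make "KV-cache costs" concrete, here is a standard cache-sizing calculation. The architecture numbers below are placeholders for a hypothetical parent/child pair, not the actual `gpt-oss-puzzle-88B` config, which the model card would specify.

```python
# Back-of-the-envelope KV-cache sizing. Config values are PLACEHOLDERS,
# not the real gpt-oss-puzzle-88B architecture.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values: one K and one V vector per layer per token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical parent vs. Puzzle-compressed child: fewer layers and
# fewer KV heads shrink the cache linearly in each factor.
parent = kv_cache_bytes(layers=64, kv_heads=8, head_dim=128, seq_len=128_000)
child  = kv_cache_bytes(layers=48, kv_heads=4, head_dim=128, seq_len=128_000)

print(f"parent KV cache @128K ctx: {parent / 1e9:.1f} GB (fp16)")
print(f"child  KV cache @128K ctx: {child / 1e9:.1f} GB (fp16)")
```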
Mistral promoted Voxtral TTS on X on March 26, 2026. Mistral's release post describes a 4B-parameter multilingual TTS model with nine-language support, low time-to-first-audio, availability in Mistral Studio and API, open weights on Hugging Face under CC BY-NC 4.0, and pricing at $0.016 per 1,000 characters.
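A quick cost sketch using the quoted $0.016 per 1,000 characters. The ~900 characters/minute speaking rate (roughly 150 wpm at ~6 characters per word) is an assumption for illustration, not a Mistral figure.

```python
# Cost sketch at Mistral's quoted $0.016 per 1,000 characters.
# CHARS_PER_MINUTE is an assumed average for conversational speech.

PRICE_PER_1K_CHARS = 0.016
CHARS_PER_MINUTE = 900

def tts_cost(text: str) -> float:
    return len(text) / 1000 * PRICE_PER_1K_CHARS

hour_of_audio_chars = CHARS_PER_MINUTE * 60
print(f"~1 hour of generated speech: "
      f"${hour_of_audio_chars / 1000 * PRICE_PER_1K_CHARS:.2f}")  # ~$0.86
```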
Cohere announced Transcribe on March 26, 2026 as an open-source speech recognition model. Cohere says the 2B Conformer-based system supports 14 languages, tops the Hugging Face Open ASR Leaderboard with a 5.42 average WER, ships under Apache 2.0, and is available for download, API use, and Model Vault deployment.
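For readers new to the metric behind that 5.42 figure: word error rate is (substitutions + deletions + insertions) divided by the reference word count, i.e. a word-level Levenshtein distance normalized by reference length. A minimal implementation:

```python
# Standard word error rate (WER): word-level edit distance over
# reference length. 5.42 on the leaderboard means ~5.42% on average.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1/6 ≈ 0.167
```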
A high-signal LocalLLaMA thread formed around Voxtral TTS because Mistral paired low latency, multilingual support, and open weights in a part of the stack many teams still keep closed.
An r/LocalLLaMA thread spread reports that NVIDIA could spend $26 billion over five years on open-weight AI models, but the real discussion centered on strategy rather than the headline number. NVIDIA’s March 2026 Nemotron 3 Super release gives the clearest evidence that the company wants open models, tooling, and Blackwell-optimized deployment to move together.
r/LocalLLaMA responded strongly to GigaChat 3.1 because the release spans a local-friendly 10B MoE with 1.8B active parameters and a 702B frontier-scale MoE, both under MIT terms and both presented as trained from scratch.