MiniMax M3 weights hit Hugging Face with 428B total parameters
MiniMax M3 is now an actual open-weight release, not just a set of benchmark claims. On June 12, 2026 at 14:11 UTC, the official MiniMax account announced that the weights were live on Hugging Face and linked to the MiniMax Sparse Attention paper.
The tweet’s headline figures were ~428B total parameters and ~23B activated parameters. On FxTwitter, the post showed more than 528,000 views, 2,485 likes, and 301 reposts. The earlier post it quoted supplies the benchmark claims: 59.0% on SWE-Bench Pro, 66.0% on Terminal Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas.
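A large gap between total and activated parameters is the usual signature of a sparse mixture-of-experts design, though the tweet does not spell out the architecture. As a quick sanity check, here is a minimal sketch that computes how much of the model is used per token, assuming the two approximate figures are exact and that "activated" means per token. The constant and function names are illustrative only.

```python
# Figures as reported in the MiniMax tweet (both given as approximate, "~").
TOTAL_PARAMS_B = 428   # total parameters, in billions
ACTIVE_PARAMS_B = 23   # activated parameters, in billions (assumed per token)

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the full parameter count that is used for each token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b

frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"~{frac:.1%} of parameters active per token")
```

Under these assumptions, roughly one parameter in twenty does work for any given token, which is why per-token compute can stay far below what the 428B total suggests.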
The Hugging Face model card describes MiniMax-M3 as a natively multimodal model with a 1M-token context window. According to the card, MiniMax Sparse Attention improves long-context efficiency: at 1M context it delivers 9x faster prefill and 15x faster decoding than M2, and it reduces per-token compute to 1/20. The card also points users to deployment paths through SGLang, vLLM, and Transformers.
MiniMax’s official account typically posts model, API, and agent product updates. This post matters because it changes access: researchers and builders can now inspect the weights and try the supported serving stacks. The open questions are the license terms, the real cost of serving, independent long-context quality tests, and whether the coding-agent benchmark claims hold up under third-party evaluation. A same-day note from NVIDIA AI about a free GPU-accelerated endpoint may also broaden early testing.
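One input to real serving cost is simply holding the weights in memory. In a sparse design, all ~428B parameters generally need to be resident even if only ~23B are active per token. The sketch below is a rough, weight-only estimate under stated assumptions: the precisions listed are generic examples, not the formats MiniMax actually ships or the ones SGLang, vLLM, or Transformers support for this model, and the names are hypothetical.

```python
# Rough weight-only memory estimate. Ignores KV cache, activations, and
# runtime overhead, all of which grow with context length (up to 1M tokens here).
# The precision table is illustrative, not MiniMax's published formats.
BYTES_PER_PARAM = {
    "bf16": 2.0,   # 16-bit weights
    "fp8": 1.0,    # 8-bit weights
    "int4": 0.5,   # 4-bit weights
}

def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    # billions of params * bytes per param = billions of bytes = GB
    return params_billion * BYTES_PER_PARAM[dtype]

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(428, dtype):,.0f} GB")
```

Even at aggressive quantization, the weights alone land in multi-GPU territory, before accounting for the long-context cache.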