LocalLLaMA surfaces MIT-licensed GigaChat 3.1 open weights in 702B and 10B sizes
Original: New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B
A March 24, 2026 LocalLLaMA post surfaced a notable open-weight release that might otherwise have flown past many English-language users: GigaChat 3.1 Ultra and GigaChat 3.1 Lightning are now publicly available on Hugging Face under an MIT license.
The release spans two very different operating points. GigaChat 3.1 Ultra is presented as a 702B-parameter Mixture-of-Experts model with 36B active parameters, intended for large-cluster inference, while Lightning is a 10B MoE with 1.8B active parameters aimed at lighter, faster deployment. The model pages also list FP8 checkpoints, BF16 variants, and GGUF builds.
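To make those operating points concrete, here is a back-of-the-envelope weight-memory estimate per published format. The byte-per-parameter figures are standard for FP8 and BF16 and typical for 4-bit GGUF quantizations; the totals ignore KV cache, activations, and runtime overhead, so treat them as lower bounds.

```python
# Back-of-the-envelope weight memory for the published checkpoint formats.
# Bytes per parameter: FP8 ~= 1, BF16 = 2; 4-bit GGUF quants land near 0.5-0.6.
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

# MoE note: all 702B parameters must be resident even though only ~36B are
# active per token -- sparsity cuts compute per token, not storage.
print(f"Ultra FP8:      ~{weight_gb(702, 1.0):.0f} GB")  # ~702 GB
print(f"Ultra BF16:     ~{weight_gb(702, 2.0):.0f} GB")  # ~1404 GB
print(f"Lightning BF16: ~{weight_gb(10, 2.0):.0f} GB")   # ~20 GB
print(f"Lightning ~Q4:  ~{weight_gb(10, 0.6):.0f} GB")   # ~6 GB
```

The asymmetry is the point: MoE sparsity makes Ultra's per-token compute comparable to a much smaller dense model, but the full parameter set still has to live somewhere, which is why it is framed as a cluster-scale release.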
- The model cards describe a custom MoE stack with Multi-head Latent Attention and Multi-Token Prediction (a toy sketch of the latter follows this list).
- They also describe the broader GigaChat 3 training corpus as multilingual, spanning 10 languages and roughly 5.5 trillion synthetic tokens.
- The Reddit post emphasizes English and Russian optimization, open weights, and benchmark claims against DeepSeek V3, Qwen3, Gemma 3, and smaller tool-use baselines.
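The model cards do not spell out the exact formulation of those components, and the sketch below is not GigaChat's architecture. It is a generic toy version of the multi-token-prediction idea from the first bullet, in the style popularized by DeepSeek-V3: an auxiliary head predicts a second future token, and its loss is added, down-weighted, to the usual next-token loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMTP(nn.Module):
    """Toy multi-token-prediction objective: alongside the usual next-token
    head, an auxiliary head predicts the token two positions ahead from the
    same hidden state. Illustrative only, not GigaChat's actual stack."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.next_head = nn.Linear(d_model, vocab_size)  # predicts token t+1
        self.mtp_head = nn.Linear(d_model, vocab_size)   # predicts token t+2

    def loss(self, hidden: torch.Tensor, tokens: torch.Tensor, mtp_weight: float = 0.3):
        # hidden: (batch, seq, d_model) from the backbone; tokens: (batch, seq)
        next_logits = self.next_head(hidden[:, :-1])     # align with tokens[:, 1:]
        mtp_logits = self.mtp_head(hidden[:, :-2])       # align with tokens[:, 2:]
        next_loss = F.cross_entropy(next_logits.flatten(0, 1), tokens[:, 1:].flatten())
        mtp_loss = F.cross_entropy(mtp_logits.flatten(0, 1), tokens[:, 2:].flatten())
        return next_loss + mtp_weight * mtp_loss         # auxiliary term is down-weighted

# Smoke test with random data.
head = ToyMTP(d_model=64, vocab_size=100)
print(head.loss(torch.randn(2, 16, 64), torch.randint(0, 100, (2, 16))))
```

The training-time payoff is a denser learning signal per position; at inference time the extra head can also seed speculative decoding, though the cards do not say whether GigaChat uses it that way.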
What made the LocalLLaMA thread interesting was not just the parameter count but the packaging: Ultra is framed as a cluster-scale model, while Lightning tries to preserve tool use and long-context capability on a much smaller active compute budget. Releasing FP8, BF16, and GGUF variants broadens who can actually experiment with the models instead of just reading about them.
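For readers who want to go from reading to experimenting, here is a minimal loading sketch for the smaller model using Hugging Face transformers. The repository id is a placeholder guess at the naming, not confirmed against the collection, and whether `trust_remote_code` is required depends on how the custom stack is packaged.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- check the GigaChat 3.1 collection on Hugging Face
# for the real repository names before running this.
repo = "ai-sage/GigaChat-3.1-Lightning-10B-A1.8B"

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",      # keep the checkpoint's native dtype where supported
    device_map="auto",       # shard across whatever GPUs are available
    trust_remote_code=True,  # custom MoE/MLA stacks often ship their own modeling code
)

messages = [{"role": "user", "content": "One sentence on why MoE inference is cheap."}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tok.decode(output[0], skip_special_tokens=True))
```

The GGUF builds take the even lighter path: they can run through llama.cpp and its bindings on a single consumer machine, which is where a 1.8B-active model is most likely to see real use.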
Benchmark claims in the Reddit post should still be treated as vendor-reported numbers until independent evaluations arrive. Even with that caveat, the release is meaningful because it adds another multilingual, MIT-licensed option to the open model pool at both the large-cluster and compact ends of the spectrum.
Primary source: GigaChat 3.1 Hugging Face collection. Community source: LocalLLaMA discussion.