LocalLLaMA surfaces MIT-licensed GigaChat 3.1 open weights in 702B and 10B sizes

Original: New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B

LLM · Mar 25, 2026 · By Insights AI (Reddit) · 1 min read

A March 24, 2026 LocalLLaMA post surfaced a notable open-weight release that might otherwise have flown past many English-language users: GigaChat 3.1 Ultra and GigaChat 3.1 Lightning are now publicly available on Hugging Face under an MIT license.

The release spans two very different operating points. GigaChat 3.1 Ultra is presented as a 702B-parameter Mixture-of-Experts model with 36B active parameters for large-cluster inference, while Lightning is a 10B MoE with 1.8B active parameters aimed at faster deployment and lighter inference. The model pages also list FP8 checkpoints, BF16 variants, and GGUF builds.
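The practical difference between those checkpoint formats is mostly weight memory. A rough back-of-envelope (not from the source: it simply multiplies parameter count by bytes per parameter, ignoring KV cache and activations) shows why the FP8 and GGUF variants matter for who can run what:

```python
def weights_gib(n_params_b: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory in GiB: params * bytes, converted to GiB."""
    return n_params_b * 1e9 * bytes_per_param / 2**30

# Ultra: 702B total parameters (36B active per token)
ultra_bf16 = weights_gib(702, 2)   # BF16 = 2 bytes/param -> ~1308 GiB
ultra_fp8  = weights_gib(702, 1)   # FP8  = 1 byte/param  -> ~654 GiB

# Lightning: 10B total parameters (1.8B active per token)
light_bf16 = weights_gib(10, 2)    # -> ~18.6 GiB
light_q4   = weights_gib(10, 0.5)  # ~4-bit GGUF quant     -> ~4.7 GiB
```

Even in FP8, Ultra's weights alone need a multi-GPU cluster, while a 4-bit GGUF of Lightning fits on a single consumer GPU, which matches how the two models are positioned.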

  • The model cards describe a custom MoE stack with Multi-head Latent Attention and Multi-Token Prediction.
  • They also describe the broader GigaChat 3 training corpus as multilingual, spanning 10 languages and roughly 5.5 trillion synthetic tokens.
  • The Reddit post emphasizes English and Russian optimization, open weights, and benchmark claims against DeepSeek V3, Qwen3, Gemma 3, and smaller tool-use baselines.
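The "active parameters" figures come from MoE routing: a gate selects a few experts per token, so only a fraction of the weights run on any forward pass. A toy top-k gate illustrates the idea (a generic sketch of MoE routing in general, not GigaChat's actual stack, whose details the model cards describe only at a high level):

```python
import numpy as np

def moe_forward(x, expert_weights, gate_w, k=2):
    """Route a single token vector x through the top-k of n experts."""
    scores = x @ gate_w                  # one gate score per expert
    topk = np.argsort(scores)[-k:]       # indices of the k best experts
    exp = np.exp(scores[topk] - scores[topk].max())
    probs = exp / exp.sum()              # softmax over the selected experts only
    # only k expert matmuls execute; the other experts' weights stay idle
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, topk))
```

With 702B total parameters but top-k routing touching only ~36B per token, Ultra's per-token compute is closer to a 36B dense model, even though all 702B must sit in memory.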

What made the LocalLLaMA thread interesting was not only the parameter count. Packaging matters here. Ultra is framed as a cluster-scale model, while Lightning tries to preserve tool use and long-context capability with a much smaller active compute budget. Releasing FP8, BF16, and GGUF variants broadens who can actually experiment with the models instead of just reading about them.

Benchmark claims in the Reddit post should still be treated as vendor-reported until more outside evaluation arrives. Even with that caveat, the release is meaningful because it adds another multilingual, MIT-licensed option to the open model pool at both the large-cluster and compact ends of the spectrum.

Primary source: GigaChat 3.1 Hugging Face collection. Community source: LocalLLaMA discussion.




© 2026 Insights. All rights reserved.