r/LocalLLaMA spotlights GigaChat 3.1 open weights from 10B to 702B

Original: New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B

LLM · Mar 25, 2026 · By Insights AI (Reddit) · 2 min read

A high-engagement r/LocalLLaMA post announced two new open-weights releases under the MIT license: GigaChat-3.1-Ultra, a 702B-parameter mixture-of-experts (MoE) model with 36B active parameters per token (A36B), and GigaChat-3.1-Lightning, a 10B MoE with 1.8B active parameters (A1.8B) aimed at far smaller deployments. The post is notable because it does not present the release as a minor fine-tune: the team says both models were pretrained from scratch on its own data and hardware, with English and Russian as the core optimization targets and 14 languages in the training mix.
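The "A36B"/"A1.8B" suffixes are the key to reading these sizes: they name the parameters *activated per token*, not the full weight count. A quick back-of-envelope (using only the figures from the post) shows how sparse each model's routing is:

```python
# Back-of-envelope: total vs. active parameters for the two MoE releases.
# Figures (702B/A36B and 10B/A1.8B) are taken from the announcement post.

def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of weights activated per forward pass, given billions of params."""
    return active_b / total_b

ultra = active_fraction(702, 36)       # GigaChat-3.1-Ultra
lightning = active_fraction(10, 1.8)   # GigaChat-3.1-Lightning

print(f"Ultra:     {ultra:.1%} of weights active per token")
print(f"Lightning: {lightning:.1%} of weights active per token")
```

So Ultra computes with roughly 5% of its weights per token while Lightning uses about 18%, which is why the per-token compute of a 702B MoE can be closer to a dense ~36B model.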

The smaller Lightning model is the more immediately practical story for the local-model community. The authors claim a 256k context window, strong tool-calling behavior, and FP8 plus multi-token prediction support that keeps throughput high on a single H100 benchmark setup. They report a BFCL v3 score of 0.76 for tool use and compare Lightning against Qwen3, SmolLM3, Gemma 3, and YandexGPT lite models. The larger Ultra release targets multi-node environments, with the post saying it can run on three HGX instances and outperform several open-weight comparators on the team's internal benchmark table.
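The tool-calling claim is easy to picture concretely. Assuming Lightning is served behind an OpenAI-compatible endpoint (as local servers like vLLM or llama.cpp typically expose), a BFCL-style function-calling request could look like the sketch below; the model id, tool name, and schema are illustrative assumptions, not from the release notes:

```python
import json

# Hypothetical OpenAI-compatible tool-calling request for a locally served
# GigaChat-3.1-Lightning. Model id and tool schema are illustrative only.

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not part of the release
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = {
    "model": "GigaChat-3.1-Lightning",  # assumed served model id
    "messages": [{"role": "user", "content": "What's the weather in Moscow?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(request, indent=2))
```

A BFCL v3 score of 0.76 is measured on exactly this kind of interaction: whether the model emits a well-formed tool call with the right function name and arguments rather than free-form text.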

What makes this interesting beyond the headline numbers is the packaging. The release includes weights and GGUF variants on Hugging Face, and the team links a longer technical report on Habr. That gives the community something more useful than a teaser: people can inspect licensing, evaluate deployment fit, and decide whether the multilingual and CIS-focused angle fills a gap that US- and China-centered open model ecosystems often leave open.

The usual caveat applies. These benchmark tables are vendor-reported claims, not independent reproductions, so the real test will be community evaluations on coding, reasoning, latency, and quantized inference. Even so, r/LocalLLaMA treated the announcement as a meaningful addition to the open-weights landscape because it spans both frontier-scale and genuinely deployable sizes.

Why the post stood out

  • It ships both a very large 702B MoE and a more local-friendly 10B MoE with 1.8B active parameters.
  • The models are released under MIT terms with Hugging Face weights and GGUFs.
  • The team claims training from scratch rather than a simple downstream fine-tune.
  • Multilingual support and Russian/CIS optimization give the release a distinct regional angle.

