r/LocalLLaMA spotlights GigaChat 3.1 open weights from 10B to 702B
Original: New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B
A high-engagement r/LocalLLaMA post announced two new open-weights releases under the MIT license: GigaChat-3.1-Ultra, a 702B-parameter mixture-of-experts model with roughly 36B active parameters per token, and GigaChat-3.1-Lightning, a 10B MoE with about 1.8B active parameters aimed at far smaller deployments. The post is notable because it does not present the release as a minor fine-tune: the team says both models were pretrained from scratch on its own data and hardware, with English and Russian as core optimization targets and 14 languages in the training mix.
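For readers less used to the "A-number" shorthand, a quick back-of-the-envelope sketch of what those figures imply. The parameter counts come from the post; the activation fraction below is simple arithmetic, not a memory or throughput estimate:

```python
# Active-vs-total parameter math for the two MoE releases (counts from the post).
models = {
    "GigaChat-3.1-Ultra":     {"total_b": 702, "active_b": 36},
    "GigaChat-3.1-Lightning": {"total_b": 10,  "active_b": 1.8},
}

for name, p in models.items():
    frac = p["active_b"] / p["total_b"]
    print(f"{name}: {p['active_b']}B of {p['total_b']}B params active per token ({frac:.1%})")
```

The point of the sparsity is that per-token compute scales with the active count, while memory footprint still scales with the total, which is why Ultra remains a multi-node proposition even though only a fraction of it fires per token.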
The smaller Lightning model is the more immediately practical story for the local-model community. The authors claim a 256k context window, strong tool-calling behavior, and FP8 plus multi-token-prediction support that they say keeps throughput high on a single-H100 benchmark setup. They report a BFCL v3 score of 0.76 for tool use and compare Lightning against Qwen3, SmolLM3, Gemma 3, and YandexGPT Lite models. The larger Ultra release targets multi-node environments, with the post saying it can run on three HGX instances and outperform several open-weight comparators on the team's internal benchmark table.
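If the tool-calling claim holds up, the most common way the community will probe it is through an OpenAI-compatible server such as vLLM. The sketch below shows that pattern; the model id, endpoint, and the `get_weather` tool are placeholders for illustration, not details confirmed by the post:

```python
# Hedged sketch: exercising Lightning's claimed tool calling via an
# OpenAI-compatible endpoint (e.g. a local vLLM server). Model id, port,
# and the example tool are assumptions, not from the release notes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="GigaChat-3.1-Lightning",  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Kazan right now?"}],
    tools=tools,
)

# If the chat template wires tools correctly, the call should land here
# rather than in free-form text.
print(resp.choices[0].message.tool_calls)
```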
What makes this interesting beyond the headline numbers is the packaging. The release includes weights and GGUF variants on Hugging Face, and the team links a longer technical report on Habr. That gives the community something more useful than a teaser: people can inspect licensing, evaluate deployment fit, and decide whether the multilingual and CIS-focused angle fills a gap that US- and China-centered open model ecosystems often leave open.
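Because GGUF variants are part of the release, a purely local test drive is also straightforward. A minimal sketch with llama-cpp-python follows; the repo id, file name, and quantization level are placeholders, so check the actual Hugging Face model card before running it:

```python
# Hedged sketch: download a GGUF build of Lightning and run it locally.
# Repo id and file name are assumptions; consult the model card for the
# published GGUF names and quant levels.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="GigaChat-3.1-Lightning-GGUF",          # placeholder repo id
    filename="gigachat-3.1-lightning-q4_k_m.gguf",  # placeholder file name
)

# Use a context far below the claimed 256k window to keep RAM needs modest.
llm = Llama(model_path=gguf_path, n_ctx=32768)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the GigaChat 3.1 release in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```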
The usual caveat applies. These benchmark tables are vendor-reported claims, not independent reproductions, so the real test will be community evaluations on coding, reasoning, latency, and quantized inference. Even so, r/LocalLLaMA treated the announcement as a meaningful addition to the open-weights landscape because it spans both frontier-scale and genuinely deployable sizes.
Why the post stood out
- It ships both a very large 702B MoE and a more local-friendly 10B MoE with 1.8B active parameters.
- The models are released under MIT terms with Hugging Face weights and GGUFs.
- The team claims training from scratch rather than a simple downstream fine-tune.
- Multilingual support and Russian/CIS optimization give the release a distinct regional angle.
Related Articles
LocalLLaMA surfaced an MIT-licensed GigaChat 3.1 release that pairs a 702B MoE model for clusters with a 10B MoE model aimed at faster deployment and lighter inference.
NVIDIA's new Nemotron 3 Super pairs a 120B total / 12B active hybrid Mamba-Transformer MoE with a native 1M-token context window and open weights, datasets, and recipes. LocalLLaMA discussion centered on whether those openness and efficiency claims translate into realistic home-lab deployments.
A new r/LocalLLaMA thread argues that NVIDIA's Nemotron-Cascade-2-30B-A3B deserves more attention after quick local coding evals came in stronger than expected. The post is interesting because it lines up community measurements with NVIDIA's own push for a reasoning-oriented open MoE model that keeps activated parameters low.