Sarvam open-sources 30B and 105B reasoning models trained in India
Original: New Open-Source Models Available: Sarvam 30B and 105B trained from scratch by an India-based company
Reddit thread: LocalLLaMA discussion
Official blog: Open-Sourcing Sarvam 30B and 105B
Model downloads: Sarvam 30B / Sarvam 105B
LocalLLaMA picked up Sarvam AI’s March 6 release because it is not just another checkpoint drop. Sarvam is open-sourcing two reasoning-oriented foundation models, Sarvam 30B and Sarvam 105B, and describing them as trained from scratch rather than adapted from an upstream Western model family. The company says the work was carried out entirely in India on compute provided under the IndiaAI Mission, with the full stack built in-house across data curation, tokenizer design, model architecture, supervised fine-tuning and reinforcement learning.
The architectural story is fairly ambitious. Both models use a Mixture-of-Experts Transformer backbone with sparse expert routing, long-context support, and inference optimizations aimed at keeping deployment practical. Sarvam says the 30B variant uses Grouped Query Attention, which shares key/value projections across query heads to shrink the KV cache, while the 105B model extends the design with Multi-head Latent Attention, which compresses keys and values into a low-rank latent for more efficient long-context serving. The company also emphasizes a tokenizer optimized for all 22 scheduled Indian languages across 12 scripts, which matters directly for throughput and serving cost in multilingual Indian deployments.
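As a concrete reference for what "sparse expert routing" means in practice, here is a minimal top-k MoE layer in PyTorch. This is an illustrative sketch of the general technique, not Sarvam's actual implementation; the dimensions and expert count are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy sparse MoE layer: a learned router picks the top-k experts for each
    token, and only those experts' weights participate in the forward pass."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)     # route each token to k experts
        weights = F.softmax(weights, dim=-1)           # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # combine the k expert outputs
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64]); 2 of 8 experts run per token
```

Per-token compute scales with the k routed experts rather than the full expert count, which is how MoE models decouple headline parameter counts from inference cost.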
What the published numbers say
The training scale is large. Sarvam says the 30B model saw 16 trillion tokens and the 105B model 12 trillion, spanning code, general web, mathematics and multilingual data. On benchmarks, the company positions 105B as the higher-end reasoning and agentic system: 98.6 on Math500, 90.6 on MMLU, 81.7 on MMLU Pro, 71.7 on LiveCodeBench v6, and 68.3 on Tau2 for long-horizon agentic tasks. It also reports 88.3 Pass@1 on AIME 25, improving to 96.7 with tool use.
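For reference, Pass@1 on benchmarks like AIME is conventionally computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021) rather than a single greedy decode. Sarvam's post does not state its sampling setup, so the sketch below shows the standard definition, not necessarily their exact harness:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples is correct,
    given n samples per problem of which c passed (Chen et al., 2021)."""
    if n - c < k:  # too few failures: every size-k draw contains a pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples drawn for one problem, 14 correct.
print(pass_at_k(16, 14, 1))  # 0.875; benchmark Pass@1 averages this over problems
```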
The 30B model is framed differently: a more deployable reasoning model that activates only 2.4B parameters per token at inference time. Sarvam reports 97.0 on Math500, 92.1 on HumanEval, 92.7 on MBPP, 70.0 on LiveCodeBench v6, and the same 88.3 Pass@1 on AIME 25 before tool-assisted improvement. On Indian-language evaluation, the company says 105B wins an average of 90% of pairwise comparisons and 30B wins 89%, on a benchmark built from 110 English prompts translated into all 22 scheduled Indian languages in both native and romanized forms.
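The tokenizer claim is one of the easier ones to sanity-check once weights are public: count tokens for the same sentence across scripts. A minimal sketch with Hugging Face transformers, where "sarvamai/sarvam-30b" is a hypothetical repo id used for illustration (check the official model card for the real one) and GPT-2 stands in as an English-centric baseline:

```python
# Compare token counts for the same Hindi sentence in native and romanized form.
# "sarvamai/sarvam-30b" is a hypothetical repo id; check the official model card.
from transformers import AutoTokenizer

samples = {
    "native":    "भारत एक विशाल और विविधताओं से भरा देश है।",
    "romanized": "Bharat ek vishal aur vividhataon se bhara desh hai.",
}

for repo in ["sarvamai/sarvam-30b", "gpt2"]:  # gpt2: English-centric baseline
    tok = AutoTokenizer.from_pretrained(repo)
    counts = {name: len(tok.encode(text, add_special_tokens=False))
              for name, text in samples.items()}
    print(repo, counts)
```

Fewer tokens for the same text means more usable context per request and lower serving cost, which is exactly the economics argument Sarvam is making.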
Those details matter because they show what Sarvam is actually trying to optimize for. This is not only a leaderboard push. The release is positioned as sovereign AI infrastructure for India: strong reasoning and coding, explicit support for agentic workloads, and tighter latency and tokenization economics for Indian languages. Sarvam says both models are already used in production, with 30B powering conversational systems and 105B powering Indus, its assistant for more complex reasoning and tool use.
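For readers who want to try the models locally, a typical path is an OpenAI-compatible endpoint, for example via vLLM. A sketch under stated assumptions: the repo id is hypothetical, and serving flags for the MoE and long-context settings should come from the official model card.

```python
# Assumes the model is already being served locally, e.g.:
#   vllm serve sarvamai/sarvam-30b
# The repo id is a placeholder for illustration; check the official model card.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="sarvamai/sarvam-30b",
    messages=[{"role": "user", "content": "सरल शब्दों में समझाइए: Mixture-of-Experts क्या होता है?"}],
    max_tokens=300,
)
print(resp.choices[0].message.content)
```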
Whether the broader open-model ecosystem treats Sarvam as a new frontline competitor will depend on outside evaluation, but the release is notable on its own terms. Apache 2.0 licensing, training from scratch, India-specific tokenizer and evaluation work, and a clear focus on deployable reasoning models make this one of the more consequential open-model launches surfaced in LocalLLaMA this week.
Related Articles
Why it matters: Moonshot is turning “agent swarm” from a demo phrase into an execution claim with real scale numbers. The Kimi post says one run can coordinate 300 sub-agents across 4,000 steps and return 100-plus files instead of chat transcripts.
Google DeepMind has introduced Gemma 4 as a new open-model family built from Gemini 3 research. The lineup spans E2B and E4B edge models through 26B and 31B local-workstation models, with function calling, multimodal reasoning, and 140-language support at the center of the release.
Anthropic said on April 3, 2026 that its Fellows program had produced a new method for surfacing behavioral differences between AI models. The accompanying research frames the tool as a high-recall screening method for finding novel model-specific behaviors that standard benchmarks may miss.