HN Spotlight: Sarvam Open-Sources 30B and 105B in a Full-Stack IndiaAI Push
Original: Sarvam 105B, the first competitive Indian open source LLM
Hacker News picked up Sarvam AI’s March 6 announcement that it is open-sourcing Sarvam 30B and Sarvam 105B, two reasoning-oriented models the company says were trained from scratch in India on compute provided under the IndiaAI mission. The underlying company post frames the release as more than a model drop: Sarvam is presenting a full stack that spans data curation, training, inference optimization, tokenizer work, and product deployment.
The technical shape is fairly specific. Both models use a sparse Mixture-of-Experts (MoE) Transformer backbone with 128 experts. Sarvam 30B uses Grouped Query Attention (GQA) to keep KV-cache usage practical, while Sarvam 105B uses Multi-head Latent Attention (MLA) to push memory efficiency further on longer contexts. Sarvam says the 30B model was trained on 16T tokens and the 105B on 12T tokens, with mixtures covering code, web data, math, multilingual content, and synthetic data. The company also emphasizes a tokenizer optimized for the 22 scheduled Indian languages across 12 scripts.
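To make the attention trade-off concrete, here is a rough back-of-the-envelope sketch of how KV-cache size scales under standard multi-head attention, GQA, and an MLA-style compressed cache. Every layer count, head count, and dimension below is an illustrative assumption, not Sarvam's published config; only the relative scaling matters.

```python
# Back-of-the-envelope KV-cache sizing for MHA vs. GQA vs. MLA.
# All dimensions are hypothetical, chosen only to show the relative scaling.

def kv_cache_bytes_mha_gqa(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Standard K/V cache: 2 tensors (K and V) per layer, per token.
    GQA is the same formula with fewer KV heads than query heads."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

def kv_cache_bytes_mla(layers, latent_dim, seq_len, dtype_bytes=2):
    """MLA-style cache: one compressed latent vector per layer, per token
    (ignoring the small decoupled positional keys for simplicity)."""
    return layers * latent_dim * seq_len * dtype_bytes

if __name__ == "__main__":
    L, H_DIM, SEQ = 48, 128, 32_768   # hypothetical layer count, head dim, context length
    mha = kv_cache_bytes_mha_gqa(L, kv_heads=32, head_dim=H_DIM, seq_len=SEQ)
    gqa = kv_cache_bytes_mha_gqa(L, kv_heads=8,  head_dim=H_DIM, seq_len=SEQ)
    mla = kv_cache_bytes_mla(L, latent_dim=512, seq_len=SEQ)
    for name, size in [("MHA", mha), ("GQA", gqa), ("MLA", mla)]:
        print(f"{name}: {size / 2**30:.1f} GiB per sequence")
```

With these made-up numbers the cache shrinks from roughly 24 GiB (full MHA) to 6 GiB (GQA with 8 KV heads) to about 1.5 GiB (a 512-dimensional MLA latent), which is the kind of gap that decides whether long contexts fit on a single GPU.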
The benchmark claims are what pushed the HN post into wider circulation. Sarvam 105B is positioned as a competitive open model for reasoning, coding, and agentic work, with posted scores of 71.7 on LiveCodeBench v6, 90.6 on MMLU, 88.3 Pass@1 on AIME 25, and a 68.3 average on Tau2. Sarvam 30B is presented as the efficiency-first deployment option, with 2.4B active parameters and strong scores on HumanEval, MBPP, BrowseComp, and Tau2. Sarvam says both models are already in production: 30B powers Samvaad and 105B powers Indus.
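For readers unfamiliar with how Pass@1 figures like the AIME number are usually computed, the sketch below shows the standard unbiased pass@k estimator from the HumanEval paper; this is the conventional formula, not necessarily Sarvam's exact evaluation harness.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    n = samples generated per problem, c = samples that pass, k = attempt budget."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 4 correct completions out of 16 samples gives pass@1 = 0.25.
print(round(pass_at_k(n=16, c=4, k=1), 3))
```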
What makes the release notable is the operational story around it. The post spends significant space on fused kernels, scheduling, disaggregated serving, and throughput claims across H100, L40S, and Apple Silicon. In other words, Sarvam is not only publishing weights; it is arguing that open models become more relevant when the inference stack is tuned for real workloads and regional language coverage.
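Disaggregated serving, one of the techniques the post highlights, splits the compute-heavy prefill pass over the prompt from the memory-bound decode loop so each stage can be scheduled and scaled independently. The toy sketch below only illustrates the hand-off of the prompt's KV cache between the two stages; the function names and the in-process hand-off are hypothetical simplifications of what production systems do across GPUs or nodes.

```python
# Toy illustration of disaggregated serving: prefill and decode are separate
# workers, and the prompt's KV cache is the artifact handed between them.
from dataclasses import dataclass, field

@dataclass
class PrefillResult:
    kv_cache: list = field(default_factory=list)  # stand-in for per-layer K/V tensors
    next_token: str = "<first>"

def prefill_worker(prompt: str) -> PrefillResult:
    """Compute-heavy pass over the whole prompt; produces the initial KV cache."""
    cache = [f"kv[{i}]" for i, _ in enumerate(prompt.split())]
    return PrefillResult(kv_cache=cache)

def decode_worker(state: PrefillResult, max_new_tokens: int = 3) -> list:
    """Memory-bound loop: each step reuses the cache and appends one new entry."""
    out = [state.next_token]
    for step in range(max_new_tokens - 1):
        state.kv_cache.append(f"kv[gen{step}]")
        out.append(f"<tok{step}>")
    return out

print(decode_worker(prefill_worker("translate this sentence into Hindi")))
```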
For builders, the practical takeaway is straightforward. This is a sovereign-model effort that tries to compete on reasoning quality, agentic utility, and serving efficiency at the same time. HN’s interest reflects that broader question: whether regional model labs can differentiate by owning the full pipeline rather than just matching headline parameter counts.
Primary source: Sarvam AI’s release post.
Related Articles
Mistral announced Mistral Small 4 on March 16, 2026 as a single open model that combines reasoning, multimodal input, and agentic coding. Key specs include 119B total parameters, 6B active parameters per token, a 256k context window, Apache 2.0 licensing, and configurable reasoning effort.
HN did not push Browser Harness because it was another browser wrapper. It took off because the repo lets an LLM patch its own browser helpers in the middle of a task, trading safety rails for raw flexibility.
A Show HN post points to llm-circuit-finder, a toolkit that duplicates selected transformer layers inside GGUF models and claims sizable reasoning gains without changing weights or running fine-tuning. The strongest benchmark numbers come from the project author’s own evaluations rather than independent validation.