OmniCoder-9B distills agent-style coding behavior into a smaller open model, trained on more than 425,000 curated trajectories from real tool-using workflows.
A post in r/MachineLearning argues that duplicating a specific seven-layer block inside Qwen2-72B improved benchmark performance without changing any weights.
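For readers unfamiliar with the technique, here is a minimal sketch of what depth-wise block duplication looks like on a Hugging Face Qwen2-style checkpoint, where the decoder stack lives at `model.model.layers`. The layer indices are hypothetical; the post's exact block is not reproduced here, and a smaller sibling checkpoint is more practical for experimenting.

```python
# Sketch: duplicate a contiguous block of decoder layers without modifying
# any weight values. Indices 20-27 are hypothetical placeholders.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-72B")

START, END = 20, 27  # hypothetical seven-layer block [START, END)
layers = model.model.layers
block = [copy.deepcopy(layer) for layer in layers[START:END]]

# Splice the copied block in right after the original, so the stack runs
# ...19, 20-26, 20-26 (copy), 27... The weights themselves are unchanged.
new_layers = list(layers[:END]) + block + list(layers[END:])
model.model.layers = nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)

# Re-index each layer's attention so KV caching still works on transformers
# versions where layers track their own position in the stack.
for idx, layer in enumerate(model.model.layers):
    if hasattr(layer, "self_attn") and hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx
```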
NVIDIA AI Developer introduced Nemotron 3 Super on March 11, 2026 as an open 120B-parameter hybrid MoE model with 12B active parameters and a native 1M-token context window. NVIDIA says the model targets agentic workloads with up to 5x higher throughput than the previous Nemotron Super model.
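For context on the total-vs-active distinction, a back-of-envelope sketch of how top-k expert routing keeps per-token compute far below the total parameter count follows. Every split below (shared-parameter share, expert count, top-k) is a hypothetical illustration, not a published Nemotron figure.

```python
# Sketch: active parameters per token in a generic MoE. With top-k routing,
# each token touches all shared (non-expert) weights plus k of E experts.
total_params = 120e9
shared_params = 4e9        # hypothetical: attention, embeddings, dense layers
expert_params = total_params - shared_params
num_experts = 64           # hypothetical
top_k = 4                  # hypothetical: experts routed per token

active_params = shared_params + expert_params * (top_k / num_experts)
print(f"active per token: {active_params / 1e9:.2f}B")
# ~11.25B with these hypothetical numbers, in the ballpark of the quoted 12B
```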
Microsoft says Fireworks AI is now part of Microsoft Foundry, bringing high-performance, low-latency open-model inference to Azure. The launch emphasizes day-zero access to leading open models, custom-model deployment, and enterprise controls in one place.
A high-scoring LocalLLaMA thread surfaced Sarvam AI's release of two Apache 2.0 reasoning models, Sarvam 30B and Sarvam 105B. The company says both were trained from scratch in India, use Mixture-of-Experts designs, and target reasoning, coding, agentic workflows, and Indian-language performance.
Mistral has launched Mistral 3, a new open multimodal family with dense 14B, 8B, and 3B models under Apache 2.0, plus a larger Mistral Large 3. The company says the lineup was trained from scratch and tuned for both Blackwell NVL72 systems and single-node 8xA100 or 8xH100 deployments.
NVIDIA’s January 5, 2026 update expands its open AI stack across Nemotron, Cosmos, Alpamayo, Isaac GR00T, and Clara. The company paired model releases with large-scale datasets and deployment pathways to accelerate production AI adoption across industries.
A high-engagement r/LocalLLaMA thread tracked the MiniMax-M2.5 release on Hugging Face. The model card emphasizes agentic coding/search benchmarks, runtime speedups, and aggressive cost positioning.
A LocalLLaMA discussion of SWE-rebench's January runs reports a tight top tier, with Claude Code leading on pass@1 and pass@5 while open models narrow the gap.
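For reference, pass@k is typically computed with the unbiased estimator from Chen et al. (2021), the metric family behind the pass@1 and pass@5 numbers above. A minimal sketch with illustrative values, not SWE-rebench data:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn from n generated
    attempts of which c passed, solves the task."""
    if n - c < k:
        return 1.0  # every size-k draw must include a passing attempt
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 attempts per task, 3 passing: pass@1 = 0.3, pass@5 ≈ 0.92
print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```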