
Open-weight models narrow the gap to 3-6 months, OpenRouter says

Original: The Open Weight Models that Matter: June 2026

LLM | Jun 28, 2026 | By Insights AI

The open-weight model debate has moved from “are they usable?” to “which closed frontier workloads can they replace?” OpenRouter’s June analysis organizes that shift around four models: DeepSeek V4 Flash for price, GLM 5.2 for planning and coding quality, MiniMax M3 for multimodal long-context work, and NVIDIA Nemotron 3 Ultra for enterprise deployment on the NVIDIA stack.

DeepSeek V4 Flash is the cost shock. OpenRouter describes it as an MIT-licensed MoE model with roughly 284B total parameters, 13B active, and a 1M-token context. It scores 79.0% on SWE-bench Verified, 1.6 points behind the larger V4 Pro at 80.6%. Its first-party API pricing is listed at $0.14 per million input tokens and $0.28 per million output tokens, with cached input falling to $0.029 per million. The caveat is substantial: first-party traffic routes through China, and the terms permit training on customer data, though no-train Western hosts are available at higher prices.

GLM 5.2 is presented as the quality contender. OpenRouter cites Artificial Analysis placing it first among open-weight models on Intelligence Index v4.1 with a score of 51, ahead of Nemotron 3 Ultra, MiniMax M3, DeepSeek V4 Pro, and Kimi K2.6. It is also described as effectively level with GPT-5.5 xhigh on GDPval-AA v2, a real-world agentic benchmark. Its weighted-average OpenRouter price, $0.447 input and $3.31 output per million tokens, is not DeepSeek-cheap, but still changes the economics of long coding tasks.
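To make the pricing gap concrete, here is a minimal Python sketch that applies the per-million-token prices quoted above to a hypothetical job. The workload size (2M input tokens, 0.5M output tokens), the cache-hit fraction, the dictionary keys, and the `job_cost` helper are all illustrative assumptions, not figures or identifiers from OpenRouter. Note also that the comparison mixes DeepSeek's first-party price with GLM 5.2's weighted-average OpenRouter price, so it is only a rough guide.

```python
# USD per million tokens, as quoted in the article.
# The dictionary keys are illustrative labels, not real API model slugs.
PRICES = {
    "deepseek-v4-flash": {"input": 0.14, "cached_input": 0.029, "output": 0.28},
    "glm-5.2": {"input": 0.447, "output": 3.31},  # no cached price quoted
}


def job_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the USD cost of one job (hypothetical helper).

    cached_fraction is the share of input tokens billed at the cached
    rate; models without a quoted cached rate fall back to the normal
    input rate.
    """
    p = PRICES[model]
    cached_price = p.get("cached_input", p["input"])
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (
        fresh * p["input"]
        + cached * cached_price
        + output_tokens * p["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 2M input tokens, 0.5M output tokens.
    for name in PRICES:
        print(name, round(job_cost(name, 2_000_000, 500_000), 3))
    # Same workload with half of the DeepSeek input served from cache.
    print(
        "deepseek-v4-flash (50% cached)",
        round(job_cost("deepseek-v4-flash", 2_000_000, 500_000, 0.5), 3),
    )
```

Under these assumptions the job costs about $0.42 on DeepSeek V4 Flash and about $2.55 on GLM 5.2. Most of the difference comes from output tokens, which is why output-token burn matters so much for long coding tasks.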

MiniMax M3 matters for a different reason: it handles image and video natively, making it relevant for screenshot inspection, UI automation, diagrams, documents, and video-grounded workflows. Nemotron 3 Ultra is the U.S.-built enterprise lane, with a 550B / 55B-active hybrid Mamba-2 and Transformer MoE architecture, 1M context, NVFP4 training, Multi-Token Prediction, and an OpenMDW license.
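The total and active parameter counts quoted for the two MoE models imply very different sparsity levels. The short sketch below works out the active share per token from the article's figures; the `active_fraction` helper is a hypothetical name, and the DeepSeek total is the article's "roughly 284B", so its ratio is approximate.

```python
# (total parameters, active parameters) in billions, as quoted in the article.
MODELS = {
    "DeepSeek V4 Flash": (284, 13),   # total is approximate ("roughly 284B")
    "Nemotron 3 Ultra": (550, 55),
}


def active_fraction(total_b, active_b):
    """Share of parameters active per token (hypothetical helper)."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        print(f"{name}: {active_fraction(total_b, active_b):.1%} active")
    # DeepSeek V4 Flash: 4.6% active
    # Nemotron 3 Ultra: 10.0% active
```

Active parameter count is only a rough proxy for per-token compute. It says nothing about memory footprint, which still scales with the total parameter count.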

OpenRouter’s broader claim is that frontier labs are not pulling away from open-weight labs as quickly as many expected. It estimates that the best open-weight models have stayed within 3 to 6 months of the closed frontier for more than 18 months. For buyers, model choice is no longer a single leaderboard question. Data policy, provider geography, license terms, throughput, output-token burn, and deployment comfort now sit beside benchmark rank.
