GLM-5.2 pushes open weights into the cost-versus-reasoning debate
Original: GLM-5.2 is the new leading open weights model on Artificial Analysis
GLM-5.2 has become the leading open weights model on Artificial Analysis Intelligence Index v4.1, scoring 51 and moving ahead of MiniMax-M3, DeepSeek V4 Pro, and Kimi K2.6. It keeps the same broad size profile as GLM-5.1, at 744B total parameters and 40B active parameters, but gains 11 points on the index and expands the context window to 1M tokens.
The notable part is not only the top-line ranking. Artificial Analysis places GLM-5.2 on the Intelligence versus Cost per Task Pareto frontier, meaning no other model in the comparison is both cheaper per task and higher scoring. The same report also shows a tradeoff: GLM-5.2 uses about 43k output tokens per Intelligence Index task, more than several open weights peers. That turns the story from a leaderboard win into a practical question about reasoning budget.
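To make the Pareto frontier idea concrete, here is a minimal Python sketch. The model names, costs, and scores are made-up placeholders and are not Artificial Analysis data. The `Model` type and `pareto_frontier` function are hypothetical helpers written for this illustration. A model is on the frontier if no other model is at least as cheap and at least as capable while being strictly better on one of the two axes.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    cost_per_task: float  # hypothetical USD per task, not real pricing
    intelligence: float   # hypothetical index score


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return the models that no other model dominates on cost and score."""
    frontier = []
    for m in models:
        dominated = any(
            o.cost_per_task <= m.cost_per_task
            and o.intelligence >= m.intelligence
            and (o.cost_per_task < m.cost_per_task
                 or o.intelligence > m.intelligence)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    # Sort by cost so the frontier reads from cheapest to most expensive.
    return sorted(frontier, key=lambda m: m.cost_per_task)


# Placeholder data for illustration only.
models = [
    Model("model-a", 0.10, 30.0),
    Model("model-b", 0.25, 45.0),
    Model("model-c", 0.30, 40.0),  # costs more than model-b and scores lower
    Model("model-d", 0.60, 51.0),
]

frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)
```

In this toy data, model-c drops off the frontier because model-b is both cheaper and higher scoring. Being on the frontier says nothing about latency or token use, which is why the output-token figure matters separately.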
The Hacker News thread focused on that tension. One commenter described a small Nim math-evaluator task where GLM-5.2 spent more than 15 minutes and roughly 45k tokens reasoning before creating the first file. Another argued that the High setting may be the more useful default because it can reduce token use sharply with limited quality loss for many tasks. The strongest community signal was not skepticism about the model’s intelligence; it was concern about whether slow, long reasoning is acceptable in everyday agent workflows.
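The commenter's numbers imply a rough generation rate, sketched below in Python. The only inputs are the figures quoted above (roughly 45k tokens, more than 15 minutes, about 43k tokens per index task). The function names are hypothetical, and applying one anecdote's rate to the benchmark average is an assumption made purely for illustration, since real throughput varies by provider and load.

```python
def implied_tokens_per_second(tokens: int, minutes: float) -> float:
    """Average generation rate implied by a token count and a wall-clock time."""
    return tokens / (minutes * 60)


def minutes_for_tokens(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock minutes needed to emit `tokens` at a steady rate."""
    return tokens / tokens_per_second / 60


# Anecdote: roughly 45k reasoning tokens over at least 15 minutes.
# Because the wait was "more than" 15 minutes, this is an upper bound on the rate.
rate = implied_tokens_per_second(45_000, 15)

# Assumption: the same rate applied to the ~43k tokens per index task.
wait = minutes_for_tokens(43_000, rate)

print(f"implied rate: at most about {rate:.0f} tokens/s")
print(f"estimated wait for a 43k-token task: about {wait:.1f} minutes")
```

Under those assumptions, a typical index-sized task would take well over ten minutes at that provider's speed, which is the waiting-time concern the thread raised.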
The benchmark details explain why the model attracted attention. GLM-5.2 leads open weights models on GDPval-AA v2, improves across scientific reasoning and TerminalBench, and is available through Z.ai’s API as well as third-party providers. But adoption will depend on more than availability. Users will test multimodal gaps, provider limits, latency, and whether the model can spend fewer tokens while keeping its edge. Open weights competition is now entering a more demanding phase: capability, cost, and waiting time have to improve together.
Source: Artificial Analysis, community discussion on Hacker News.