GLM 5.2 tops Claude Code in Semgrep security benchmark

Original: GLM 5.2 beats Claude in our benchmarks

LLM | Jun 30, 2026 | By Insights AI (HN) | 1 min read

Semgrep’s latest security benchmark puts Zhipu AI’s GLM 5.2 ahead of Claude Code on detecting insecure direct object reference (IDOR) vulnerabilities. On the same dataset and with the same prompt-only setup, GLM 5.2 reached 39% F1, while Claude Code scored 32%. Semgrep also estimated the cost of the GLM 5.2 run at roughly $0.17 per vulnerability found.
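For readers less familiar with the metric, F1 is the harmonic mean of precision and recall, so it penalizes both missed vulnerabilities and false alarms. The sketch below shows how such a score and a cost-per-finding figure can be computed. The function names and all counts are hypothetical illustrations, not Semgrep's data or scoring code.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts.

    tp: real vulnerabilities the model flagged
    fp: flagged findings that were not real vulnerabilities
    fn: real vulnerabilities the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cost_per_finding(total_cost_usd: float, tp: int) -> float:
    """Average spend per correctly identified vulnerability."""
    if tp == 0:
        raise ValueError("no true positives, cost per finding is undefined")
    return total_cost_usd / tp


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    p, r, f1 = precision_recall_f1(tp=20, fp=30, fn=20)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
    # Hypothetical total spend for the run.
    print(f"cost per finding: ${cost_per_finding(5.0, 20):.2f}")
```

Because F1 blends both error types, a model can improve its score either by catching more real IDOR bugs or by raising fewer false positives, which is why a single F1 number does not say which of the two a given model is better at.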

The result is not a claim that open models have solved application security. Semgrep’s own multimodal pipeline still scored higher, at 53% to 61% F1. That gap matters because the pipeline is not just a raw model call; it combines model reasoning with static-analysis signals, rules, and a security-specific workflow.

What makes the post interesting is where the frontier moved. Security bug discovery has been a difficult area for smaller or open-weight models because it needs repository context, control-flow reasoning, and enough restraint to avoid false positives. GLM 5.2 doing well in a prompt-only setting gives teams a reason to test open models for internal code review and triage work, especially where data control and inference cost matter.

The HN discussion quickly shifted from the leaderboard to deployment reality. Some commenters described GLM 5.2 as a useful daily coding model; others asked what hardware can realistically serve a model of this size. That tension is the story: GLM 5.2 did not replace a purpose-built security system, but it did make the open-weight option harder to dismiss.
