GLM 5.2 tops Claude Code in Semgrep security benchmark

Semgrep’s latest security benchmark puts Zhipu AI’s GLM 5.2 ahead of Claude Code at detecting insecure direct object reference (IDOR) vulnerabilities. On the same dataset and with the same prompt-only setup, GLM 5.2 reached 39% F1, while Claude Code scored 32%. Semgrep also estimated the cost of the GLM 5.2 run at roughly $0.17 per vulnerability found.
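For readers who want to see how figures of this kind are derived, here is a minimal sketch of the standard F1 calculation and a cost-per-finding ratio. The counts and the dollar total are invented for illustration and are not Semgrep's data; the function names are hypothetical, and the cost formula is an assumption about how such an estimate could be computed, not Semgrep's stated method.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    reported = true_positives + false_positives   # everything the tool flagged
    actual = true_positives + false_negatives     # everything that was really there
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cost_per_finding(total_cost_usd: float, true_positives: int) -> float:
    """Assumed definition: total run cost divided by confirmed findings."""
    if true_positives == 0:
        raise ValueError("no confirmed findings to divide the cost by")
    return total_cost_usd / true_positives


# Hypothetical counts, NOT taken from the benchmark.
tp, fp, fn = 20, 30, 30
print(f"F1: {f1_score(tp, fp, fn):.2f}")                          # F1: 0.40
print(f"Cost per finding: ${cost_per_finding(5.00, tp):.2f}")     # Cost per finding: $0.25
```

Because F1 penalizes both missed bugs and false alarms, a model that flags everything or almost nothing scores poorly, which is why it suits a task where restraint matters.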

The result is not a claim that open models have solved application security. Semgrep’s own multimodal pipeline still scored higher, at 53% to 61% F1. That gap matters because the pipeline is not just a raw model call; it combines model reasoning with static-analysis signals, rules, and a security-specific workflow.

What makes the post interesting is where the frontier moved. Security bug discovery has been a difficult area for smaller or open-weight models because it needs repository context, control-flow reasoning, and enough restraint to avoid false positives. GLM 5.2 doing well in a prompt-only setting gives teams a reason to test open models for internal code review and triage work, especially where data control and inference cost matter.
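To make the difficulty concrete, here is a minimal sketch of what an IDOR looks like in code. It is an illustrative toy, not a case from Semgrep's dataset; the `Invoice` type, the in-memory `INVOICES` store, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    total_usd: float


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(1, owner_id=100, total_usd=42.0),
    2: Invoice(2, owner_id=200, total_usd=99.0),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    # IDOR: the caller-supplied ID is trusted and ownership is never checked,
    # so any authenticated user can read any invoice.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    # Fixed version: the object must exist AND belong to the requesting user.
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        raise PermissionError("invoice not found or not accessible")
    return invoice
```

The vulnerable version contains no dangerous call to pattern-match on. The bug is an authorization check that is absent, and in a real repository that check might legitimately live in middleware or a decorator elsewhere, which is why a reviewer needs surrounding context before flagging it.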

The Hacker News discussion quickly shifted from the leaderboard to deployment reality. Some commenters described GLM 5.2 as a useful daily coding model; others asked what hardware can realistically serve a model of this size. That tension is the story: GLM 5.2 did not replace a purpose-built security system, but it did make the open-weight option harder to dismiss.

Related Articles

Snyk’s 300-run test exposes unstable LLM security-review queues

FrontierCode Asks Whether an AI Patch Would Actually Get Merged

A 2,000-person AI assistant attack test raises a harder question about responses

LLM Hacker News Jun 10, 2026 1 min read
