GLM-5 Becomes Top Open-Weights Model on Extended NYT Connections Benchmark

Original: GLM-5 is the new top open-weights model on the Extended NYT Connections benchmark, with a score of 81.8, edging out Kimi K2.5 Thinking (78.3).

LLM | Feb 24, 2026 | By Insights AI (Reddit)

GLM-5 Takes the Lead

Zhipu AI's GLM-5 has scored 81.8 on the Extended NYT Connections benchmark, making it the new top-performing open-weights language model on this evaluation. It edges out the previous leader, Kimi K2.5 Thinking, which scored 78.3, a gap of 3.5 points.

What the NYT Connections Benchmark Tests

The Extended NYT Connections benchmark is based on The New York Times' word-association puzzle game, adapted for LLM evaluation. Players (or models) must sort 16 words into 4 hidden categories. The task is hard for LLMs because surface-level statistical pattern matching is not enough: solving a puzzle requires handling polysemy, cultural references, lateral thinking, and semantic groupings that are not immediately obvious.
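To make the task format concrete, here is a minimal Python sketch of how a proposed grouping could be checked against a puzzle's solution. Everything in it is an assumption for illustration: the puzzle words, the function name `count_correct_groups`, and the "exact group match" scoring rule are hypothetical and are not taken from the benchmark's own code or scoring method.

```python
from typing import Iterable


def count_correct_groups(
    predicted: Iterable[Iterable[str]],
    solution: Iterable[Iterable[str]],
) -> int:
    """Count predicted groups that exactly match a solution group.

    Hypothetical scoring rule: a group counts only if all of its words
    match one solution group, ignoring case, whitespace, and word order.
    """
    def normalize(groups: Iterable[Iterable[str]]) -> set:
        # Each group becomes an order-independent, case-insensitive set.
        return {frozenset(w.strip().lower() for w in g) for g in groups}

    return len(normalize(predicted) & normalize(solution))


# Hypothetical puzzle: "bass" and "drum" each fit two categories (polysemy).
solution = [
    ["bass", "trout", "salmon", "pike"],     # fish
    ["drum", "guitar", "piano", "violin"],   # instruments
    ["spear", "arrow", "sword", "lance"],    # weapons
    ["mercury", "venus", "mars", "saturn"],  # planets
]

# A plausible wrong answer: "bass" and "drum" are swapped between groups.
predicted = [
    ["drum", "trout", "salmon", "pike"],
    ["bass", "guitar", "piano", "violin"],
    ["spear", "arrow", "sword", "lance"],
    ["mercury", "venus", "mars", "saturn"],
]

correct = count_correct_groups(predicted, solution)
print(f"{correct} of {len(solution)} groups correct")
```

In this made-up puzzle, both swapped groups look plausible in isolation, because "drum" is also a kind of fish and "bass" is also an instrument. Only the full set of 16 words reveals which reading is intended, which is the kind of ambiguity the puzzle format relies on.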

Unlike standard benchmarks that can be gamed by memorization, NYT Connections is meant to test flexible, contextual use of language. A model that does well here is arguably demonstrating something closer to genuine language understanding than recall alone. The full benchmark results are available at github.com/lechmazur/nyt-connections.

Chinese Open-Source AI's Rising Tide

Zhipu AI is a Beijing-based AI startup with strong ties to Tsinghua University, known for its General Language Model (GLM) series. GLM-5's result highlights the rapid progress of Chinese open-source AI. It is particularly notable that its closest competitor, Kimi K2.5 Thinking, comes from Moonshot AI, another Chinese startup.

Open-Weights Competition Intensifies

This result signals that Chinese models are increasingly competitive in the open-weights space, challenging Western counterparts such as Meta's Llama series and Mistral's models. GLM-5's score of 81.8 also compares favorably to many proprietary models, suggesting that the gap between open and closed models continues to narrow.

© 2026 Insights. All rights reserved.