Z.ai Releases GLM-5: 744B Parameter Open-Source Powerhouse
Original: GLM-5 Officially Released View original →
Technical Specifications
GLM-5 represents a significant scaling increase from its predecessor. The model grows from 355B parameters (32B active) to 744B parameters (40B active), with pre-training data expanding from 23T to 28.5T tokens. A notable architectural addition is the integration of DeepSeek Sparse Attention (DSA), which reportedly "reduces deployment cost while preserving long-context capacity."
Performance Highlights
The model demonstrates strong capabilities across multiple evaluation frameworks:
- Academic Benchmarks: Achieves "best-in-class performance among all open-source models" in reasoning, coding, and agentic tasks
- Real-World Tasks: On CC-Bench-V2, GLM-5 significantly outperforms GLM-4.7 in frontend, backend, and long-horizon tasks
- Long-Horizon Planning: Ranks #1 among open-source models on Vending Bench 2, completing a simulated year-long business scenario with a final balance of $4,432
What Makes It Significant
Beyond scaling, GLM-5 introduces slime, described as "an asynchronous RL infrastructure that substantially improves training throughput and efficiency." This addresses a critical challenge: deploying reinforcement learning at scale for large language models.
The model is purpose-built for "complex systems engineering and long-horizon agentic tasks," positioning it as a bridge between traditional language models and autonomous agent capabilities.
Background
The release received significant attention, scoring 730 points on Reddit's LocalLLaMA and 289 points on singularity. Z.ai openly acknowledged GPU scarcity, stating "compute is very tight."
Related Articles
Open-model competition is shifting from leaderboard scores to agent operating costs. NVIDIA says Nemotron 3 Ultra is a 550B MoE model with 5x faster inference and up to 30% lower cost for complex agentic tasks.
The draw for LocalLLaMA was not just another coding model, but Cohere asking the local-inference crowd to test pre-release weights first.
HN interest centered less on “Claude finds bugs” and more on the shape of a harness security teams can adapt for their own targets.