Zhipu AI's GLM-5 has claimed the top spot among open-weights models on the Extended NYT Connections benchmark with a score of 81.8, edging out Kimi K2.5 Thinking's 78.3.
#llm
Guide Labs has released Steerling-8B, the first inherently interpretable language model that traces every generated token back to its input context, human-understandable concepts, and training data sources.
Stephen Wolfram announces that Wolfram Language and Wolfram|Alpha will be formally available as a 'foundation tool' for any LLM, combining language models' natural language ability with Wolfram's precise computational knowledge.
OpenAI CEO Sam Altman has set a new date for AGI, claiming that by the end of 2028, most of humanity's intellectual capacity could reside inside data centers rather than human minds.
Anthropic has accused three Chinese AI companies — DeepSeek, Moonshot AI (Kimi), and MiniMax — of creating over 24,000 fraudulent Claude accounts to extract training data from 16 million conversations, marking a major escalation in AI intellectual property disputes.
Anthropic has accused Chinese AI firms of creating over 24,000 fraudulent accounts to extract 16 million training exchanges from Claude for model distillation.
Startup Taalas proposes baking entire LLM weights and architecture into custom ASICs, claiming 17K+ tokens/second per user, sub-1ms latency, and 20x lower cost than cloud — all achievable within a 60-day chip production cycle.
DeepMind CEO Demis Hassabis proposed a concrete AGI benchmark: train an AI with a knowledge cutoff of 1911, then see if it can independently derive general relativity as Einstein did in 1915. This test targets genuine scientific discovery rather than pattern matching.
Alibaba launched Qwen3.5, a 397B-parameter open-weight multimodal model supporting 201 languages. The company claims it outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 on benchmarks, while costing 60% less than its predecessor.
A user created a fully playable space exploration game using only natural language instructions to Gemini 3.1 Pro over a few hours. The AI handled performance optimization, soundtrack generation, and UI design entirely from plain language requests, producing around 1,800 lines of HTML code.
Anthropic's Claude Sonnet 4.6, released February 17, delivers Opus 4.5-level performance at Sonnet pricing with a 1M-token context window in beta, and becomes the new default for Free and Pro users.
Google DeepMind has released Gemini 3.1 Pro with over 2x reasoning performance versus Gemini 3 Pro. The model scores 77.1% on ARC-AGI-2 (up from 31.1%), 80.6% on SWE-bench Verified, and tops 12 of 18 tracked benchmarks at unchanged $2/$12 per million token pricing.