Users on r/LocalLLaMA have spotted Qwen3.5 model names appearing in Alibaba's official Qwen chat interface, signaling an imminent release of the next generation of Alibaba's open-source LLM series.
Opper tested 53 leading LLMs with a deceptively simple logic question about whether to walk or drive to a car wash 50 meters away. Only 11 models answered correctly — the car must be driven to the car wash.
Claude Sonnet 4.6 achieves 72.5% on OSWorld—just 0.2 points below Opus 4.6—with a 1M-token context window in beta. At $3/$15 per million tokens, it brings flagship-class agentic capabilities to a mid-tier price point.
Zhipu AI's GLM-5 has claimed the top spot among open-weights models on the Extended NYT Connections benchmark with a score of 81.8, edging out Kimi K2.5 Thinking's 78.3.
Guide Labs has released Steerling-8B, the first inherently interpretable language model that traces every generated token back to its input context, human-understandable concepts, and training data sources.
Stephen Wolfram has announced that Wolfram Language and Wolfram|Alpha will be formally available as a 'foundation tool' for any LLM, combining language models' natural-language ability with Wolfram's precise computational knowledge.
Google DeepMind released Gemini 3.1 Pro on February 19, achieving 77.1% on ARC-AGI-2—more than double its predecessor's 31.1%—with a 1M-token context window and 80.6% on SWE-Bench Verified.
Anthropic has accused Chinese AI firms of creating over 24,000 fraudulent accounts to extract 16 million training exchanges from Claude for model distillation.
Anthropic launched Claude in PowerPoint, a Microsoft 365 add-in that generates and edits slides from natural language prompts while respecting existing themes. Available as a research preview for Pro, Max, Team, and Enterprise subscribers.
DeepSeek released V4 on Lunar New Year with 1 trillion parameters, 1M-token context windows, and novel mHC architecture. The open-weight model claims benchmark-topping coding performance at 10–40× lower inference costs than Western frontier models.
Qwen3's TTS model encodes voices into 1024-dimensional vectors, enabling gender swapping, pitch adjustment, voice mixing, and semantic voice search through vector math — now available as a standalone lightweight encoder on HuggingFace.
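Because each voice is just a fixed-length vector, operations like mixing reduce to ordinary linear algebra. The sketch below illustrates the general idea with randomly generated stand-in vectors; the function name and the interpolate-then-normalize recipe are illustrative assumptions, not Qwen's documented API (real embeddings would come from the encoder on HuggingFace).

```python
import numpy as np

# Stand-ins for two 1024-dimensional voice embeddings.
# In practice these would be produced by the voice encoder.
rng = np.random.default_rng(0)
voice_a = rng.normal(size=1024)
voice_b = rng.normal(size=1024)

def mix_voices(a: np.ndarray, b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend two voice vectors by linear interpolation, then re-normalize
    to unit length so the result stays on the same scale as its inputs."""
    mixed = (1.0 - alpha) * a + alpha * b
    return mixed / np.linalg.norm(mixed)

# A 70/30 blend of the two voices.
blend = mix_voices(voice_a, voice_b, alpha=0.3)
```

Semantic voice search follows the same logic: embed a query, then rank stored voice vectors by cosine similarity.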