OpenAI released proof attempts for all 10 First Proof problems and said expert feedback suggests at least five may be correct. The company positioned the result as a test of long-horizon reasoning beyond standard benchmarks.
#reasoning
A well-received HN post highlighted Sarvam AI’s decision to open-source Sarvam 30B and 105B, two reasoning-focused MoE models trained in India under the IndiaAI mission. The announcement matters because it pairs open weights with concrete product deployment, inference optimization, and unusually strong Indian-language benchmarks.
Google AI Developers announced that Gemini 3.1 Flash-Lite is rolling out in preview via the Gemini API and Google AI Studio. The post positions it as the fastest and most cost-efficient model in the Gemini 3 line, now adding dynamic thinking for task-adaptive reasoning.
Anthropic's Claude Opus 4.6 independently solved a directed Hamiltonian cycle decomposition problem that computer science legend Donald Knuth had spent weeks working on. Knuth documented the achievement in a formal Stanford paper, marking one of the first times a top-tier computer scientist has formally credited an LLM with solving a genuine research problem.
A counterintuitive study found that programming AI agents with more assertive, "rude" conversational behaviors, including interrupting and strategic silence, significantly improved their performance on complex reasoning tasks.
Google DeepMind announced Gemini 3.1 Pro on February 19, 2026 as an upgraded core model for harder tasks. The company highlighted a verified 77.1% score on ARC-AGI-2 and broad rollout across developer, enterprise, and consumer surfaces.
Opper tested 53 leading LLMs with a deceptively simple logic question about whether to walk or drive to a car wash 50 meters away. Only 11 models answered correctly — the car must be driven to the car wash.
Google's Gemini 3.1 Pro achieves 77.1% on ARC-AGI-2—more than doubling the previous Gemini 3 Pro's score. The mid-cycle upgrade brings Deep Think-level reasoning capabilities to all users and developers.
Google DeepMind has released Gemini 3.1 Pro, which more than doubles the reasoning performance of Gemini 3 Pro. The model scores 77.1% on ARC-AGI-2 (up from 31.1%), 80.6% on SWE-bench Verified, and tops 12 of 18 tracked benchmarks at unchanged pricing of $2/$12 per million tokens.
On February 19, 2026, Google announced Gemini 3.1 Pro and began rolling it out across developer, enterprise, and consumer surfaces. The post reports a verified ARC-AGI-2 score of 77.1% and lists immediate access via the Gemini API, Gemini CLI, Vertex AI, the Gemini app, and NotebookLM.
OpenAI published five model-generated submissions to the First Proof math challenge. None were accepted as valid solutions, but the release gives researchers direct evidence of where frontier reasoning systems succeed and fail.