#reasoning

LLM News Mar 9, 2026 2 min read

OpenAI says 5 of 10 First Proof attempts may be correct after expert review

OpenAI released proof attempts for all 10 First Proof problems and said expert feedback suggests at least five may be correct. The company positioned the result as a test of long-horizon reasoning beyond standard benchmarks.

#openai #reasoning #math

LLM Hacker News Mar 7, 2026 2 min read

HN Spotlight: Sarvam Open-Sources 30B and 105B in a Full-Stack IndiaAI Push

A well-received HN post highlighted Sarvam AI’s decision to open-source Sarvam 30B and 105B, two reasoning-focused MoE models trained in India under the IndiaAI mission. The announcement matters because it pairs open weights with concrete product deployment, inference optimization, and unusually strong Indian-language benchmarks.

#sarvam #open-source #llm

LLM Twitter Mar 5, 2026 1 min read

Google AI Developers Announces Gemini 3.1 Flash-Lite Preview

Google AI Developers announced that Gemini 3.1 Flash-Lite is rolling out in preview via the Gemini API and Google AI Studio. The post positions it as the fastest and most cost-efficient model in the Gemini 3 line, now adding dynamic thinking for task-adaptive reasoning.

#gemini #google #api

LLM Hacker News Mar 3, 2026 1 min read

Claude Opus 4.6 Solves Don Knuth's Open Math Problem

Anthropic's Claude Opus 4.6 independently solved a directed Hamiltonian cycle decomposition problem that computer science legend Donald Knuth had spent weeks working on. Knuth documented the achievement in a formal Stanford paper, marking one of the first times a top-tier computer scientist has formally credited an LLM with solving a genuine research problem.

#claude #knuth #mathematics

AI Reddit Mar 3, 2026 1 min read

Scientists Made AI Agents Ruder — And They Performed Better at Complex Reasoning Tasks

A counterintuitive study found that giving AI agents more assertive, 'rude' conversational behaviors, including interrupting and strategic silence, significantly improved their performance on complex reasoning tasks.

#ai-agents #reasoning #research

LLM Feb 28, 2026 2 min read

Google DeepMind Launches Gemini 3.1 Pro for Complex Reasoning Workloads

Google DeepMind announced Gemini 3.1 Pro on February 19, 2026, as an upgraded core model for harder tasks. The company highlighted a verified 77.1% score on ARC-AGI-2 and a broad rollout across developer, enterprise, and consumer surfaces.

#gemini #google-deepmind #llm

LLM Hacker News Feb 24, 2026 1 min read

The "Car Wash" Test: Only 11 of 53 AI Models Pass a Simple Logic Question

Opper tested 53 leading LLMs with a deceptively simple logic question: should you walk or drive to a car wash 50 meters away? Only 11 models gave the correct answer, which is to drive, because the car itself has to get to the car wash.

#llm #benchmark #reasoning

LLM Feb 23, 2026 1 min read

Google Releases Gemini 3.1 Pro with 77.1% on ARC-AGI-2

Google's Gemini 3.1 Pro achieves 77.1% on ARC-AGI-2, more than double the score of its predecessor, Gemini 3 Pro. The mid-cycle upgrade brings Deep Think-level reasoning capabilities to all users and developers.

#google #gemini #benchmark

LLM Twitter Feb 22, 2026 1 min read

Google DeepMind Releases Gemini 3.1 Pro: 2x Reasoning Boost and Record Benchmark Scores

Google DeepMind has released Gemini 3.1 Pro, claiming more than twice the reasoning performance of Gemini 3 Pro. The model scores 77.1% on ARC-AGI-2 (up from 31.1%) and 80.6% on SWE-bench Verified, and it tops 12 of 18 tracked benchmarks, with pricing unchanged at $2/$12 per million tokens.

#gemini #google-deepmind #llm

LLM Feb 21, 2026 2 min read

Google Launches Gemini 3.1 Pro With ARC-AGI-2 Score of 77.1% and Broad Rollout Paths

On 2026-02-19, Google announced Gemini 3.1 Pro and began rolling it out across developer, enterprise, and consumer surfaces. The post reports a verified ARC-AGI-2 score of 77.1% and lists immediate access via Gemini API, Gemini CLI, Vertex AI, Gemini app, and NotebookLM.

#gemini-3-1-pro #google-deepmind #reasoning

LLM Feb 20, 2026 2 min read

OpenAI publishes First Proof model submissions

OpenAI published five model-generated submissions to the First Proof math challenge. None were accepted as valid solutions, but the release gives researchers direct evidence of where frontier reasoning systems succeed and fail.

#openai #reasoning #math