#reasoning

LLM News Jun 26, 2026 2 min read

Google shows LLM reasoning can retrieve facts, not just solve problems

Google Research separates two mechanisms behind reasoning-assisted factual recall in Gemini-2.5 and Qwen3-32B. Extra tokens provide computation time, related facts prime recall, and hallucinated intermediate facts sharply reduce final-answer accuracy.

#google-research #reasoning #hallucination

LLM Hacker News Jun 18, 2026 1 min read

GLM-5.2 pushes open weights into the cost-versus-reasoning debate

The community debate moved beyond rank: GLM-5.2 looks strong, but output-token hunger and latency now matter as much as benchmark position.

#glm #open-weights #benchmarks

LLM X/Twitter Jun 3, 2026 2 min read

Opus 4.8 reaches ARC-AGI-3 SOTA with 1.5% score and ~$10K run

ARC Prize put Anthropic Opus 4.8 at the top of ARC-AGI-3, but the score shows how hard the benchmark remains. The new mark is 1.5% at roughly $10K, with progress tied to object-and-system abstraction rather than image-level pattern matching.

#anthropic #opus-4-8 #arc-agi

AI X/Twitter Jun 3, 2026 2 min read

Microsoft MAI launches 7 models with 35B reasoning and 5B coding

Microsoft AI made its in-house model strategy more concrete with seven MAI models across reasoning, coding, image, voice, and transcription. The headline numbers are 35B active parameters, a 256K context window, 97% on AIME 2025, and 53% on SWE Bench Pro.

#microsoft #mai #coding

AI Reddit May 22, 2026 1 min read

OpenAI Claims AI Model Disproved Erdős's 50-Year-Old Unit-Distance Conjecture

OpenAI says a general-purpose reasoning model found a construction disproving the conjectured upper bound in Erdős's planar unit-distance problem. Mathematicians have reviewed the proof, but the ML community has raised questions about methodological transparency.

#openai #mathematics #reasoning

LLM Reddit Apr 30, 2026 2 min read

LocalLLaMA asks the obvious question: if LLMs think in vectors, why show words?

LocalLLaMA did not treat this as shower-thought material. The thread turned into a real argument about why today’s LLMs keep reasoning legible in language instead of hiding it in latent vectors.

#llm #reasoning #latent-space

LLM Hacker News Apr 26, 2026 2 min read

HN Meets GPT-5.5 API With a Price-and-Behavior Audit, Not a Victory Lap

HN did not open with applause for GPT-5.5. The thread went straight to pricing, context tiers, and whether the model actually behaves better once real coding work starts.

#openai #gpt-5-5 #api

LLM X/Twitter Apr 23, 2026 2 min read

GPT-5.5 jumps 3 points clear on Artificial Analysis, but cost rises 20%

Why it matters: this is one of the first external benchmark reads to land right after the GPT-5.5 launch. Artificial Analysis said GPT-5.5 moved 3 points clear on its Intelligence Index, but running the full index became roughly 20% more expensive.

#gpt-5-5 #artificial-analysis #benchmarks

LLM X/Twitter Apr 12, 2026 2 min read

Meta launches Muse Spark as the first model from Meta Superintelligence Labs

AI at Meta said on April 8, 2026 that Muse Spark is a natively multimodal reasoning model with tool use, visual chain of thought, and multi-agent orchestration. Meta's official announcement says it already powers the Meta AI app and meta.ai, is rolling out across WhatsApp, Instagram, Facebook, Messenger and AI glasses, and is entering private-preview API access for selected partners.

#meta #muse-spark #multimodal

LLM Hacker News Apr 9, 2026 2 min read

Meta Debuts Muse Spark With Multimodal Reasoning and Parallel Agents

A Hacker News thread amplified Meta's launch of Muse Spark, a multimodal reasoning model with tool use, visual chain of thought, and a parallel-agent Contemplating mode.

#meta #muse-spark #multimodal

AI Reddit Mar 30, 2026 2 min read

r/singularity Zeroes In on ARC-AGI-3 and Action-Efficiency Scoring

Right after ARC Prize released ARC-AGI-3, r/singularity focused on the benchmark’s shift toward interactive environments and action-efficiency scoring. The core message is that frontier AI still lags badly when it must generalize, explore, and plan under tight interaction budgets.

#arc-agi #benchmarks #reasoning

LLM Mar 29, 2026 1 min read

Mistral introduces Mistral Small 4, a unified open-source reasoning and multimodal model

Mistral announced Mistral Small 4 on March 16, 2026 as a single open model that combines reasoning, multimodal input, and agentic coding. Key specs include 119B total parameters, 6B active parameters per token, a 256k context window, Apache 2.0 licensing, and configurable reasoning effort.

#llm #multimodal #reasoning