ARC Prize put Anthropic Opus 4.8 at the top of ARC-AGI-3, but the score shows how hard the benchmark remains. The new mark is 1.5% at roughly $10K, with progress tied to object-and-system abstraction rather than image-level pattern matching.
Microsoft AI made its in-house model strategy more concrete with seven MAI models across reasoning, coding, image, voice, and transcription. The headline numbers are 35B active parameters, a 256K context window, 97% on AIME 2025, and 53% on SWE Bench Pro.
OpenAI says a general-purpose reasoning model found a construction disproving the conjectured upper bound in Erdős's planar unit-distance problem. Mathematicians reviewed the proof, but the ML community has raised questions about methodological transparency.
LocalLLaMA did not treat this as shower-thought material. The thread turned into a real argument about why today’s LLMs keep reasoning legible in language instead of hiding it in latent vectors.
HN did not greet GPT-5.5 with applause first. The thread went straight to pricing, context tiers, and whether the model actually behaves better once real coding work starts.
Why it matters: this is one of the first external benchmark reads to land right after the GPT-5.5 launch. Artificial Analysis said GPT-5.5 moved 3 points clear on its Intelligence Index, although the full index run became roughly 20% more expensive.
AI at Meta said on April 8, 2026 that Muse Spark is a natively multimodal reasoning model with tool use, visual chain of thought, and multi-agent orchestration. Meta's official announcement says it already powers the Meta AI app and meta.ai, is rolling out across WhatsApp, Instagram, Facebook, Messenger and AI glasses, and is entering private-preview API access for selected partners.
A Hacker News thread amplified Meta's launch of Muse Spark, a multimodal reasoning model with tool use, visual chain of thought, and a parallel-agent Contemplating mode.
Right after ARC Prize released ARC-AGI-3, r/singularity focused on the benchmark’s shift toward interactive environments and action-efficient scoring. The core message is that frontier AI still lags badly when it must generalize, explore, and plan under tight interaction budgets.
Mistral announced Mistral Small 4 on March 16, 2026 as a single open model that combines reasoning, multimodal input, and agentic coding. Key specs include 119B total parameters, 6B active parameters per token, a 256k context window, Apache 2.0 licensing, and configurable reasoning effort.
ARC Prize introduced ARC-AGI-3 on March 24, 2026 as a benchmark for frontier agentic intelligence in novel environments. On Hacker News it reached 238 points and 163 comments, signaling strong interest in evaluation methods that go beyond static tasks.
Microsoft Research announced the 15 billion parameter open-weight model Phi-4-reasoning-vision-15B on March 4, 2026. The lab says the release is designed to deliver stronger multimodal reasoning, math and science performance, and computer-use ability without the compute profile of much larger systems.