A researcher running models locally scored 95.7% on SimpleQA with Qwen3.6-27B plus agentic search, all on a single consumer GPU.
The latest ARC-AGI-3 scores show GPT-5.5 High at 0.43% and Claude Opus 4.7 at 0.18%: the most powerful models available today remain effectively at zero on this AGI benchmark.
The technique GPT-5.4 Pro used to solve Erdős Problem 1196 has since been applied to other problems, including another conjecture that had gone unsolved for 60 years.
AWS customers can now access OpenAI's GPT models and Codex coding agent through Amazon Bedrock, marking OpenAI's first major deployment outside Microsoft Azure. General availability is expected within weeks.
DeepSeek released DeepSeek-V4-Pro (1.6T total parameters, 49B active) and V4-Flash (284B total, 13B active), both Mixture-of-Experts models under the MIT license with a 1M-token context window. V4-Pro is the largest open-weights model released so far, and its input price of $1.74 per million tokens is less than half that of GPT-5.4 and Claude Sonnet 4.6.
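The parameter counts and price above translate into two quick numbers: the share of weights a Mixture-of-Experts model actually uses per token, and what a given input length costs at a flat rate. A minimal sketch using only the figures quoted in the item; the helper names are illustrative, and the cost function assumes a flat per-million-token rate with no caching discounts or output-token charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters activated per token, in billions

    def active_fraction(self) -> float:
        """Share of the weights that take part in each token's forward pass."""
        return self.active_params_b / self.total_params_b


def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Input cost at a flat per-million-token rate (assumption: no cache discounts)."""
    return tokens / 1_000_000 * price_per_million


pro = MoEModel("DeepSeek-V4-Pro", 1600, 49)
flash = MoEModel("DeepSeek-V4-Flash", 284, 13)

for m in (pro, flash):
    print(f"{m.name}: {m.active_fraction():.2%} of weights active per token")

# Filling the full 1M-token context once at the quoted $1.74/M input rate
print(f"1M input tokens: ${input_cost_usd(1_000_000, 1.74):.2f}")
```

The low active fraction is what lets a 1.6T-parameter model be served at a per-token compute cost closer to that of a ~49B dense model, although all of the weights still have to be held in memory.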
Open-source PFlash uses speculative prefill to dramatically cut time-to-first-token for long-context LLM inference, achieving a 10.4x speedup on Qwen3.6-27B Q4_K_M with a consumer RTX 3090.
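As speculative prefill is generally described, a cheap draft model first scores how important each prompt token is, and the large model then prefills only the highest-scoring subset. This shortens the expensive prefill pass that dominates time-to-first-token. The sketch below shows only the selection step, under stated assumptions. The importance scores are taken as given (in practice they would come from something like draft-model attention), and `select_prefill_indices`, `keep_ratio`, and `keep_last` are hypothetical names, not PFlash's API or its actual selection policy.

```python
import heapq
from typing import Sequence


def select_prefill_indices(importance: Sequence[float], keep_ratio: float,
                           keep_last: int = 8) -> list[int]:
    """Choose which prompt positions the large model should prefill.

    importance: one score per prompt token (assumed to come from a draft model)
    keep_ratio: fraction of tokens to keep overall, in (0, 1]
    keep_last:  trailing tokens always kept, since they usually hold the question
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = len(importance)
    budget = min(n, max(keep_last, round(n * keep_ratio)))

    # Always keep the tail of the prompt
    tail_start = max(0, n - keep_last)
    kept = set(range(tail_start, n))

    # Spend the remaining budget on the highest-scoring earlier positions
    remaining = max(0, budget - len(kept))
    kept.update(heapq.nlargest(remaining, range(tail_start), key=lambda i: importance[i]))

    # Return positions in original order so the model sees them in sequence
    return sorted(kept)


scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6]
print(select_prefill_indices(scores, keep_ratio=0.5, keep_last=2))  # [1, 3, 6, 7]
```

Keeping half the tokens roughly halves the attention and MLP work of the prefill pass. Larger speedups like the reported 10.4x would require much more aggressive pruning of long contexts, and how PFlash trades that against answer quality is not covered here.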
A LocalLLaMA community member completed a 16-node DGX Spark cluster with 200 Gbps networking, optimized for unified-memory LLM inference and planning tests with DeepSeek and Kimi models.
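A back-of-envelope check helps show why a cluster of this size is interesting for the large DeepSeek and Kimi checkpoints: does the pooled memory hold the weights at a given precision? This sketch assumes 128 GB of unified memory per DGX Spark node and uses DeepSeek-V4-Pro's 1.6T total parameters from the item above. It ignores KV cache, activations, runtime overhead, and quantization metadata, so treat it as a lower bound on what is needed.

```python
def weights_gb(params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def cluster_gb(nodes: int, mem_per_node_gb: float) -> float:
    """Total memory across nodes; each node only holds its own shard."""
    return nodes * mem_per_node_gb


# Assumption: 128 GB of unified memory per node, 16 nodes
CLUSTER_GB = cluster_gb(16, 128)
PARAMS = 1_600_000_000_000  # DeepSeek-V4-Pro total parameters

for bits in (16, 8, 4):
    need = weights_gb(PARAMS, bits)
    verdict = "fits" if need < CLUSTER_GB else "does not fit"
    print(f"{bits}-bit weights: {need:.0f} GB vs {CLUSTER_GB:.0f} GB -> {verdict} "
          "(before KV cache and activations)")
```

The memory is only "pooled" in the sense that the weights can be sharded across nodes with tensor or pipeline parallelism, which is where the 200 Gbps interconnect becomes the limiting factor.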
OpenAI has released Symphony, an open-source specification that turns issue trackers like Linear into a control plane for autonomous coding agents. The system assigns one Codex agent per task and handles CI, rebasing, and PR management without human oversight.
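The "issue tracker as control plane" idea means that issue states, not a human, decide when an agent starts work and when that work moves on. Below is a minimal sketch of that loop under stated assumptions. The `State` values, `Issue`, `dispatch`, `on_ci_result`, the agent naming, and the concurrency cap are all hypothetical. None of it comes from the Symphony specification or Linear's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"  # hypothetical: PR opened after CI passes


@dataclass
class Issue:
    key: str
    state: State = State.TODO
    agent: Optional[str] = None


def dispatch(issues: list[Issue], max_agents: int) -> list[tuple[str, str]]:
    """Start one (hypothetical) agent per TODO issue, up to a concurrency cap.

    Returns (issue_key, agent_id) pairs for newly started work.
    """
    busy = sum(1 for i in issues if i.state is State.IN_PROGRESS)
    started = []
    for issue in issues:
        if busy >= max_agents:
            break
        if issue.state is State.TODO and issue.agent is None:
            issue.agent = f"agent-{issue.key}"
            issue.state = State.IN_PROGRESS
            started.append((issue.key, issue.agent))
            busy += 1
    return started


def on_ci_result(issue: Issue, passed: bool) -> None:
    """Advance work when CI passes; on failure, the same agent keeps the issue."""
    if issue.state is State.IN_PROGRESS and passed:
        issue.state = State.IN_REVIEW


backlog = [Issue("ENG-1"), Issue("ENG-2"), Issue("ENG-3")]
print(dispatch(backlog, max_agents=2))  # [('ENG-1', 'agent-ENG-1'), ('ENG-2', 'agent-ENG-2')]
```

Binding each agent to a single issue keeps its context and branch isolated, and a cap on concurrent agents is one simple way to bound compute and CI load.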
LocalLLaMA treated this less as a speed chart and more as a question of completion quality on a messy, real-world prompt. On the same MacBook Pro M5 Max, Qwen 3.6 27B generated more tokens, and generated them faster, but Gemma 4 31B finished the game logic using far fewer tokens.
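Higher tokens per second does not mean a task finishes sooner once one model is much more verbose, because decode time is output tokens divided by throughput. The numbers below are hypothetical and chosen only to show the effect. They are not measurements from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str
    output_tokens: int        # tokens generated for the whole answer
    tokens_per_second: float  # decode throughput

    def decode_seconds(self) -> float:
        """Decode time only; prompt processing is ignored."""
        return self.output_tokens / self.tokens_per_second


# Hypothetical runs, not data from the comparison
runs = [
    Run("verbose model, faster decode", output_tokens=12_000, tokens_per_second=40.0),
    Run("concise model, slower decode", output_tokens=4_000, tokens_per_second=25.0),
]

for r in runs:
    print(f"{r.label}: {r.decode_seconds():.0f} s of decoding")
```

Here the faster decoder takes 300 s against 160 s. A run that also fails to complete the requested logic costs that time and still leaves the task unfinished.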
Why it matters: leaderboard gains are more meaningful when they arrive with a cheaper training bill. Baidu says ERNIE 5.1 Preview ranks #13 globally and #1 among Chinese labs on LMArena Text while using about 6% of the pretraining cost of comparable models.
LocalLLaMA cared less about headline speed than about a Qwen3.6 setup on a single RTX 3090 that reached a 218K-token context and stopped crashing on long tool outputs.
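On a 24 GB card, the maximum context length mostly comes down to KV-cache size. For standard attention, that size grows linearly with tokens: 2 (K and V) × layers × KV heads × head dimension × bytes per element × tokens. The sketch below applies that formula. The layer count, KV-head count, and head dimension are hypothetical placeholders, not Qwen3.6's published configuration. Models that mix in linear-attention layers only pay this per-token cost on their full-attention layers.

```python
def kv_cache_bytes(tokens: int, attn_layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: float) -> float:
    """K and V cache size for full-attention layers (factor 2 = K plus V)."""
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_elem * tokens


GIB = 2 ** 30

# Hypothetical config: placeholder values, NOT a real model's architecture
cfg = dict(attn_layers=16, kv_heads=4, head_dim=256)

for label, bytes_per_elem in (("fp16 cache", 2), ("8-bit cache", 1)):
    size = kv_cache_bytes(218_000, bytes_per_elem=bytes_per_elem, **cfg)
    print(f"{label}: {size / GIB:.1f} GiB for 218K tokens")
```

Halving the cache precision halves this term, which is why cache quantization is a common lever for reaching long contexts alongside 4-bit weights on a single consumer GPU.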
LocalLLaMA reacted strongly because DeepSeek's visual-primitives idea makes points and boxes part of the reasoning process itself, and the repo going private only made the thread hotter.