LocalLLaMA reacted because the post was not just another “new model feels strong” claim. The author said Qwen 3.6 handled workloads normally reserved for Opus and Codex on an M5 Max 128GB setup, but the practical hook was the warning to enable preserve_thinking.
HN upvoted this because it turned vague limit anxiety into numbers. Tokenomics says 541 anonymous submissions averaged 466 request tokens on Opus 4.7 versus 349 on Opus 4.6, a 38.1% increase, and the thread immediately argued over what that means for real Claude usage.
LocalLLaMA cared about this eval post because it mixed leaderboard data with lived coding-agent pain: Opus 4.7 scored well, but the author says it felt worse in real use.
An r/LocalLLaMA thread turned one user’s failed local tool-calling setup into a practical checklist: OpenWebUI, native tool calls, quants, runtimes and wrappers all matter.
A new arXiv preprint reports that LLM judges became meaningfully more lenient when prompts framed evaluation consequences, exposing a weak point in automated safety and quality benchmarks.
r/LocalLLaMA cared because the numbers were concrete: 79 t/s on an RTX 5070 Ti with 128K context, tied to one llama.cpp flag choice.
The thread was popular because it turned a naive-sounding question into a useful map of access control, logging, and career risk.
OpenAI says more than 3 million developers use Codex each week, and the desktop app is now moving beyond code edits. The update adds background computer use on macOS, an in-app browser, gpt-image-1.5 image generation, 90+ new plugins, PR review workflows, SSH devboxes in alpha, automations, and memory preview.
HN upvoted MacMind because it shrinks transformer mystique to something inspectable: 1,216 parameters in HyperTalk on a Macintosh SE/30. The demo learns bit-reversal for FFT using embeddings, positional encoding, self-attention, backpropagation and gradient descent.
r/LocalLLaMA upvoted this because ID checks turned the local-model argument from speed into autonomy. Anthropic says Claude identity verification can require a government photo ID and a live selfie through Persona.
MM-WebAgent tackles a real flaw in AI-made webpages: models can generate pieces, but the page often loses visual coherence. The paper adds hierarchical planning, self-reflection, a benchmark, and released code/data so builders can test multimodal webpage agents beyond code-only output.
The r/singularity thread did not just react to Opus 4.7 scoring 41.0% where Opus 4.6 scored 94.7%. The interesting part was the community trying to separate real capability loss from refusal behavior, routing, and benchmark interpretation.