LLM

LLM Hacker News 5d ago 2 min read

HN Fixates on WUPHF's LLM Wiki: Shared Memory Is Easy, Trust Is the Hard Part

HN did not treat WUPHF as just another multi-agent toy. What grabbed attention was the notebook-to-wiki promotion flow: agents keep private notes, then graduate durable facts into a shared markdown-and-git memory.

#agents #agent-memory #markdown

LLM Hacker News 5d ago 2 min read

HN Reacts to Browser Harness: Let the Agent Rewrite Its Browser Tools Mid-Task

HN pushed Browser Harness not because it was another browser wrapper, but because the repo lets an LLM patch its own browser helpers in the middle of a task, trading safety rails for raw flexibility.

#browser-automation #web-agents #cdp

LLM 5d ago 2 min read

Anthropic stress-tests Claude for elections, hits 100% and 99.8%

Anthropic put hard numbers behind Claude’s election safeguards. In a 600-prompt election-policy test, Opus 4.7 and Sonnet 4.6 responded appropriately 100% and 99.8% of the time, respectively, and triggered web search 92% and 95% of the time on U.S. midterm-related queries.

#anthropic #claude #elections

LLM 5d ago 2 min read

Google turns Cloud Next into an agent-platform pitch at 16B tokens per minute

Google says its AI business has crossed from pilots to operations: 75% of Cloud customers now use AI products, 330 customers processed more than 1 trillion tokens each in the past year, and model traffic exceeds 16 billion tokens per minute. The company used Cloud Next ’26 to turn that scale into a product pitch for Gemini Enterprise Agent Platform, a full runtime and governance layer for enterprise agents.

#google-cloud #gemini #agents

LLM X/Twitter 5d ago 2 min read

xAI ships Grok Voice Think Fast 1.0 with τ-voice lead

xAI is turning voice agents into production software, not a demo. Grok Voice Think Fast 1.0 tops τ-voice Bench, supports 25+ languages, and, according to xAI, runs on the same stack that drives a 20% sales conversion rate and a 70% support resolution rate at Starlink.

#xai #grok-voice #voice-agents

LLM X/Twitter 5d ago 2 min read

OpenAI puts GPT-5.5 live with 82.7% on Terminal-Bench 2.0

OpenAI is pushing harder into agentic work, not just chat. On the company's own evals, GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, beats GPT-5.4 by 7.6 points, and uses fewer tokens in Codex.

#openai #gpt-5-5 #codex

LLM Reddit 5d ago 2 min read

LocalLLaMA Jumps on a KV-Cache Benchmark That Breaks the "q8_0 Is Basically Free" Myth

LocalLLaMA reacted because the post did not just tweak a benchmark table. It went after a widely repeated local-inference assumption and showed that the answer changes sharply by model family, especially for Gemma. By crawl time on April 25, 2026, the thread had 324 points and 58 comments.

#kv-cache #gemma #qwen

LLM Reddit 5d ago 2 min read

LocalLLaMA Treats Qwen3.6 27B as a Dense-Model Moment, Not Just Another Release

LocalLLaMA reacted like dense models had suddenly become fun again. The official Qwen numbers were strong, but the real community energy came from people immediately asking about quants, GGUF builds, and whether 27B had become the practical sweet spot. By crawl time on April 25, 2026, the thread had 1,688 points and 603 comments.

#qwen #open-weights #coding-models

LLM Hacker News 5d ago 3 min read

HN Reads Zed's Parallel Agents Launch as a Bet on Worktrees, Not Just More AI Panels

Hacker News liked that Zed did more than add extra agents to a sidebar. The thread focused on worktree isolation, repo scoping, and whether Zed found a more usable shape for multi-agent coding than the usual terminal pile-up. By crawl time on April 25, 2026, the post had 278 points and 160 comments.

#zed #coding-agents #worktrees

LLM 5d ago 2 min read

DeepMind's Decoupled DiLoCo chases zero-downtime LLM training

DeepMind is aiming at a stubborn systems problem: one slow or broken learner can still stall an entire pretraining run. The paper claims competitive model quality with strictly zero global downtime in failure-prone simulations spanning millions of chips.

#google-deepmind #diloco #llm-training

LLM 5d ago 2 min read

GPT-5.5 pushes agentic coding higher without adding latency

OpenAI is pitching GPT-5.5 as more than a routine model refresh. With 82.7% on Terminal-Bench 2.0, 58.6% on SWE-Bench Pro, and a claim that it keeps GPT-5.4-level latency, the company is resetting expectations for long-running coding agents.

#openai #gpt-5.5 #codex

LLM Reddit 5d ago 2 min read

LocalLLaMA Sees Qwen3.6 27B as the Small Open Model That Got Too Close for Comfort

LocalLLaMA upvoted this because a 27B open model suddenly looked competitive on agent-style work, not because everyone agreed on the benchmark. The thread stayed lively precisely because the result felt important and a little suspicious at the same time.

#qwen #open-weights #benchmarks