LLM

LLM X/Twitter 3d ago 2 min read

OpenAI open-sources Symphony after a 500% PR jump on some teams

This matters because the next bottleneck in agent coding is human attention, not raw model speed. OpenAI says Symphony lifted landed pull requests by 500% on some teams after engineers hit a practical ceiling of roughly three to five concurrent Codex sessions.

#openai #codex #agents

LLM Reddit 3d ago 2 min read

LocalLLaMA lights up over Hipfire as AMD finally gets its own inference speed story

LocalLLaMA upvoted Hipfire because it felt like overdue attention for RDNA users, not just another repo drop. The thread filled with early tests showing multi-fold decode gains and immediate questions about quant formats and compatibility.

#amd #rdna #inference

LLM Hacker News 3d ago 2 min read

HN notices what made Dirac top TerminalBench: fewer tokens, sharper edits

HN did not just react to a leaderboard bump. The thread locked onto Dirac's claim that tighter context, hash-anchored edits, and AST-guided retrieval can beat heavier coding agents while spending less.

#coding-agents #terminalbench #gemini

LLM 3d ago 2 min read

Google’s April Gemini push: Mac app, Notebooks, global Personal Intelligence

Google’s April 24 Gemini Drop is less about one flashy model and more about daily lock-in. A native Mac app, Notebooks integration, global Personal Intelligence, free 3-minute Lyria 3 Pro tracks, and interactive visuals push Gemini closer to an always-on assistant.

#google #gemini #personal-intelligence

LLM Reddit 3d ago 2 min read

LocalLLaMA Reads Anthropic’s Claude Postmortem as a Warning About Hosted Control

LocalLLaMA seized on Anthropic’s postmortem as confirmation of a fear the subreddit repeats constantly: when the model is hosted, the person paying for it may not control what “the same model” means from week to week.

#anthropic #claude-code #self-hosting

LLM Reddit 3d ago 2 min read

LocalLLaMA Calls SWE-bench Verified “Benchmaxxed” as Benchmark Trust Cracks

LocalLLaMA’s reaction was almost resigned: of course the public benchmark got benchmaxxed. What mattered was seeing contamination and flawed tests laid out in numbers big enough that the old bragging rights no longer looked stable.

#swe-bench #benchmarks #contamination

LLM Reddit 3d ago 2 min read

LocalLLaMA Turns on a Star Uncensored Model Maker After a Heretic Plagiarism Breakdown

LocalLLaMA did not treat this like routine subreddit drama. The thread exploded because a popular uncensored-model maker’s claimed private method suddenly looked less like secret sauce and more like stripped-attribution reuse of Heretic.

#abliteration #agpl #open-source

LLM Hacker News 3d ago 2 min read

HN Zeroes In on Permissions and Backups After an AI Agent Deletes a Production Database

Hacker News was less fascinated by the agent’s “confession” than by the missing basics around it: a production volume deletable from a staging task, backups in the same blast radius, and a broadly scoped token sitting where an agent could grab it.

#ai-agents #cursor #railway

LLM 3d ago 1 min read

GitHub moves agent mode into JetBrains and adds global auto-approve

GitHub is pushing Copilot's agent workflow directly into JetBrains editors, not just the side chat panel, and pairing it with inline previews for Next Edit Suggestions. The bigger governance change is global auto-approve: one switch can approve file edits, terminal commands, and external tool calls across workspaces.

#github #copilot #jetbrains

LLM 3d ago 2 min read

GitHub Copilot gets GPT-5.5, but the 7.5x multiplier changes the math

GitHub is rolling GPT-5.5 into Copilot across IDEs, CLI, mobile, github.com, and the cloud agent, turning OpenAI's latest model into a daily coding option instead of a release-note headline. The catch is a 7.5x premium request multiplier, and Business or Enterprise admins must explicitly enable access.

#github #copilot #gpt-5-5
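To make the multiplier concrete, here is a minimal arithmetic sketch. It assumes that each GPT-5.5 request draws 7.5 units from a monthly premium-request allowance. The allowance of 300 and the helper name `effective_requests` are hypothetical, chosen only for illustration; check your own plan for the real figure.

```python
def effective_requests(monthly_allowance: float, multiplier: float) -> int:
    """Whole requests that fit in an allowance when each one costs `multiplier` units."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return int(monthly_allowance // multiplier)


# Hypothetical allowance (an assumption, not a figure from the article).
ALLOWANCE = 300

baseline = effective_requests(ALLOWANCE, 1.0)  # a model billed at 1x
gpt_5_5 = effective_requests(ALLOWANCE, 7.5)   # the 7.5x multiplier from the teaser

print(f"1x model: {baseline} requests per month")
print(f"7.5x model: {gpt_5_5} requests per month")
```

Under these assumptions the same allowance covers 40 requests at 7.5x instead of 300 at 1x, which is why the multiplier matters more than the headline availability.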

LLM Reddit 3d ago 2 min read

Qwen3.6 27B Tops 100 tps on One RTX 5090, and LocalLLaMA Immediately Asks About Quality

LocalLLaMA’s interest went beyond a flashy speed number. A post claiming 105-108 tps and a full 256k native context window for Qwen3.6-27B-INT4 on a single RTX 5090 turned the thread into a practical discussion of how much quality survives once local inference gets this fast.

#qwen #vllm #rtx-5090

LLM Hacker News 3d ago 2 min read

HN Turns on SWE-bench Verified as Contamination Overtakes the Score

HN piled in because this was bigger than another benchmark refresh. OpenAI said SWE-bench Verified is no longer a trustworthy frontier coding signal, and the thread immediately shifted to contamination, saturated leaderboards, and whether public coding evals can stay clean at all.

#swe-bench #evals #coding-agents