HN latched onto the open-weight angle: a 35B MoE model with only 3B active parameters is interesting if it can actually carry coding-agent work. Qwen says Qwen3.6-35B-A3B improves sharply over Qwen3.5-35B-A3B; commenters moved straight to GGUF builds, Mac memory limits, and whether open-model-only benchmark tables give enough context.
OpenAI’s updated Agents SDK adds a model-native harness and native sandbox execution so agents can inspect files, run commands, edit code, and continue across longer tasks. It is generally available in Python, with support for sandbox providers including Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel.
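For orientation, here is a minimal sketch using the existing quickstart surface of the openai-agents Python package (Agent, Runner.run_sync); how the new harness binds a sandbox provider such as E2B or Modal is release-specific and not reproduced here.

```python
from agents import Agent, Runner

# Existing openai-agents quickstart surface; the new sandbox/harness wiring
# (provider selection, file and command access) is configured separately and
# its exact parameters are not shown here.
agent = Agent(
    name="repo-fixer",
    instructions="Inspect the failing test, propose a patch, and explain it.",
)

result = Runner.run_sync(agent, "Why does test_parse_config fail on empty input?")
print(result.final_output)
```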
Claude Opus 4.7 is now generally available across Claude products, the API, Amazon Bedrock, Vertex AI, and Microsoft Foundry. Anthropic kept pricing at $5/$25 per million input/output tokens while adding higher-resolution image handling, an xhigh effort setting, and stronger coding-agent behavior.
Synthetic-data training has a sharper safety problem than obviously bad examples. A Nature paper co-authored by Anthropic researchers reports that traits such as owl preference or misalignment can transfer through semantically unrelated number sequences.
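A toy sketch of the filtering step as described (names here are illustrative, not from the paper’s code): the teacher’s completions are stripped down to bare number sequences before student fine-tuning, so any trait that survives cannot be riding on overt semantics.

```python
import re

# Keep only completions that are bare comma-separated integers, so no overt
# semantic content about the teacher's trait can survive into training data.
NUMBERS_ONLY = re.compile(r"^\s*\d+(\s*,\s*\d+)*\s*$")

def keep_numeric(samples):
    return [s for s in samples if NUMBERS_ONLY.match(s)]

teacher_outputs = ["3, 14, 159", "I love owls: 1, 2", "42, 7, 99"]
clean = keep_numeric(teacher_outputs)  # -> ["3, 14, 159", "42, 7, 99"]
# Fine-tuning the student on `clean` is where the trait transfer shows up.
```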
Automating alignment research is moving from concept to measured experiment. Anthropic says a Claude Opus 4.6 researcher recovered 97% of the weak-to-strong supervision gap at roughly 1/100 the human time cost.
LocalLLaMA reacted because the post attacks a very real pain point: running large MoE models on limited VRAM. The author tested a llama.cpp fork on Qwen3.5-122B-A10B that tracks recently routed experts and keeps the hot ones in VRAM, reporting 26.8% faster token generation than layer-based offload at a similar 22GB VRAM budget.
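The mechanism is easiest to see as a cache-residency policy. A toy sketch (class and method names are illustrative, not from the fork): experts are promoted into a fixed number of VRAM slots as the router touches them, and the least recently routed expert is evicted on a miss.

```python
from collections import OrderedDict

class HotExpertCache:
    """Keep the N most recently routed experts resident in VRAM."""

    def __init__(self, vram_slots: int):
        self.vram_slots = vram_slots
        self.resident = OrderedDict()  # expert_id -> placeholder for weights

    def route(self, expert_id: int) -> str:
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as recently used
            return "hit"
        if len(self.resident) >= self.vram_slots:
            self.resident.popitem(last=False)  # evict least recently routed
        self.resident[expert_id] = object()  # stand-in for uploading weights
        return "miss"

cache = HotExpertCache(vram_slots=64)
for eid in [3, 17, 3, 90, 17, 3]:
    print(eid, cache.route(eid))
```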
LocalLLaMA reacted because the joke-like idea of an LLM tuning its own runtime came with concrete benchmark numbers. The author says llm-server v2 adds --ai-tune, which feeds llama-server’s help output into a tuning loop that searches flag combinations and caches the fastest config; on their rig, Qwen3.5-27B Q4_K_M moved from 18.5 tok/s to 40.05 tok/s.
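Conceptually the loop is a small search over launch flags with the winner cached to disk. A hedged sketch (the two flags below are real llama.cpp server options, but the search space, scoring stub, and cache file are invented here):

```python
import itertools
import json
import time

# Hypothetical --ai-tune style loop: enumerate flag combinations, benchmark
# each, and cache the fastest configuration for the next launch.
SEARCH_SPACE = {
    "--n-gpu-layers": ["20", "28", "36"],  # real llama-server flag
    "--threads": ["8", "12"],              # real llama-server flag
}

def benchmark(flags: dict) -> float:
    """Stub: the real tool would launch llama-server / llama-bench with
    `flags` and measure tokens per second."""
    time.sleep(0.01)  # stand-in for an actual timed generation run
    return sum(int(v) for v in flags.values())  # dummy score

candidates = [
    dict(zip(SEARCH_SPACE, combo))
    for combo in itertools.product(*SEARCH_SPACE.values())
]
best = max(candidates, key=benchmark)
with open("tune-cache.json", "w") as f:
    json.dump(best, f)  # reuse the winning flags on the next start
```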
HN reacted because this was less about one wrapper and more about who gets credit and control in the local LLM stack. The Sleeping Robots post argues that Ollama won mindshare on top of llama.cpp while eroding trust through its attribution, packaging, cloud-routing, and model-storage choices; commenters pushed back that its UX still solved a real problem.
Lightning OPD attacks a practical bottleneck in on-policy distillation: keeping a live teacher model running throughout training. The paper reports 69.9% on AIME 2024 from Qwen3-8B-Base in 30 GPU hours, a 4.0x speedup over standard OPD.
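The objective being accelerated is, roughly, a per-token reverse KL between student and teacher distributions on student-generated text. A minimal tensor-level sketch with random stand-in logits (the paper’s exact formulation may differ):

```python
import torch
import torch.nn.functional as F

# Stand-in logits for a 6-token student sample over a 32-token vocabulary;
# in real OPD the teacher forward pass is the expensive part that normally
# requires a live teacher throughout training.
seq, vocab = 6, 32
student_logits = torch.randn(seq, vocab, requires_grad=True)
teacher_logits = torch.randn(seq, vocab)

log_p_s = F.log_softmax(student_logits, dim=-1)
log_p_t = F.log_softmax(teacher_logits, dim=-1)

# Reverse KL(student || teacher) at each student-sampled position,
# averaged over the sequence; gradients flow only into the student.
loss = (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1).mean()
loss.backward()
print(float(loss))
```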
The Reddit thread is not about mourning TGI. It reads like operators comparing notes after development momentum shifted away from it, with most commenters saying vLLM is now the safer default for general inference serving because the migration path is lighter and the performance case is easier to defend.
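Part of why the migration case reads as light: vLLM’s offline API is a couple of lines, and `vllm serve` exposes an OpenAI-compatible endpoint for existing clients. A minimal sketch (the model name is just an example):

```python
from vllm import LLM, SamplingParams

# One object, one generate call; swapping a TGI deployment usually means
# pointing existing OpenAI-compatible clients at `vllm serve` instead.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
outputs = llm.generate(
    ["Summarize why operators moved off TGI in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```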
HN reacted less to the launch itself than to the question behind it: can AI finally do useful spreadsheet work inside Excel instead of opening one more chat panel? OpenAI’s beta add-in writes directly in the workbook, explains the cells it references, and asks permission before editing, which immediately raised expectations.
HN did not stay on the word “steal” for long. The real argument was whether an AI agent can spend a user’s paid LLM credits and GitHub identity on upstream maintenance without a hard opt-in, because once that happens the issue stops being clever automation and becomes consent.