Why it matters: kernel work is what decides whether long-context and edge-side agent systems stay theoretical or become cheap enough to run. Qwen says FlashQLA delivers 2-3x forward speedup and 2x backward speedup over the FLA Triton kernel on NVIDIA Hopper.
Why it matters: faster models stop feeling fast if orchestration overhead eats the gain. OpenAI says WebSocket mode made agent workflows up to 40% faster end to end, while lifting effective inference speed from about 65 to nearly 1,000 tokens per second.
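For a sense of why a persistent connection trims orchestration overhead, here is a minimal sketch using the generic Python `websockets` library: one socket is reused across every agent step instead of paying connection setup per request. The endpoint and message schema are hypothetical placeholders, not OpenAI's actual API.

```python
import asyncio
import json
import websockets  # generic WebSocket client; not OpenAI's SDK

async def run_steps(prompts):
    # One persistent connection amortizes setup cost across all agent steps.
    uri = "wss://example.invalid/agent"  # hypothetical endpoint
    results = []
    async with websockets.connect(uri) as ws:
        for prompt in prompts:
            await ws.send(json.dumps({"input": prompt}))  # hypothetical schema
            results.append(json.loads(await ws.recv()))
    return results

# asyncio.run(run_steps(["plan", "execute", "summarize"]))
```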
LocalLLaMA reacted to this post because it brought hard numbers, not vendor marketing: a dual RTX 5060 Ti 16GB setup pushing Qwen3.6 27B to roughly 60 tok/s with a 204k context window.
Hacker News treated Mistral Medium 3.5 as more than another model drop, focusing on four-GPU self-hosting, open weights, and remote coding agents rather than headline scores alone.
Hacker News liked the joke, but the real draw was OpenAI showing how a playful reward signal inside the Nerdy personality leaked creature metaphors into GPT-5.x behavior.
Anthropic is pushing Claude out of the chat box and into the software stack where designers, video editors, and musicians already work. The company says its April 28 release connects Claude to Adobe’s 50+ tool surface, Blender, Autodesk Fusion, SketchUp, Splice, Ableton, and more.
NVIDIA is targeting the cost bottleneck in multimodal agents, not just the demo factor. Nemotron 3 Nano Omni claims up to 9x higher throughput, a 256K context window, and six leaderboard wins for document, video, and audio understanding.
Cursor is pushing coding agents out of the editor and into infrastructure. Its new SDK exposes the same runtime and harness behind Cursor itself, targeting CI/CD jobs, cloud execution, and embedded agent workflows inside other products.
LocalLLaMA paid attention to Granite 4.1 because IBM went in the opposite direction from giant reasoning hype: a broad release built around dense 3B, 8B, and 30B language models tuned for instruction following and tool calling. Comments welcomed the extra competition, but also pushed back on how strong the benchmarks really are.
LocalLLaMA lit up because Xiaomi MiMo dropped an MIT-licensed MoE with 1.02T total parameters, 42B active parameters, and a 1M-token context window. The excitement was real, but so was the hardware reality check: people loved the openness and agentic claims while joking about how many serious GPUs you still need.
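To make that reality check concrete, here is a rough back-of-envelope estimate (the quantization and GPU capacity are assumptions, not figures from the MiMo release): only 42B parameters are active per token, but all 1.02T must be resident in memory to serve the model.

```python
# Illustrative back-of-envelope only; quantization and GPU capacity below
# are assumptions, not numbers from the MiMo release.
total_params = 1.02e12      # 1.02T total parameters (all must be resident)
bytes_per_param = 1         # assume 8-bit weights
gpu_vram_bytes = 80e9       # assume 80 GB of VRAM per GPU

weight_bytes = total_params * bytes_per_param
print(f"weights alone: ~{weight_bytes / 1e12:.2f} TB")                        # ~1.02 TB
print(f"80 GB GPUs just to hold weights: ~{weight_bytes / gpu_vram_bytes:.0f}")  # ~13
```

And that is before the KV cache for anything approaching the 1M-token context window, which adds further memory on top of the weights.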
Hacker News paid attention to Mistral Medium 3.5 because the size-to-capability tradeoff looked real: a 128B dense model with a 256K context window, open weights, and self-hosting claims that do not immediately drift into fantasy. The launch also tied the model to remote coding agents in Vibe and a new Work mode in Le Chat.
Hacker News piled onto a Claude Code bug report because the trigger sounded absurd and expensive: having HERMES.md in recent git commit messages could route requests to paid overage instead of the included Max quota. What kept the thread hot was not only the reproduction, but the fight over refunds before Anthropic said affected users would get both refunds and extra credits.