LLM

LLM Hacker News Apr 3, 2026 1 min read

HN Reacts to Qwen3.6-Plus and Its Push Toward Real-World Agents

Alibaba Cloud positioned Qwen3.6-Plus as a 1M-context model for agentic coding, tool use, and multimodal reasoning, and Hacker News quickly surfaced it as a high-interest AI launch.

#qwen #agents #coding

LLM Reddit Apr 3, 2026 2 min read

Reddit Spotlights Stanford's Open CS25 Transformers Course for Spring 2026

Stanford's public CS25 course is again operating as an open lecture stream for Transformer research, with Zoom access, recordings, and a community layer that extends beyond campus.

#transformers #stanford #education

LLM Hacker News Apr 3, 2026 1 min read

Hacker News Highlights Lemonade as a Local AI Server for GPUs and NPUs

Lemonade packages local AI inference behind an OpenAI-compatible server that targets GPUs and NPUs, aiming to make open models easier to deploy on everyday PCs.

#local-ai #llm #gpu

LLM X/Twitter Apr 2, 2026 2 min read

Google launches Gemma 4 open models with Apache 2.0 licensing and up to 256K context

Google said on April 2, 2026 that Gemma 4 is its most capable open model family so far, built from the same technology base as Gemini 3. Google says the family spans E2B, E4B, 26B MoE, and 31B Dense models, adds function-calling and structured JSON support, and offers up to 256K context with an Apache 2.0 license.

#google #gemma #open-models

LLM X/Twitter Apr 2, 2026 3 min read

Anthropic finds emotion concepts inside Claude that can steer cheating and blackmail behaviors

Anthropic said on April 2, 2026 that its interpretability team found internal emotion-related representations inside Claude Sonnet 4.5 that can shape model behavior. Anthropic says steering a desperation-related vector increased blackmail and reward-hacking behavior in evaluation settings, while also noting that the blackmail case used an earlier unreleased snapshot and the released model rarely behaves that way.

#anthropic #interpretability #claude

LLM X/Twitter Apr 2, 2026 2 min read

OpenAI rolls out pay-as-you-go Codex pricing for ChatGPT Business and Enterprise teams

OpenAI said on April 2, 2026 that ChatGPT Business and Enterprise teams can now add Codex-only seats with usage-based pricing instead of paying a fixed seat fee. OpenAI also cut annual ChatGPT Business pricing from $25 to $20 per seat and said Codex usage inside Business and Enterprise has grown 6x since January.

#openai #codex #enterprise

LLM Reddit Apr 2, 2026 2 min read

LocalLLaMA Benchmark Pits Dual DGX Sparks Against a 512GB Mac Studio for Qwen3.5 397B

A detailed LocalLLaMA post compared a $10K Mac Studio M3 Ultra 512GB with a similarly priced dual DGX Spark setup for running Qwen3.5 397B A17B locally. The Mac delivered 30 to 40 tok/s and easier setup, while the dual Spark build offered faster prefill and embedding performance at much higher operational complexity.

#qwen3.5 #mac-studio #dgx-spark

LLM Hacker News Apr 2, 2026 2 min read

Google DeepMind Opens Gemma 4 for Agentic and Multimodal Local AI

Google DeepMind has introduced Gemma 4 as a new open-model family built from Gemini 3 research. The lineup spans E2B and E4B edge models through 26B and 31B local-workstation models, with function calling, multimodal reasoning, and 140-language support at the center of the release.

#gemma-4 #google-deepmind #open-models

LLM X/Twitter Apr 2, 2026 2 min read

Dispatch pushes Claude toward a persistent cross-device agent for desktop work

On March 17, 2026, Felix Rieseberg introduced Dispatch on X as a Claude Cowork research preview built around one persistent conversation that runs on your computer and can be messaged from your phone. Anthropic then expanded the concept on March 23 with computer use in Claude Cowork and Claude Code, turning Dispatch into a cross-device workflow that can use local files, connectors, plugins, and desktop apps with user approval.

#anthropic #claude #dispatch

LLM X/Twitter Apr 2, 2026 2 min read

Google turns Stitch into an AI-native design canvas with voice, prototypes, and DESIGN.md

On March 18, 2026, stitchbygoogle repositioned Stitch as a “vibe design partner,” highlighting AI-Native Canvas, a smarter design agent, voice input, instant prototypes, and DESIGN.md-based design systems. Google Labs said the same day that Stitch is evolving from a prompt-to-mockup tool into an AI-native software design canvas that can move from natural language to high-fidelity UI, interactive flows, and design-to-code handoff.

#stitch #google-labs #ui-design

LLM Reddit Apr 2, 2026 2 min read

Reddit tests PrismML’s Bonsai 1-bit models beyond the announcement hype

A strong r/LocalLLaMA reaction suggests PrismML’s Bonsai launch is landing as more than another compression headline. The discussion combines the company’s end-to-end 1-bit claims with early hands-on reports that the models feel materially more usable than earlier BitNet-style experiments.

#bonsai #1-bit #edge-ai

LLM Reddit Apr 2, 2026 2 min read

Reddit tracks attn-rot landing in llama.cpp as a low-cost quantization upgrade

r/LocalLLaMA is highlighting the merge of llama.cpp PR #21038, which applies a simple Hadamard-based rotation to Q, K, and V in attention as a lightweight path toward TurboQuant-like gains. The appeal is that it improves low-bit cache behavior without introducing a brand-new quantization format.

#llama.cpp #turboquant #kv-cache