Google DeepMind said on March 26, 2026 that Gemini 3.1 Flash Live is rolling out in Gemini Live and Google Search Live, while developers can access it through Google AI Studio. Google’s announcement positions 3.1 Flash Live as its highest-quality audio model, with lower latency, improved tonal understanding, and benchmark gains including 90.8% on ComplexFuncBench Audio.
A March 2026 r/LocalLLaMA post with 126 points and 45 comments highlighted a practical guide for running Qwen3.5-27B through llama.cpp and wiring it into OpenCode. The post stands out because it covers the operational details that usually break local coding setups: quant choice, chat-template fixes, VRAM budgeting, Tailscale networking, and tool-calling behavior.
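For the tool-calling piece, llama.cpp's server exposes an OpenAI-compatible chat-completions endpoint, so a client can send OpenAI-style tool definitions when the chat template supports them. Below is a minimal sketch of building such a request body; the model alias, tool name, and temperature are placeholders for illustration, not values from the Reddit guide.

```python
import json

def build_chat_payload(messages, tools=None, temperature=0.7):
    """Build an OpenAI-style chat-completions body for a llama.cpp server.

    The model alias and defaults here are placeholders; substitute whatever
    your own llama-server instance is configured to serve.
    """
    body = {
        "model": "qwen3.5-27b",  # alias configured on your llama-server
        "messages": messages,
        "temperature": temperature,
    }
    if tools:
        # OpenAI-style tool schemas, passed through to the chat template.
        body["tools"] = tools
    return json.dumps(body)

# Hypothetical tool definition, purely for illustration.
payload = build_chat_payload(
    [{"role": "user", "content": "List files in the repo."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "list_files",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
)
print(json.loads(payload)["model"])
```

POSTing this body to the server's `/v1/chat/completions` route (and parsing any `tool_calls` in the response) is what an agent front end like OpenCode does on your behalf.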
Cursor has published the Composer 2 technical report, outlining its code-focused continued pretraining, large-scale reinforcement learning pipeline, and CursorBench-led evaluation strategy. The report offers an unusually detailed first-party look at how a production coding agent is trained and measured.
A new r/LocalLLaMA benchmark post says an M5 Max system pushed Qwen3.5-397B to 20.34 tok/s through SSD streaming, with I/O parallelism, temporal expert prediction, and Q3-GGUF experts doing most of the work.
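The load-bearing idea is that only a small, slowly-changing set of experts is active at any moment, so streamed expert weights can be cached in RAM and evicted LRU-style while predicted-next experts are prefetched. A toy sketch of that caching pattern, under stated assumptions (the `ExpertCache` class and `load_fn` hook are invented for illustration and are not the poster's pipeline):

```python
from collections import OrderedDict

class ExpertCache:
    """Toy LRU cache for MoE expert weights streamed from slow storage.

    A sketch of the general streaming idea only; the real setup layers
    I/O parallelism and quantized expert tensors on top of this.
    """
    def __init__(self, capacity, load_fn):
        self.capacity = capacity      # how many experts fit in fast memory
        self.load_fn = load_fn        # expensive load from SSD
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least-recently-used
            self.cache[expert_id] = self.load_fn(expert_id)
        return self.cache[expert_id]

    def prefetch(self, predicted_ids):
        """Warm the cache with experts predicted for upcoming tokens."""
        for eid in predicted_ids:
            self.get(eid)

# Demo: capacity 2, access pattern 0, 1, 0, 2 -> one hit, three misses.
cache = ExpertCache(capacity=2, load_fn=lambda eid: f"weights-{eid}")
for eid in [0, 1, 0, 2]:
    cache.get(eid)
print(cache.hits, cache.misses)  # 1 3
```

Temporal expert prediction is what feeds `prefetch`: if routing decisions correlate across adjacent tokens, the next experts can be loading while the current ones compute.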
Penfield Labs argues that LoCoMo still circulates as a major memory benchmark even though 99 of its 1,540 answer-key entries are score-corrupting and its gpt-4o-mini judge passed 62.81% of intentionally wrong answers in an audit.
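The judge-audit technique itself is simple to reproduce: feed the judge answers that are known to be wrong and count how many it passes, since every pass on a wrong answer is a judging error. A minimal sketch, where `judge` and `corrupt` are placeholder callables and not Penfield Labs' actual harness:

```python
def audit_judge(judge, qa_pairs, corrupt):
    """Probe an LLM judge with intentionally wrong answers.

    `judge(question, answer)` returns True if the judge accepts the answer;
    `corrupt(answer)` produces a deliberately wrong answer. Because every
    probe answer is wrong, the returned rate is the judge's false-pass rate.
    """
    passes = 0
    for question, answer in qa_pairs:
        if judge(question, corrupt(answer)):
            passes += 1
    return passes / len(qa_pairs)

# Toy demonstration with a length-based "judge" that is trivially fooled.
naive_judge = lambda q, a: len(a) > 10
negate = lambda a: "It is not the case that " + a
rate = audit_judge(naive_judge, [("Q1", "Paris"), ("Q2", "42")], negate)
print(f"false-pass rate: {rate:.0%}")  # 100%
```

A false-pass rate anywhere near the 62.81% reported for LoCoMo's gpt-4o-mini judge means leaderboard scores built on that judge carry large, systematic noise.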
A Hacker News thread turned Zach Manson's Copilot incident into a broader argument about whether coding assistants should be allowed to insert vendor messaging into PR text and other repo metadata.
NVIDIA announced Dynamo 1.0 on March 16, 2026 as a production-grade open-source layer for generative and agentic inference. The release matters because it ties Blackwell performance gains, lower per-token costs, and native integration with major open-source frameworks into one operating model.
A March 1 r/MachineLearning post compared 94 LLM endpoints across 25 providers and argued that open models were closing to within a single-digit quality gap of top proprietary systems. The real takeaway is operational: model choice is now about intelligence, price, speed, and deployment freedom at the same time.
Anthropic said on March 24, 2026 that a new Anthropic Economic Index update shows longer-term Claude users iterating more carefully, giving the model less full autonomy, attempting higher-value tasks, and receiving more successful responses. In related Economic Index posts on its X timeline, Anthropic also said the top 10 tasks now account for 19% of consumer conversations, down from 24%, while personal queries rise and U.S. adoption rates continue to converge.
OpenAI said on March 23, 2026 that ChatGPT now stores uploaded and created files in a persistent File Library for reuse across conversations. OpenAI's official release notes say the Library is web-only for now, while recent files and file search are also available on iOS and Android. Rollout is underway globally for Plus, Pro, and Business users, with the EEA, Switzerland, and the UK coming soon.
A March 2026 r/LocalLLaMA post with 123 points and 25 comments spotlighted `voxtral-voice-clone`, a project trying to train the missing codec encoder for Mistral’s Voxtral-4B-TTS-2603. The repo targets zero-shot cloning via `ref_audio`, which the original open-weight release could not support because the encoder weights were not included.
A March 2026 r/singularity post shared Google Research’s TurboQuant work and drew 114 points with 18 comments. Google says the method can shrink KV cache memory by at least 6x on needle tasks, quantize caches to 3 bits without training, and deliver up to 8x attention-logit speedups on H100 GPUs.
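For intuition on what 3-bit, training-free cache quantization means mechanically: each stored value is snapped to one of eight levels between a channel's min and max, then reconstructed from the level index plus a scale and offset. The sketch below is a generic round-to-nearest baseline for one channel, not TurboQuant's actual algorithm, which reaches its quality and speed numbers with considerably more machinery.

```python
def quantize_3bit(channel):
    """Asymmetric 3-bit quantization of one KV-cache channel (levels 0..7).

    Generic round-to-nearest sketch for intuition only; TurboQuant's
    method is more sophisticated than this baseline.
    """
    lo, hi = min(channel), max(channel)
    scale = (hi - lo) / 7 or 1.0  # 8 levels -> 7 steps; avoid div-by-zero
    q = [round((v - lo) / scale) for v in channel]  # integer codes 0..7
    return q, scale, lo

def dequantize(q, scale, lo):
    """Reconstruct approximate values from 3-bit codes."""
    return [code * scale + lo for code in q]

kv = [-1.0, -0.5, 0.0, 0.25, 0.5, 1.0]
codes, scale, lo = quantize_3bit(kv)
recon = dequantize(codes, scale, lo)
max_err = max(abs(a - b) for a, b in zip(kv, recon))
assert max_err <= scale / 2 + 1e-9  # round-to-nearest bounds error at half a step
```

Storing 3-bit codes instead of 16-bit floats is where the memory reduction comes from; the per-channel `scale` and `lo` add only small overhead on top.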