LLM Reddit Mar 30, 2026 2 min read

r/LocalLLaMA Details an Autoresearch Push to 20.34 tok/s for Qwen3.5-397B on M5 Max

A new r/LocalLLaMA benchmark post says an M5 Max system pushed Qwen3.5-397B to 20.34 tok/s through SSD streaming, with I/O parallelism, temporal expert prediction, and Q3-GGUF experts doing most of the work.

#qwen #apple-silicon #inference

LLM Reddit Mar 30, 2026 2 min read

r/MachineLearning Flags LoCoMo Errors and Weak Judge Reliability

Penfield Labs argues that LoCoMo still circulates as a major memory benchmark even though 99 of its 1,540 answer-key entries contain errors that corrupt scores, and an audit found that its gpt-4o-mini judge accepted 62.81% of deliberately wrong answers.

#benchmarks #memory-systems #evaluation

LLM Hacker News Mar 30, 2026 2 min read

HN Erupts Over Copilot Injecting Promotional Copy Into a PR

A Hacker News thread turned Zach Manson's Copilot incident into a broader argument about whether coding assistants should be allowed to insert vendor messaging into PR text and other repo metadata.

#copilot #github #developer-tools

LLM Mar 30, 2026 2 min read

NVIDIA puts Dynamo 1.0 into production as an inference OS for AI factories

NVIDIA announced Dynamo 1.0 on March 16, 2026 as a production-grade open-source layer for generative and agentic inference. The release matters because it combines Blackwell performance gains, improved token economics, and native integration with major open-source frameworks into a single operating model.

#nvidia #dynamo #inference

LLM Reddit Mar 30, 2026 2 min read

r/MachineLearning Pushes a 94-Endpoint LLM Benchmark Into the Spotlight

A March 1 r/MachineLearning post compared 94 LLM endpoints across 25 providers and argued that open models had narrowed the quality gap with top proprietary systems to single digits. The practical takeaway is that choosing a model now means weighing intelligence, price, speed, and deployment freedom together.

#llm-benchmarks #open-source #model-evaluation

LLM Reddit Mar 29, 2026 3 min read

LocalLLaMA Highlights a Community Attempt to Restore Voice Cloning to Mistral’s Voxtral TTS

A March 2026 r/LocalLLaMA post with 123 points and 25 comments spotlighted `voxtral-voice-clone`, a project trying to train the missing codec encoder for Mistral’s Voxtral-4B-TTS-2603. The repo targets zero-shot cloning via `ref_audio`, which the original open-weight release could not support because the encoder weights were not included.

#tts #voice-cloning #mistral

LLM Reddit Mar 29, 2026 3 min read

Reddit Spots TurboQuant as Google Targets 3-Bit KV Cache Compression Without Accuracy Loss

A March 2026 r/singularity post shared Google Research's TurboQuant work and drew 114 points and 18 comments. Google says the method can shrink KV cache memory by at least 6x on needle-in-a-haystack tasks, quantize caches to 3 bits without training, and deliver up to 8x speedups in attention-logit computation on H100 GPUs.

#quantization #kv-cache #vector-search

LLM Mar 29, 2026 1 min read

Mistral introduces Mistral Small 4, a unified open-source reasoning and multimodal model

Mistral announced Mistral Small 4 on March 16, 2026 as a single open model that combines reasoning, multimodal input, and agentic coding. Key specs include 119B total parameters, 6B active parameters per token, a 256k context window, Apache 2.0 licensing, and configurable reasoning effort.

#llm #multimodal #reasoning

LLM Mar 29, 2026 1 min read

Mistral launches Leanstral, an open-source code agent for Lean 4

Mistral introduced Leanstral on March 16, 2026 as an open-source code agent built specifically for Lean 4. The model uses 6B active parameters and ships under an Apache 2.0 license alongside a new FLTEval benchmark, and it is available immediately in Mistral Vibe, through the API, and as downloadable weights.

#llm #code-agents #lean4

LLM X/Twitter Mar 29, 2026 2 min read

OpenAI pushes Codex Security for GitHub vulnerability finding, validation, and remediation

OpenAIDevs pointed developers to Codex Security on March 29, 2026, positioning it as a way to find, validate, and remediate likely vulnerabilities in connected GitHub repositories. OpenAI's docs say the system scans commit by commit, uses repo-specific threat models, validates high-signal findings in an isolated environment, and can move reviewed findings toward GitHub pull requests.

#openai #codex #security

LLM Reddit Mar 29, 2026 2 min read

MachineLearning Highlights TurboQuant for Weights as 4-Bit Quantization Gets Practical

A new r/MachineLearning post pushes TurboQuant beyond KV-cache talk and into weight compression, with a GitHub implementation that targets drop-in low-bit LLM inference.

#quantization #llm #inference

LLM Reddit Mar 29, 2026 2 min read

LocalLLaMA Spots IBM Granite 4.0 3B Vision for Focused Document Extraction

A LocalLLaMA post points to IBM's Granite-4.0-3B-Vision, a compact VLM built for charts, tables, and document key-value extraction rather than generic multimodal chat.

#vlm #document-ai #ibm