Skip to content
LLM Reddit Mar 28, 2026 2 min read

A March 28, 2026 r/LocalLLaMA post turned TurboQuant from a paper topic into an MLX implementation story with custom Metal kernels, code, and an upstream PR. The author reports 4.6x KV cache compression at 0.98x FP16 speed on Qwen2.5-32B, but the repository's 7B README numbers are more conservative, underscoring how model choice and integration details shape the real payoff.

LLM X/Twitter Mar 28, 2026 2 min read

GoogleCloudTech posted a demo on March 27, 2026 showing Gemini CLI using Model Context Protocol (MCP) servers to migrate and deploy a full-stack application. Google's September 11, 2025 Gemini CLI extensions post and December 11, 2025 MCP support announcement show that the demo is built on /deploy for Cloud Run, managed MCP endpoints for Google services, and enterprise controls such as IAM, audit logs, and Model Armor.

LLM X/Twitter Mar 28, 2026 2 min read

Cursor said on March 25, 2026 that cloud agents can now run on customer infrastructure while preserving the same agent harness and workflow experience. Cursor's product post says the generally available setup keeps code, tool execution, and build artifacts inside the customer's network while still giving agents isolated remote environments, multi-model support, and plugin/MCP extensibility.

LLM Hacker News Mar 28, 2026 2 min read

A Hacker News post pushed ATLAS into the spotlight by framing a consumer-GPU coding agent as a serious cost challenger to hosted systems. The headline benchmark is interesting, but the repository itself makes clear that its 74.6% result is not a controlled head-to-head against Claude 4.5 Sonnet because the task counts and evaluation protocols differ.