LLM

LLM Mar 29, 2026 1 min read

Mistral introduces Mistral Small 4, a unified open-source reasoning and multimodal model

Mistral announced Mistral Small 4 on March 16, 2026 as a single open model that combines reasoning, multimodal input, and agentic coding. Key specs include 119B total parameters, 6B active parameters per token, a 256k context window, Apache 2.0 licensing, and configurable reasoning effort.
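
To make the total-versus-active split concrete, here is a back-of-the-envelope sketch. It assumes 16-bit (2-byte) weights and the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token. Neither figure comes from Mistral's announcement, and real deployments also need memory for the KV cache and activations.

```python
# Back-of-the-envelope arithmetic for a mixture-of-experts model.
# Assumptions (not from the announcement): 2 bytes per weight (BF16/FP16)
# and roughly 2 FLOPs per active parameter per generated token.

TOTAL_PARAMS = 119e9   # every expert must be resident in memory
ACTIVE_PARAMS = 6e9    # parameters actually used for each token

def active_fraction(total: float, active: float) -> float:
    """Share of the weights touched per token."""
    return active / total

def weight_memory_gb(total: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed to hold every weight, in decimal gigabytes."""
    return total * bytes_per_param / 1e9

def forward_gflops_per_token(active: float) -> float:
    """Rule-of-thumb forward cost: about 2 FLOPs per active parameter."""
    return 2 * active / 1e9

if __name__ == "__main__":
    print(f"active fraction:   {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"weights at 16-bit: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
    print(f"weights at 4-bit:  {weight_memory_gb(TOTAL_PARAMS, 0.5):.1f} GB")
    print(f"forward cost:      ~{forward_gflops_per_token(ACTIVE_PARAMS):.0f} GFLOPs/token")
```

Under these assumptions, memory scales with the 119B total while per-token compute scales with the 6B active parameters, which is why a sparse model can be cheap per token yet still large to host.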

#llm #multimodal #reasoning

LLM Mar 29, 2026 1 min read

Mistral launches Leanstral, an open-source code agent for Lean 4

Mistral introduced Leanstral on March 16, 2026 as an open-source code agent built specifically for Lean 4. The release pairs 6B active parameters and an Apache 2.0 license with a new FLTEval benchmark, and the model is available immediately in Mistral Vibe, through the API, and as downloadable weights.

#llm #code-agents #lean4

LLM X/Twitter Mar 29, 2026 2 min read

Cursor says real-time RL lets Composer ship better checkpoints every five hours

Cursor said on March 26, 2026 that real-time reinforcement learning lets it ship improved Composer checkpoints as often as every five hours. Cursor's research post says the loop trains on billions of production tokens from real user interactions, runs evals including CursorBench before deployment, and has already produced better edit persistence, fewer dissatisfied follow-ups, and lower latency.

#cursor #composer #reinforcement-learning

LLM X/Twitter Mar 29, 2026 2 min read

OpenAI pushes Codex Security for GitHub vulnerability finding, validation, and remediation

OpenAIDevs pointed developers to Codex Security on March 29, 2026, positioning it as a way to find, validate, and remediate likely vulnerabilities in connected GitHub repositories. OpenAI's docs say the system scans commit by commit, uses repo-specific threat models, validates high-signal findings in an isolated environment, and can move reviewed findings toward GitHub pull requests.

#openai #codex #security

LLM Reddit Mar 29, 2026 2 min read

r/MachineLearning highlights TurboQuant for weights as 4-bit quantization gets practical

A new r/MachineLearning post pushes TurboQuant beyond KV-cache talk and into weight compression, with a GitHub implementation that targets drop-in low-bit LLM inference.
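
Low-bit weight formats commonly store a small integer code per weight plus one floating-point scale per group, packing two 4-bit codes into each byte. The sketch below shows that generic storage pattern for one hypothetical group of 8 weights, using a plain symmetric absmax quantizer with codes in [-7, 7]. It is an illustration of 4-bit packing in general, not the format used by the linked TurboQuant implementation.

```python
# Generic 4-bit group quantization with nibble packing (illustrative only).
# Assumptions: symmetric absmax scaling, signed codes in [-7, 7],
# and an even number of weights per group.

def quantize_group(weights):
    """Map floats to signed 4-bit codes plus one scale for the group."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def pack_nibbles(codes):
    """Store two signed 4-bit codes per byte (low nibble first)."""
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)

def unpack_nibbles(packed):
    """Recover signed codes by sign-extending each 4-bit nibble."""
    codes = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes

def dequantize(codes, scale):
    """Turn integer codes back into approximate float weights."""
    return [c * scale for c in codes]

if __name__ == "__main__":
    w = [0.12, -0.5, 0.33, 0.07, -0.91, 0.44, 0.0, 0.25]  # one group of 8
    codes, scale = quantize_group(w)
    packed = pack_nibbles(codes)
    restored = dequantize(unpack_nibbles(packed), scale)
    print(len(w), "weights ->", len(packed), "bytes + 1 scale")
    print("max abs error:", max(abs(a - b) for a, b in zip(w, restored)))
```

The per-group scale is what keeps the error bounded by half a quantization step; smaller groups track local weight ranges more closely at the cost of storing more scales.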

#quantization #llm #inference

LLM Reddit Mar 29, 2026 2 min read

r/LocalLLaMA spots IBM Granite 4.0 3B Vision for focused document extraction

An r/LocalLLaMA post points to IBM's Granite-4.0-3B-Vision, a compact VLM built for charts, tables, and document key-value extraction rather than general-purpose multimodal chat.

#vlm #document-ai #ibm

LLM Reddit Mar 29, 2026 2 min read

r/LocalLLaMA compresses TurboQuant into one idea: rotate first, quantize second

A high-scoring r/LocalLLaMA post explains TurboQuant not as a polar-coordinates trick but as random rotation before quantization. The linked arXiv paper claims near-optimal distortion rates, a residual QJL (quantized Johnson-Lindenstrauss) stage for inner products, and quality-neutral KV cache quantization at 3.5 bits per channel.
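
The rotate-then-quantize idea can be sketched in a few lines. This is a simplified stand-in, not TurboQuant itself: it uses a randomized Hadamard transform (random sign flips followed by an orthonormal Walsh-Hadamard transform) as the random rotation and a plain uniform absmax quantizer, and it omits the paper's residual QJL stage. What it shows is that rotation spreads an outlier channel's energy across all coordinates, so a single shared scale no longer rounds the moderate values to zero.

```python
import math
import random

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; it is its own inverse."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in x]

def quantize_dequantize(x, bits=4):
    """Symmetric uniform quantizer with one absmax scale for the vector."""
    levels = 2 ** (bits - 1) - 1          # 7 levels each side for 4 bits
    scale = max(abs(v) for v in x) / levels or 1.0
    return [round(v / scale) * scale for v in x]

def rotate_then_quantize(x, bits=4, seed=0):
    """Random signs + Hadamard rotation, quantize, then rotate back."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in x]
    rotated = fwht([s * v for s, v in zip(signs, x)])    # y = H D x
    restored = fwht(quantize_dequantize(rotated, bits))  # H is self-inverse
    return [s * v for s, v in zip(signs, restored)]      # D undoes the signs

def sq_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

if __name__ == "__main__":
    # 63 moderate values plus one large outlier channel.
    x = [1.0] * 63 + [20.0]
    print("direct  :", sq_error(x, quantize_dequantize(x)))
    print("rotated :", sq_error(x, rotate_then_quantize(x)))
```

With one 20.0 outlier among 63 values of 1.0, direct 4-bit quantization rounds every 1.0 to zero (squared error about 63). After rotation every coordinate is bounded by 83/8, so the rounding error per coordinate is at most half of a step no larger than 10.375/7, and the total stays below about 35 for any sign pattern.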

#turboquant #quantization #kv-cache

LLM X/Twitter Mar 28, 2026 2 min read

GitHub shows Copilot CLI generating unit tests with plan mode, /fleet, and autopilot

GitHub said on March 28, 2026 that Copilot CLI can create a robust test suite from the terminal by combining plan mode, /fleet, and autopilot. The linked GitHub docs describe /fleet as parallel subagent execution and autopilot as autonomous multi-step completion, making the post a concrete example of multi-agent testing workflows in the CLI.

#github #copilot-cli #testing

LLM Reddit Mar 28, 2026 2 min read

r/LocalLLaMA tracks TurboQuant on MLX as KV cache compression nears FP16 speed

A March 28, 2026 r/LocalLLaMA post turned TurboQuant from a paper topic into an MLX implementation story with custom Metal kernels, code, and an upstream PR. The author reports 4.6x KV cache compression at 0.98x FP16 speed on Qwen2.5-32B, but the repository's 7B README numbers are more conservative, underscoring how model choice and integration details shape the real payoff.
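
To put a ratio like 4.6x in perspective, a KV cache stores one key and one value vector per layer, per KV head, per token. The sketch below estimates its size. The shape used (64 layers, 8 KV heads, head dimension 128) is an assumed Qwen2.5-32B-style configuration with grouped-query attention, not a figure from the post, so check the model's own config before relying on it.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_value=16):
    """Keys + values (factor of 2) for every layer, KV head, and token."""
    values = 2 * layers * kv_heads * head_dim * tokens
    return values * bits_per_value / 8

if __name__ == "__main__":
    # Assumed Qwen2.5-32B-style shape: 64 layers, 8 KV heads, head_dim 128.
    fp16 = kv_cache_bytes(64, 8, 128, tokens=32_768)
    print(f"FP16 KV cache at 32k tokens: {fp16 / 2**30:.2f} GiB")
    print(f"at 4.6x compression:         {fp16 / 4.6 / 2**30:.2f} GiB")
    print(f"implied bits per value:      {16 / 4.6:.2f}")
```

Since 16 / 4.6 is about 3.5, the reported ratio corresponds to roughly 3.5 bits per stored value if scale metadata is ignored; actual savings depend on the model shape and on how the integration stores that metadata.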

#mlx #kv-cache #metal

LLM Mar 28, 2026 2 min read

OpenAI moves to acquire Promptfoo to bring agent security testing into Frontier

OpenAI announced plans to acquire Promptfoo on March 9, 2026. The company says Promptfoo’s security testing and evaluation technology will be integrated into OpenAI Frontier so enterprises can test and document risks such as prompt injection, jailbreaks, data leaks, and tool misuse earlier in the development cycle.

#openai #promptfoo #ai-security

LLM Mar 28, 2026 2 min read

OpenAI launches GPT-5.4 mini and nano for faster coding and subagent workloads

OpenAI announced GPT-5.4 mini and nano on March 17, 2026. The company says mini is more than 2x faster than GPT-5 mini and improves coding, reasoning, multimodal understanding, and tool use, while nano targets low-cost classification, extraction, ranking, and simpler coding subagents.

#openai #gpt-5.4 #coding

LLM X/Twitter Mar 28, 2026 2 min read

Google Cloud shows Gemini CLI using MCP servers for agentic app migration and deployment

GoogleCloudTech posted a demo on March 27, 2026 showing Gemini CLI using Model Context Protocol (MCP) servers to migrate and deploy a full-stack application. Two earlier Google announcements, a September 11, 2025 post on Gemini CLI extensions and a December 11, 2025 post on MCP support, indicate that the demo builds on the /deploy command for Cloud Run, managed MCP endpoints for Google services, and enterprise controls such as IAM, audit logs, and Model Armor.
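
Under the hood, an MCP client such as Gemini CLI exchanges JSON-RPC 2.0 messages with each server: after the initialization handshake, it discovers tools with a `tools/list` request and invokes one with `tools/call`. The sketch below only builds and serializes such requests. The tool name `deploy_service` and its arguments are hypothetical and do not correspond to any real Google Cloud MCP server.

```python
import itertools
import json

_ids = itertools.count(1)  # a client should not reuse request ids in a session

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def list_tools():
    """Ask an MCP server which tools it exposes."""
    return jsonrpc_request("tools/list")

def call_tool(name, arguments):
    """Invoke one tool by name with a JSON object of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})

if __name__ == "__main__":
    # Hypothetical tool and arguments, for illustration only.
    req = call_tool("deploy_service", {"source_dir": "./app", "region": "us-central1"})
    print(json.dumps(req, indent=2))
```

The transport (stdio or HTTP) is separate from these message shapes, which is part of what lets one CLI talk to local tools and managed remote endpoints the same way.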

#google-cloud #gemini-cli #mcp