LLM

LLM Mar 18, 2026 2 min read

Anthropic measures AI agent autonomy in real deployments

Anthropic has published a study on how much autonomy AI agents are being given in the wild, based on millions of interactions across Claude Code and its public API. The longest Claude Code turns nearly doubled in length, from under 25 minutes to over 45 minutes, in three months, while experienced users became both more likely to auto-approve and more likely to interrupt when needed.

#anthropic #agents #claude-code

LLM Mar 18, 2026 2 min read

OpenAI explains why Codex Security does not start from a SAST report

OpenAI says Codex Security is built to reason from repository behavior, not to triage a precomputed SAST report. The company argues that many important bugs come from failed invariants and transformation chains, so the agent should validate hypotheses in context before escalating them.

#openai #codex-security #appsec

LLM Reddit Mar 18, 2026 2 min read

r/LocalLLaMA maps a transformer “danger zone” where duplicating layers starts breaking models

A detailed r/LocalLLaMA experiment claims that copying layer blocks at around 50-56% of model depth consistently hurts or collapses model quality across multiple architectures. The post stands out because it compares dense, hybrid, MoE, and transplant setups, all run in a fully local MLX workflow.

#transformers #model-surgery #localllama

LLM Reddit Mar 18, 2026 2 min read

r/MachineLearning highlights Attention Residuals as Kimi targets fixed-sum PreNorm bottlenecks

A Reddit thread surfaced Kimi's AttnRes paper, which argues that fixed residual accumulation in PreNorm LLMs dilutes deeper layers. The proposed attention-based residual path and its block variant aim to keep the gains without exploding memory cost.

#kimi #llm-architecture #attention

LLM Hacker News Mar 18, 2026 2 min read

Hacker News spots Unsloth Studio as local LLM workflows converge on chat, tuning, and export

Unsloth Studio reached the Hacker News front page as a local-first AI workspace that groups chat, installation, data recipes, and model export in one flow. The reaction suggests strong demand for tooling that sits between raw ML stacks and consumer desktop apps.

#unsloth #local-llm #model-training

LLM X/Twitter Mar 17, 2026 2 min read

Google DeepMind brings Gemini Embedding 2 to preview for multimodal retrieval

Google DeepMind said on X that Gemini Embedding 2 is now in preview through the Gemini API and Vertex AI. The model is positioned as the first fully multimodal embedding model built on the Gemini architecture, aiming to unify retrieval across text, images, video, audio, and documents.

#google-deepmind #gemini #embeddings

LLM X/Twitter Mar 17, 2026 2 min read

OpenAI expands its small-model stack with GPT-5.4 mini and nano

OpenAI said on X that GPT-5.4 mini is rolling out in ChatGPT, Codex, and the API, while GPT-5.4 nano is aimed at lower-cost API workloads. The company is positioning the pair as faster small models for coding, multimodal tasks, and agent sub-workflows.

#openai #gpt-5.4 #codex

LLM Reddit Mar 17, 2026 3 min read

Covenant-72B puts permissionless distributed GPU training ahead of raw hype

An r/LocalLLaMA post that reached 92 points and 25 comments spotlighted Covenant-72B, a 72B-parameter model trained from scratch by 20+ participants through decentralized infrastructure on the Bittensor blockchain. The most credible story here is not an unsupported performance victory but a concrete demonstration of permissionless collaborative pre-training, SparseLoCo-based communication reduction, Apache 2.0 licensing, and a separate chat-tuned variant.

#llm #decentralized-training #bittensor

LLM Reddit Mar 17, 2026 2 min read

Unsloth Studio beta goes after the local model workflow in one interface

A high-engagement r/LocalLLaMA post highlighted Unsloth Studio, a beta open-source web UI that aims to train, run, and export open models from one local interface. The discussion framed it as a possible LM Studio challenger in the GGUF ecosystem, while top commenters noted that many advanced users still lean on vLLM or direct llama.cpp workflows.

#llm #unsloth #gguf

LLM Mar 17, 2026 2 min read

Google adds project spend caps and faster tier upgrades for the Gemini API

Google introduced Project Spend Caps, revamped Usage Tiers, and new billing dashboards for Gemini API developers in AI Studio. The update is aimed at making cost control and scaling behavior more predictable for teams moving into paid usage.

#google #gemini #api

LLM X/Twitter Mar 17, 2026 2 min read

Mistral AI partners with NVIDIA on open frontier models and joins Nemotron Coalition

Mistral AI said on March 16, 2026, that it is entering a strategic partnership with NVIDIA to co-develop frontier open-source AI models. A linked Mistral post says the effort begins with Mistral joining the NVIDIA Nemotron Coalition as a founding member and contributing large-scale model development plus multimodal capabilities.

#mistral #nvidia #open-models

LLM Reddit Mar 17, 2026 2 min read

r/LocalLLaMA pushes Mistral Small 4, a 119B MoE with 256k context and switchable reasoning

On March 16, 2026, an r/LocalLLaMA link to Mistral Small 4 reached 504 points and 196 comments. The Hugging Face model card describes a 119B MoE with 4 active experts, 256k context, multimodal input, and per-request reasoning control.

#mistral #open-models #multimodal