Anthropic has published a study of how much autonomy AI agents are being given in the wild, drawing on millions of interactions across Claude Code and its public API. The longest Claude Code turns nearly doubled in three months, from under 25 minutes to over 45 minutes, while experienced users became more likely both to auto-approve actions and to interrupt when needed.
OpenAI says Codex Security is built to reason from repository behavior rather than triage a precomputed SAST report. The company argues that many important bugs come from failed invariants and transformation chains, so the agent should validate hypotheses in context before escalating them.
A detailed r/LocalLLaMA experiment claims that duplicating layer blocks located at roughly 50-56% of model depth consistently degrades or collapses model quality across multiple architectures. The post stands out because it compares dense, hybrid, MoE, and transplant setups from a fully local MLX workflow.
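The post's own MLX scripts aren't reproduced here; as a rough illustration of the operation being tested, this PyTorch/transformers sketch duplicates a block of decoder layers in the ~50-56% depth band and splices the copies back in. The model id and exact bounds are placeholders.

```python
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",      # placeholder; the post used local MLX models
    torch_dtype=torch.bfloat16,
)

layers = model.model.layers          # ModuleList of decoder blocks
n = len(layers)
start, end = int(0.50 * n), int(0.56 * n)   # the ~50-56% depth band from the post

# Deep-copy the mid-depth block and splice the copies in after the originals.
copied = [copy.deepcopy(layers[i]) for i in range(start, end)]
model.model.layers = torch.nn.ModuleList(
    list(layers[:end]) + copied + list(layers[end:])
)
model.config.num_hidden_layers = len(model.model.layers)

# Attention modules index the KV cache by layer_idx; renumber after splicing.
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i
```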
A Reddit thread surfaced Kimi's AttnRes paper, which argues that fixed residual accumulation in PreNorm LLMs dilutes the contribution of deeper layers. The proposed attention-based residual path and its block variant aim to keep the gains without exploding memory cost.
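The paper's exact formulation isn't reproduced in the thread; the sketch below contrasts the standard PreNorm identity residual with one plausible reading of an attention-based residual, where the skip path is a learned softmax mix over earlier layer outputs. All names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Standard PreNorm: out = x + f(LN(x)); residual mass accumulates additively."""
    def __init__(self, dim, f):
        super().__init__()
        self.norm, self.f = nn.LayerNorm(dim), f

    def forward(self, x):
        return x + self.f(self.norm(x))

class AttnResBlock(nn.Module):
    """Illustrative attention-based residual: the skip path is a softmax-weighted
    mix of all earlier layer outputs rather than a fixed cumulative sum."""
    def __init__(self, dim, f):
        super().__init__()
        self.norm, self.f = nn.LayerNorm(dim), f
        self.q = nn.Linear(dim, dim, bias=False)   # scores the layer history

    def forward(self, x, history):                 # history: list of [B, T, D]
        h = torch.stack(history)                   # [L, B, T, D]
        scores = torch.einsum("btd,lbtd->lbt", self.q(x), h) / h.shape[-1] ** 0.5
        w = scores.softmax(dim=0).unsqueeze(-1)    # per-token weights over layers
        skip = (w * h).sum(dim=0)                  # attention-mixed residual
        return skip + self.f(self.norm(x))
```

Keeping every layer's output alive for the mix is where the memory cost comes from, which is presumably what the block variant amortizes.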
Unsloth Studio reached the Hacker News front page as a local-first AI workspace that bundles chat, installation, data recipes, and model export into one flow. The reaction suggests strong demand for tooling that sits between raw ML stacks and consumer desktop apps.
Google DeepMind said on X that Gemini Embedding 2 is now in preview through the Gemini API and Vertex AI. The model is positioned as the first fully multimodal embedding model built on the Gemini architecture, aiming to unify retrieval across text, images, video, audio, and documents.
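For orientation, this is a minimal call through the google-genai Python SDK; the model id is assumed from the announcement and may differ in the preview.

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
result = client.models.embed_content(
    model="gemini-embedding-2",          # id assumed from the announcement
    contents="hybrid retrieval across text, images, and documents",
)
print(len(result.embeddings[0].values))  # embedding dimensionality
```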
OpenAI said on X that GPT-5.4 mini is rolling out in ChatGPT, Codex, and the API, while GPT-5.4 nano is aimed at lower-cost API workloads. The company is positioning the pair as faster small models for coding, multimodal tasks, and agent sub-workflows.
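As a hedged sketch of the sub-workflow framing, this uses the OpenAI Responses API with the announced model name, which is assumed here rather than verified against the live API.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.responses.create(
    model="gpt-5.4-mini",  # id assumed from the announcement
    input="Review this function for off-by-one errors: ...",
)
print(resp.output_text)
```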
An r/LocalLLaMA post that reached 92 points and 25 comments spotlighted Covenant-72B, a 72B-parameter model trained from scratch by 20+ participants through decentralized infrastructure on the Bittensor blockchain. The most credible story here is not an unsupported performance claim but a concrete demonstration of permissionless collaborative pre-training, SparseLoCo-based communication reduction, Apache 2.0 licensing, and a separate chat-tuned variant.
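Covenant's exact protocol isn't detailed in the post; as a rough sketch of the communication-reduction family SparseLoCo belongs to, here is the generic top-k-plus-error-feedback pattern applied to a DiLoCo-style pseudo-gradient. Function names and the compression ratio are illustrative.

```python
import torch

def topk_compress(delta: torch.Tensor, residual: torch.Tensor, k: int):
    """Keep the k largest-magnitude entries of a pseudo-gradient; carry the
    dropped mass forward as error feedback for the next communication round."""
    acc = delta + residual                  # fold in previously dropped mass
    flat = acc.flatten()
    idx = flat.abs().topk(k).indices        # the k entries worth sending
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]                 # what actually goes over the wire
    return sparse.view_as(acc), (flat - sparse).view_as(acc)

# Toy round: communicate 1% of the update, hold the other 99% back locally.
delta = torch.randn(10_000)
sparse_update, residual = topk_compress(delta, torch.zeros_like(delta), k=100)
```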
A high-engagement r/LocalLLaMA post highlighted Unsloth Studio, a beta open-source web UI that aims to train, run, and export open models from one local interface. The discussion framed it as a possible LM Studio challenger in the GGUF ecosystem, while top commenters noted that many advanced users still lean on vLLM or direct llama.cpp workflows.
Google introduced Project Spend Caps, revamped Usage Tiers, and new billing dashboards for Gemini API developers in AI Studio. The update is aimed at making cost control and scaling behavior more predictable for teams moving into paid usage.
Mistral AI said on March 16, 2026 that it is entering a strategic partnership with NVIDIA to co-develop frontier open-source AI models. A linked Mistral post says the effort begins with Mistral joining the NVIDIA Nemotron Coalition as a founding member and contributing large-scale model development plus multimodal capabilities.
On March 16, 2026, an r/LocalLLaMA link to Mistral Small 4 reached 504 points and 196 comments. The Hugging Face model card describes a 119B-parameter MoE with 4 active experts, 256k context, multimodal input, and per-request reasoning control.
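For readers new to the "active experts" framing, here is a generic top-k router sketch showing why only a fraction of a 119B-total MoE runs per token. Dimensions, expert count, and gating details are illustrative, not Mistral Small 4's actual configuration.

```python
import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    """Route each token to its k highest-scoring experts."""
    def __init__(self, dim: int, n_experts: int, k: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)
        self.k = k

    def forward(self, x):                      # x: [tokens, dim]
        logits = self.gate(x)                  # score every expert
        weights, experts = logits.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)      # renormalize over the active set
        return weights, experts                # only k expert FFNs run per token
```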