Google AI Studio said in a March 19, 2026 post on X that its vibe coding workflow now supports multiplayer collaboration, live data connections, persistent builds, plus shadcn, Framer Motion, and npm support. The update pushes AI Studio toward a browser-based app-building environment rather than a prompt-only prototyping tool.
Flash-MoE is a C and Metal inference engine that claims to run Qwen3.5-397B-A17B on a 48 GB MacBook Pro. The key idea is to keep a 209 GB MoE model on SSD and stream only the active experts needed for each token.
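The streaming idea is easier to picture in code. Below is a minimal, hypothetical Python sketch of the concept, not Flash-MoE's actual C/Metal implementation: expert weights stay in a memory-mapped file on SSD, and only the router-selected experts are paged in per token. All names here (`EXPERT_BYTES`, `router_topk`, `apply_expert`) are illustrative assumptions.

```python
# Illustrative sketch of SSD expert streaming; Flash-MoE itself is C/Metal.
# Expert weights live in one memory-mapped file and are paged in on demand.
import mmap

EXPERT_BYTES = 64 * 1024 * 1024  # hypothetical per-expert weight size

def open_expert_bank(path):
    """Memory-map the expert weight file so reads hit the SSD lazily."""
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def run_moe_layer(bank, hidden, router_topk, apply_expert):
    """Compute one MoE layer, touching only the router-selected experts."""
    out = None
    for expert_id, gate in router_topk(hidden):  # e.g. top-k of hundreds
        start = expert_id * EXPERT_BYTES
        blob = bank[start:start + EXPERT_BYTES]  # OS pages this in from SSD
        y = gate * apply_expert(blob, hidden)
        out = y if out is None else out + y
    return out
```

Because most experts are never touched for a given token, the working set stays far below the full 209 GB, which is what makes the 48 GB machine plausible.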
OpenAI on March 11, 2026 detailed how it combines the Responses API with a shell tool and hosted containers to give agents a managed computer environment. The company says the design is meant to make file handling, tool execution, network access, and long-running workflows easier to run in production.
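As a rough sketch, the flow the post describes would look something like the call below. `client.responses.create` is the real Responses API entry point, but the `"shell"` tool type and the model name are assumptions taken from this digest, not confirmed parameters.

```python
# Hedged sketch: attaching a hosted shell to an agent via the Responses API.
# The tool type "shell" is an assumption based on the post's description.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5.4",            # model name as reported in this digest
    tools=[{"type": "shell"}],  # hosted, containerized shell (assumed name)
    input="Clone the repo, run the test suite, and summarize any failures.",
)
print(resp.output_text)
```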
OpenAI introduced GPT-5.4 mini and nano on March 17, 2026 as smaller GPT-5.4 variants for low-latency coding, tool use, and multimodal workflows. The company positioned the models for high-volume API and subagent tasks where speed and cost matter more than maximum capability.
Vercel said on March 19, 2026 that it built Chat SDK to remove the platform-specific plumbing that slowed internal agent rollouts. Vercel’s blog describes an open-source TypeScript library, now in public beta, that lets one bot implementation target Slack, Teams, Google Chat, Discord, Telegram, GitHub, Linear, and now WhatsApp through adapters.
Together AI said on March 19, 2026 that its fine-tuning service now supports tool calling, reasoning, and vision-language model training, with up to 6x higher throughput on MoE architectures. The company says the update also targets very large models, supports datasets up to 100 GB, and adds pre-run cost estimates plus live ETAs during training.
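Launching such a job might look like the sketch below, using the Together Python SDK's fine-tuning interface; the model id and the JSONL-with-tool-calls dataset format are assumptions based on the announcement, not documented specifics.

```python
# Sketch of kicking off a fine-tune with the Together Python SDK.
# The model id and tool-calling dataset format are assumptions.
from together import Together

client = Together()

# Upload a JSONL dataset of chat transcripts that include tool calls.
train_file = client.files.upload(file="tool_calling_train.jsonl")

job = client.fine_tuning.create(
    model="Qwen/Qwen3.5-27B",  # hypothetical base model id
    training_file=train_file.id,
    n_epochs=1,
)
print(job.id, job.status)
```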
Cloudflare said on March 20, 2026 that Kimi K2.5 was available on Workers AI so developers could build end-to-end agents on Cloudflare’s platform. Its launch post says the model brings a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, while an internal security-review agent processing 7B tokens per day cut costs by 77% after the switch.
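Workers AI models are callable over Cloudflare's documented REST pattern; the sketch below assumes a model slug for Kimi K2.5, which the launch post does not spell out.

```python
# Sketch: calling a Workers AI model over Cloudflare's REST endpoint.
# The endpoint shape is Cloudflare's documented pattern; the slug
# "@cf/moonshotai/kimi-k2.5" is an assumption based on the launch post.
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]

url = (
    f"https://api.cloudflare.com/client/v4/accounts/"
    f"{ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5"
)
payload = {
    "messages": [
        {"role": "user", "content": "Summarize this diff for a security review."}
    ],
}
r = requests.post(url, headers={"Authorization": f"Bearer {API_TOKEN}"}, json=payload)
print(r.json())
```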
OpenAI Developers said on March 21, 2026 that container startup for skills, hosted shell, and code interpreter was about 10x faster via a new container pool in the Responses API. Updated OpenAI shell docs show hosted shell can create containers automatically, reuse active containers by reference, and keep them alive for up to 20 minutes of inactivity.
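Container reuse by reference would look roughly like the sketch below; the field names follow the Responses API's existing `code_interpreter` container pattern, and the container id value is a placeholder, not a real identifier.

```python
# Sketch of container reuse across Responses API calls, per the cited docs.
# Passing {"type": "auto"} pools/creates a container; passing an id reuses
# a warm one until it idles out (20 minutes, per the updated docs).
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5.4",
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    input="Create scratch.txt containing a three-item TODO list.",
)

container_id = "cntr_example"  # placeholder: id surfaced by the first call
second = client.responses.create(
    model="gpt-5.4",
    tools=[{"type": "code_interpreter", "container": container_id}],
    input="Append 'ship it' to scratch.txt.",
)
print(second.output_text)
```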
A high-signal r/LocalLLaMA benchmark post said moving Qwen 3.5 27B from mainline llama.cpp to ik_llama.cpp raised prompt evaluation from about 43 tok/sec to 1,122 tok/sec on a Blackwell RTX PRO 4000, with generation climbing from 7.5 tok/sec to 26 tok/sec.
Together AI and collaborators introduced Mamba-3 as an inference-first state space model. Hacker News discussion centered on lower prefill and decode latency, a stronger recurrence design, and open-sourced high-performance kernels.
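For readers new to state space models, the core object is a linear recurrence over a hidden state. The NumPy toy below shows only that generic recurrence; Mamba-3's actual selective, gated design and fused kernels go well beyond it.

```python
# Generic (non-selective) linear SSM recurrence, for illustration only;
# Mamba-3's real recurrence is more elaborate and runs in fused kernels.
import numpy as np

def ssm_scan(A, B, C, x):
    """h_t = A @ h_{t-1} + B @ x_t ; y_t = C @ h_t, scanned over time."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:            # x: (seq_len, d_in)
        h = A @ h + B @ x_t  # state update
        ys.append(C @ h)     # readout
    return np.stack(ys)

rng = np.random.default_rng(0)
A = np.diag(rng.uniform(0.5, 0.99, 8))  # stable diagonal state matrix
B = rng.standard_normal((8, 4)) * 0.1
C = rng.standard_normal((2, 8)) * 0.1
y = ssm_scan(A, B, C, rng.standard_normal((16, 4)))
print(y.shape)  # (16, 2)
```

Because the state is fixed-size, decode cost per token is constant in sequence length, which is the property inference-first designs like this optimize for.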
On February 27, 2026, OpenAI and Amazon announced a multi-year deal covering a Stateful Runtime Environment on Amazon Bedrock, AWS-exclusive third-party distribution for OpenAI Frontier, 2 gigawatts of Trainium capacity, and a $50 billion Amazon investment. The announcement matters because it combines enterprise agent infrastructure, cloud distribution, and custom silicon in one agreement.
A fresh r/LocalLLaMA post argues that the main bottleneck in Graph-RAG multi-hop QA is often reasoning rather than retrieval. The linked paper suggests structured prompting and graph-based context compression can let an open Llama 8B model match or beat a plain 70B baseline at a much lower cost.
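One plausible reading of "graph-based context compression" is sketched below: keep only triples lying on short paths between the question's entities, then hand the small model that compact context. The helper and relation labels are hypothetical, not the paper's actual method.

```python
# Hedged sketch of graph-based context compression: keep only triples on
# short paths linking the question's entities. Names are hypothetical.
import networkx as nx

def compress_context(G, question_entities, cutoff=3):
    """Keep edges on paths (<= cutoff hops) linking question entities."""
    keep = set()
    for i, src in enumerate(question_entities):
        for dst in question_entities[i + 1:]:
            for path in nx.all_simple_paths(G, src, dst, cutoff=cutoff):
                keep.update(zip(path, path[1:]))
    # Serialize surviving triples as compact lines for the prompt.
    return "\n".join(
        f"{u} --{G.edges[u, v].get('relation', 'related_to')}--> {v}"
        for u, v in keep
    )

G = nx.DiGraph()
G.add_edge("Marie Curie", "Pierre Curie", relation="spouse")
G.add_edge("Pierre Curie", "Sorbonne", relation="taught_at")
G.add_edge("Marie Curie", "Warsaw", relation="born_in")
print(compress_context(G, ["Marie Curie", "Sorbonne"]))
```

Shrinking the context this way shifts the work from retrieval volume to reasoning over a small, relevant subgraph, which is the trade the post argues favors the 8B model.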