Google AI Studio said in a March 19, 2026 post on X that its vibe coding workflow now supports multiplayer collaboration, live data connections, persistent builds, plus shadcn, Framer Motion, and npm support. The update pushes AI Studio toward a browser-based app-building environment rather than a prompt-only prototyping tool.
Flash-MoE is a C and Metal inference engine that claims to run Qwen3.5-397B-A17B on a 48 GB MacBook Pro. The key idea is to keep a 209 GB MoE model on SSD and stream only the active experts needed for each token.
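The streaming idea is easier to picture in code. Below is a minimal, hypothetical Python sketch of the concept, not Flash-MoE's actual C/Metal implementation: expert weights stay in a memory-mapped file on SSD, and only the router-selected experts are paged in per token. All names here (`EXPERT_BYTES`, `router_topk`, `apply_expert`) are illustrative assumptions.

```python
# Illustrative sketch of SSD expert streaming; Flash-MoE itself is C/Metal.
# Expert weights live in one memory-mapped file and are paged in on demand.
import mmap

EXPERT_BYTES = 64 * 1024 * 1024  # hypothetical per-expert weight size

def open_expert_bank(path):
    """Memory-map the expert weight file so reads hit the SSD lazily."""
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def run_moe_layer(bank, hidden, router_topk, apply_expert):
    """Compute one MoE layer, touching only the router-selected experts."""
    out = None
    for expert_id, gate in router_topk(hidden):  # e.g. top-k of hundreds
        start = expert_id * EXPERT_BYTES
        blob = bank[start:start + EXPERT_BYTES]  # OS pages this in from SSD
        y = gate * apply_expert(blob, hidden)
        out = y if out is None else out + y
    return out
```

Because most experts are never touched for a given token, the working set stays far below the full 209 GB, which is what makes the 48 GB machine plausible.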
OpenAI on March 11, 2026 detailed how it combines the Responses API with a shell tool and hosted containers to give agents a managed computer environment. The company says the design is meant to make file handling, tool execution, network access, and long-running workflows easier to run in production.
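As a rough sketch, the flow the post describes would look something like the call below. `client.responses.create` is the real Responses API entry point, but the `"shell"` tool type and the model name are assumptions taken from this digest, not confirmed parameters.

```python
# Hedged sketch: attaching a hosted shell to an agent via the Responses API.
# The tool type "shell" is an assumption based on the post's description.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5.4",            # model name as reported in this digest
    tools=[{"type": "shell"}],  # hosted, containerized shell (assumed name)
    input="Clone the repo, run the test suite, and summarize any failures.",
)
print(resp.output_text)
```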
OpenAI introduced GPT-5.4 mini and nano on March 17, 2026 as smaller GPT-5.4 variants for low-latency coding, tool use, and multimodal workflows. The company positioned the models for high-volume API and subagent tasks where speed and cost matter more than maximum capability.
Vercel said on March 19, 2026 that it built Chat SDK to remove the platform-specific plumbing that slowed internal agent rollouts. Vercel’s blog describes an open-source TypeScript library, now in public beta, that lets one bot implementation target Slack, Teams, Google Chat, Discord, Telegram, GitHub, Linear, and now WhatsApp through adapters.
Together AI said on March 19, 2026 that its fine-tuning service now supports tool calling, reasoning, and vision-language model training, with up to 6x higher throughput on MoE architectures. The company says the update also targets very large models, supports datasets up to 100 GB, and adds pre-run cost estimates plus live ETAs during training.
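Launching such a job might look like the sketch below, using the Together Python SDK's fine-tuning interface; the model id and the JSONL-with-tool-calls dataset format are assumptions based on the announcement, not documented specifics.

```python
# Sketch of kicking off a fine-tune with the Together Python SDK.
# The model id and tool-calling dataset format are assumptions.
from together import Together

client = Together()

# Upload a JSONL dataset of chat transcripts that include tool calls.
train_file = client.files.upload(file="tool_calling_train.jsonl")

job = client.fine_tuning.create(
    model="Qwen/Qwen3.5-27B",  # hypothetical base model id
    training_file=train_file.id,
    n_epochs=1,
)
print(job.id, job.status)
```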
Cloudflare said on March 20, 2026 that Kimi K2.5 was available on Workers AI so developers could build end-to-end agents on Cloudflare’s platform. Its launch post says the model brings a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, while an internal security-review agent processing 7B tokens per day cut costs by 77% after the switch.
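Workers AI models are callable over Cloudflare's documented REST pattern; the sketch below assumes a model slug for Kimi K2.5, which the launch post does not spell out.

```python
# Sketch: calling a Workers AI model over Cloudflare's REST endpoint.
# The endpoint shape is Cloudflare's documented pattern; the slug
# "@cf/moonshotai/kimi-k2.5" is an assumption based on the launch post.
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]

url = (
    f"https://api.cloudflare.com/client/v4/accounts/"
    f"{ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5"
)
payload = {
    "messages": [
        {"role": "user", "content": "Summarize this diff for a security review."}
    ],
}
r = requests.post(url, headers={"Authorization": f"Bearer {API_TOKEN}"}, json=payload)
print(r.json())
```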
OpenAI Developers said on March 21, 2026 that container startup for skills, hosted shell, and code interpreter was about 10x faster via a new container pool in the Responses API. Updated OpenAI shell docs show hosted shell can create containers automatically, reuse active containers by reference, and keep them alive for up to 20 minutes of inactivity.
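Container reuse by reference would look roughly like the sketch below; the field names follow the Responses API's existing `code_interpreter` container pattern, and the container id value is a placeholder, not a real identifier.

```python
# Sketch of container reuse across Responses API calls, per the cited docs.
# Passing {"type": "auto"} pools/creates a container; passing an id reuses
# a warm one until it idles out (20 minutes, per the updated docs).
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5.4",
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    input="Create scratch.txt containing a three-item TODO list.",
)

container_id = "cntr_example"  # placeholder: id surfaced by the first call
second = client.responses.create(
    model="gpt-5.4",
    tools=[{"type": "code_interpreter", "container": container_id}],
    input="Append 'ship it' to scratch.txt.",
)
print(second.output_text)
```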
A high-signal r/LocalLLaMA benchmark post said moving Qwen 3.5 27B from mainline llama.cpp to ik_llama.cpp raised prompt evaluation from about 43 tok/sec to 1,122 tok/sec on a Blackwell RTX PRO 4000, with generation climbing from 7.5 tok/sec to 26 tok/sec.
Together AI and collaborators introduced Mamba-3 as an inference-first state space model. Hacker News discussion centered on lower prefill and decode latency, a stronger recurrence design, and open-sourced high-performance kernels.
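For readers new to state space models, the core object is a linear recurrence over a hidden state. The NumPy toy below shows only that generic recurrence; Mamba-3's actual selective, gated design and fused kernels go well beyond it.

```python
# Generic (non-selective) linear SSM recurrence, for illustration only;
# Mamba-3's real recurrence is more elaborate and runs in fused kernels.
import numpy as np

def ssm_scan(A, B, C, x):
    """h_t = A @ h_{t-1} + B @ x_t ; y_t = C @ h_t, scanned over time."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:            # x: (seq_len, d_in)
        h = A @ h + B @ x_t  # state update
        ys.append(C @ h)     # readout
    return np.stack(ys)

rng = np.random.default_rng(0)
A = np.diag(rng.uniform(0.5, 0.99, 8))  # stable diagonal state matrix
B = rng.standard_normal((8, 4)) * 0.1
C = rng.standard_normal((2, 8)) * 0.1
y = ssm_scan(A, B, C, rng.standard_normal((16, 4)))
print(y.shape)  # (16, 2)
```

Because the state is fixed-size, decode cost per token is constant in sequence length, which is the property inference-first designs like this optimize for.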
On February 27, 2026, OpenAI and Amazon announced a multi-year deal covering a Stateful Runtime Environment on Amazon Bedrock, AWS-exclusive third-party distribution for OpenAI Frontier, 2 gigawatts of Trainium capacity, and a $50 billion Amazon investment. The announcement matters because it combines enterprise agent infrastructure, cloud distribution, and custom silicon in one agreement.
A fresh r/LocalLLaMA post argues that the main bottleneck in Graph-RAG multi-hop QA is often reasoning rather than retrieval. The linked paper suggests structured prompting and graph-based context compression can let an open Llama 8B model match or beat a plain 70B baseline at a much lower cost.
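One plausible reading of "graph-based context compression" is sketched below: keep only triples lying on short paths between the question's entities, then hand the small model that compact context. The helper and relation labels are hypothetical, not the paper's actual method.

```python
# Hedged sketch of graph-based context compression: keep only triples on
# short paths linking the question's entities. Names are hypothetical.
import networkx as nx

def compress_context(G, question_entities, cutoff=3):
    """Keep edges on paths (<= cutoff hops) linking question entities."""
    keep = set()
    for i, src in enumerate(question_entities):
        for dst in question_entities[i + 1:]:
            for path in nx.all_simple_paths(G, src, dst, cutoff=cutoff):
                keep.update(zip(path, path[1:]))
    # Serialize surviving triples as compact lines for the prompt.
    return "\n".join(
        f"{u} --{G.edges[u, v].get('relation', 'related_to')}--> {v}"
        for u, v in keep
    )

G = nx.DiGraph()
G.add_edge("Marie Curie", "Pierre Curie", relation="spouse")
G.add_edge("Pierre Curie", "Sorbonne", relation="taught_at")
G.add_edge("Marie Curie", "Warsaw", relation="born_in")
print(compress_context(G, ["Marie Curie", "Sorbonne"]))
```

Shrinking the context this way shifts the work from retrieval volume to reasoning over a small, relevant subgraph, which is the trade the post argues favors the 8B model.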