A Reddit post in r/LocalLLaMA introduces a GGUF release of Qwen3.5-122B-A10B Uncensored (Aggressive) alongside new K_P quants. The author claims 0/465 refusals and zero capability loss, but those results are presented as the author’s own tests rather than independent verification.
LLM
RSS FeedGitHub has launched a public preview that lets teams assign Jira issues directly to the Copilot coding agent and receive AI-generated draft pull requests in GitHub. The company says the integration reduces context switching while preserving existing review and approval controls.
OpenAI Developers announced on March 20, 2026 that verified university students in the United States and Canada can claim $100 in Codex credits. OpenAI’s support page says that equals 2,500 ChatGPT credits, requires student verification through SheerID, and expires 12 months after the grant date.
Google AI Studio promoted Gemini Embedding 2 in a March 12, 2026 X post, and Google’s March 10 blog post says the model maps text, images, video, audio, and documents into a single embedding space. Google says it is in public preview through the Gemini API and Vertex AI and is designed for multimodal retrieval and classification.
Google AI Studio said in a March 19, 2026 post on X that its vibe coding workflow now supports multiplayer collaboration, live data connections, persistent builds, and shadcn, Framer Motion, and npm support. The update pushes AI Studio closer to a browser-based app-building environment instead of a prompt-only prototype tool.
OpenAI on March 11, 2026 detailed how it combines the Responses API with a shell tool and hosted containers to give agents a managed computer environment. The company says the design is meant to make file handling, tool execution, network access, and long-running workflows easier to run in production.
OpenAI introduced GPT-5.4 mini and nano on March 17, 2026 as smaller GPT-5.4 variants for low-latency coding, tool use, and multimodal workflows. The company positioned the models for high-volume API and subagent tasks where speed and cost matter more than maximum capability.
Vercel said on March 19, 2026 that it built Chat SDK to remove the platform-specific plumbing that slowed internal agent rollouts. Vercel’s blog describes an open-source public-beta TypeScript library that lets one bot implementation target Slack, Teams, Google Chat, Discord, Telegram, GitHub, Linear, and now WhatsApp through adapters.
Together AI said on March 19, 2026 that its fine-tuning service now supports tool calling, reasoning, and vision-language model training, with up to 6x higher throughput on MoE architectures. The company says the update also targets very large models, supports datasets up to 100GB, and adds pre-run cost estimates plus live ETAs during training.
Cloudflare said on March 20, 2026 that Kimi K2.5 was available on Workers AI so developers could build end-to-end agents on Cloudflare’s platform. Its launch post says the model brings a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, while an internal security-review agent processing 7B tokens per day cut costs by 77% after the switch.
OpenAI Developers said on March 21, 2026 that container startup for skills, hosted shell, and code interpreter was about 10x faster via a new container pool in the Responses API. Updated OpenAI shell docs show hosted shell can create containers automatically, reuse active containers by reference, and keep them alive for 20 minutes of inactivity.
A high-signal r/LocalLLaMA benchmark post said moving Qwen 3.5 27B from mainline llama.cpp to ik_llama.cpp raised prompt evaluation from about 43 tok/sec to 1,122 tok/sec on a Blackwell RTX PRO 4000, with generation climbing from 7.5 tok/sec to 26 tok/sec.