OpenAI said on March 6, 2026 that Codex Security is entering research preview for ChatGPT Pro, Enterprise, Business, and Edu users in Codex web. The company says the application-security agent builds project-specific threat models, runs contextual validation, and proposes patches, and that in beta it scanned more than 1.2 million commits.
GitHub said in a March 17, 2026 X thread that Copilot coding agent now adds model selection, self-review before PRs, built-in code/secret/dependency scanning, custom agents, and cloud-to-CLI handoff. GitHub’s blog frames the upgrade as a smoother delegation workflow for background coding tasks.
An r/LocalLLaMA thread on March 18, 2026 pushed fresh attention toward Mamba-3, a new state space model release from researchers at Carnegie Mellon University, Princeton, Cartesia AI, and Together AI. The project shifts its design goal from training speed to inference efficiency and claims prefill+decode latency wins over Mamba-2, Gated DeltaNet, and Llama-3.2-1B at the 1.5B scale.
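For readers who haven't followed the state-space line, the inference-efficiency pitch comes down to decode cost. The sketch below is a generic diagonal linear recurrence, not Mamba-3's actual parameterization, but it shows why decoding stays constant-time and constant-memory per token instead of appending to a growing KV cache:

```python
# Minimal sketch of why state-space decoding is cheap: each new token updates a
# fixed-size hidden state rather than growing a KV cache.
# Generic diagonal linear recurrence; NOT Mamba-3's actual parameterization.
import numpy as np

d_state, d_model = 16, 64
rng = np.random.default_rng(0)

A = rng.uniform(0.9, 0.999, size=d_state)       # per-channel decay (assumed diagonal)
B = rng.normal(size=(d_state, d_model)) * 0.02  # input projection
C = rng.normal(size=(d_model, d_state)) * 0.02  # output projection

state = np.zeros(d_state)  # O(1) memory, independent of sequence length

def decode_step(x_t, state):
    """One decode step: constant time and memory per token."""
    state = A * state + B @ x_t
    y_t = C @ state
    return y_t, state

for _ in range(1000):  # 1000 tokens, no cache growth
    x_t = rng.normal(size=d_model)
    y_t, state = decode_step(x_t, state)
```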
At GTC on March 16, 2026, NVIDIA announced Dynamo 1.0 as a production-grade, open-source inference stack for generative and agentic AI. NVIDIA says Dynamo can boost Blackwell inference performance by up to 7x while integrating with major frameworks and cloud providers.
Google on March 17, 2026 introduced new Gemini API features for agentic workflows, including combined built-in and custom tools, context circulation across tool calls, and Maps grounding for Gemini 3. The update is designed to reduce orchestration work for complex multi-step applications.
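As a rough illustration of the "built-in plus custom tools in one request" claim, here is a hedged sketch using the google-genai Python SDK; the model ID and the weather function are assumptions, and Maps grounding is omitted since its exact tool configuration isn't shown in the announcement:

```python
# Hedged sketch of mixing a built-in tool with a custom function declaration in one
# Gemini API request. SDK calls follow the google-genai Python client; the model ID
# and the get_weather function are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Look up current weather for a city (hypothetical custom tool).",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"city": types.Schema(type=types.Type.STRING)},
        required=["city"],
    ),
)

response = client.models.generate_content(
    model="gemini-3-flash-preview",  # assumed model ID
    contents="Find a highly rated cafe near Shibuya and tell me if I need an umbrella.",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(google_search=types.GoogleSearch()),   # built-in tool
            types.Tool(function_declarations=[get_weather]),  # custom tool
        ],
    ),
)
print(response.text)
```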
A March 15, 2026 Hacker News post about GreenBoost reached 124 points and 25 comments. The open-source Linux project combines a kernel module and CUDA shim to tier model memory across VRAM, DDR4, and NVMe so larger local LLMs can run without changing inference apps.
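The transparent part happens in the kernel module and CUDA shim, but the tiering idea itself is easy to sketch. The following is a conceptual illustration only (the capacities and hotness heuristic are made up), not GreenBoost's implementation:

```python
# Conceptual sketch of the tiering idea: keep the hottest layers in VRAM, spill the
# rest to DDR4 and then NVMe. NOT GreenBoost's code, which operates transparently at
# the kernel-module/CUDA-shim level.
from collections import Counter

GIB = 1024**3
tiers = [("vram", 24 * GIB), ("ddr4", 64 * GIB), ("nvme", 512 * GIB)]  # assumed capacities

def assign_tiers(layers):
    """layers: list of (name, size_bytes, access_frequency)."""
    placement, free = {}, dict(tiers)
    for name, size, _freq in sorted(layers, key=lambda l: -l[2]):  # hottest first
        for tier, _cap in tiers:
            if free[tier] >= size:
                placement[name] = tier
                free[tier] -= size
                break
    return placement

layers = [(f"block_{i}", 400 * 1024**2, 1.0 / (i + 1)) for i in range(80)]
print(Counter(assign_tiers(layers).values()))  # e.g. Counter({'vram': 61, 'ddr4': 19})
```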
Google introduced Gemini 3.1 Flash-Lite on March 3, 2026 as the fastest and most cost-efficient model in the Gemini 3 series. The model is rolling out in preview through the Gemini API in Google AI Studio and Vertex AI, priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, with claims of 2.5x faster time to first answer token and 45% higher output speed than Gemini 2.5 Flash.
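At those preview prices, back-of-the-envelope cost math is straightforward; the token counts below are made-up example numbers:

```python
# Quick cost check using the preview prices quoted above
# ($0.25 per 1M input tokens, $1.50 per 1M output tokens).
INPUT_PER_M, OUTPUT_PER_M = 0.25, 1.50

def flash_lite_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a batch summarization job: 50M input tokens, 5M output tokens
print(f"${flash_lite_cost(50_000_000, 5_000_000):.2f}")  # -> $20.00
```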
OpenAI said on March 17, 2026 that GPT-5.4 mini is now available in ChatGPT, Codex, and the API, with a follow-up post confirming GPT-5.4 nano in the API. OpenAI's developer docs position mini as its strongest mini model yet for coding, computer use, and subagents, while nano is framed as the cheapest GPT-5.4-class model for high-volume tasks like ranking, extraction, and subagent work.
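Here is a hedged sketch of how the mini/nano split might be used from the OpenAI Python SDK; the model IDs are assumed from the announced naming and the routing heuristic is purely illustrative:

```python
# Hedged sketch: route heavier coding/tool-use prompts to mini and cheap classification
# or extraction to nano. The model IDs "gpt-5.4-mini" and "gpt-5.4-nano" are assumed
# from the naming in the announcement, not verified identifiers.
from openai import OpenAI

client = OpenAI()

def run(prompt: str, heavy: bool = False) -> str:
    model = "gpt-5.4-mini" if heavy else "gpt-5.4-nano"
    resp = client.responses.create(model=model, input=prompt)
    return resp.output_text

print(run("Label this ticket as bug/feature/question: 'App crashes on login.'"))
print(run("Refactor this function to be async and add retries: ...", heavy=True))
```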
A March 17, 2026 r/LocalLLaMA post with 534 points and 69 comments highlighted Hugging Face’s new hf-agents CLI extension. The tool chains llmfit, llama.cpp, and Pi so users can move from hardware detection to a running local coding agent in one command.
A March 17, 2026 Hacker News post about GPT-5.4 mini and nano reached 236 points and 143 comments. OpenAI is positioning mini as a fast coding and tool-use model for Codex, the API, and ChatGPT, while nano targets cheaper classification, extraction, and subagent workloads.
OpenAI Developers said on X that GPT-5.4 mini and nano are now part of the GPT-5.4 family for developer workflows. OpenAI positions mini as a faster coding and tool-use model for the API, Codex, and ChatGPT, while nano is the lowest-cost option for lighter API workloads.
A project post in r/MachineLearning points to mlx-tune, a library that wraps Apple’s MLX stack in an Unsloth-compatible training API for SFT, DPO, GRPO, LoRA, and vision-language fine-tuning on Apple Silicon Macs.
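Since the post describes the API as Unsloth-compatible, here is a purely hypothetical sketch of what a LoRA fine-tune setup could look like; the import path and call names are assumptions patterned on Unsloth, not mlx-tune's verified interface:

```python
# Hypothetical sketch only: the import path and call names below are assumptions
# modeled on Unsloth's API (which mlx-tune advertises compatibility with), not
# mlx-tune's verified interface.
from mlx_tune import FastLanguageModel  # hypothetical import path

model, tokenizer = FastLanguageModel.from_pretrained(
    "mlx-community/Llama-3.2-1B-Instruct-4bit",  # example MLX-format model
    max_seq_length=2048,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)  # attach LoRA adapters
# ...then run SFT/DPO/GRPO with the same trainer-style calls one would use with Unsloth.
```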