OpenAI said on March 6, 2026 that Codex Security is entering research preview for ChatGPT Pro, Enterprise, Business, and Edu users in Codex web. The company says the application-security agent uses project-specific threat models, contextual validation, and patch proposals, and in beta scanned more than 1.2 million commits.
LLM
RSS FeedGitHub said in a March 17, 2026 X thread that Copilot coding agent now adds model selection, self-review before PRs, built-in code/secret/dependency scanning, custom agents, and cloud-to-CLI handoff. GitHub’s blog frames the upgrade as a smoother delegation workflow for background coding tasks.
A LocalLLaMA thread on March 18, 2026 pushed fresh attention toward Mamba-3, a new state space model release from researchers at Carnegie Mellon University, Princeton, Cartesia AI, and Together AI. The project shifts its design goal from training speed to inference efficiency and claims prefill+decode latency wins over Mamba-2, Gated DeltaNet, and Llama-3.2-1B at the 1.5B scale.
Google on Mar 17, 2026 introduced new Gemini API features for agentic workflows, including combined built-in and custom tools, context circulation across tool calls, and Maps grounding for Gemini 3. The update is designed to reduce orchestration work for complex multi-step applications.
A March 15, 2026 Hacker News post about GreenBoost reached 124 points and 25 comments. The open-source Linux project combines a kernel module and CUDA shim to tier model memory across VRAM, DDR4, and NVMe so larger local LLMs can run without changing inference apps.
A March 17, 2026 Hacker News post about GPT-5.4 mini and nano reached 236 points and 143 comments. OpenAI is positioning mini as a fast coding and tool-use model for Codex, the API, and ChatGPT, while nano targets cheaper classification, extraction, and subagent workloads.
OpenAI Developers said on X that GPT-5.4 mini and nano are now part of the GPT-5.4 family for developer workflows. OpenAI positions mini as a faster coding and tool-use model for API, Codex, and ChatGPT, while nano is the lowest-cost option for lighter API workloads.
A project post in r/MachineLearning points to mlx-tune, a library that wraps Apple’s MLX stack in an Unsloth-compatible training API for SFT, DPO, GRPO, LoRA, and vision-language fine-tuning on Apple Silicon Macs.
OpenAI says Codex Security is built to reason from repository behavior, not to triage a precomputed SAST report. The company argues that many important bugs come from failed invariants and transformation chains, so the agent should validate hypotheses in context before escalating them.
A detailed r/LocalLLaMA experiment claims that copying layer blocks around 50-56% depth consistently hurts or collapses model quality across multiple architectures. The post stands out because it compares dense, hybrid, MoE, and transplant setups from a fully local MLX workflow.
A Reddit thread surfaced Kimi's AttnRes paper, which argues that fixed residual accumulation in PreNorm LLMs dilutes deeper layers. The proposed attention-based residual path and its block variant aim to keep the gains without exploding memory cost.
Google DeepMind said on X that Gemini Embedding 2 is now in preview through the Gemini API and Vertex AI. The model is positioned as the first fully multimodal embedding model built on the Gemini architecture, aiming to unify retrieval across text, images, video, audio, and documents.