GitHub said on March 5, 2026 that GPT-5.4 is now generally available and rolling out in GitHub Copilot. The company claims early testing showed higher success rates plus stronger logical reasoning and task execution on complex, tool-dependent developer workflows.
LLM
Anthropic said on March 6, 2026 that Claude Opus 4.6 uncovered 22 Firefox vulnerabilities in two weeks, including 14 high-severity issues, during a collaboration with Mozilla. The accompanying write-up argues that frontier models are becoming materially useful for real vulnerability discovery, not just benchmark performance.
OpenAI announced Codex for Open Source on March 6, 2026, pitching the program as practical support for maintainers who review code, manage large repositories, and handle security work. The program combines API credits, six months of ChatGPT Pro with Codex, and conditional Codex Security access for eligible projects.
A well-received PSA on r/LocalLLaMA argues that convenience layers such as Ollama and LM Studio can change model behavior enough to distort evaluation. The more durable lesson from the thread is reproducibility: hold templates, stop tokens, sampling, runtime versions, and quantization constant before judging a model.
Katana Quant's post, which gained traction on Hacker News, turns a familiar complaint about AI code into a measurable engineering failure. The practical message is straightforward: define acceptance criteria before code generation, not after.
A high-scoring r/LocalLLaMA post details a practical move from Ollama/LM Studio-centric flows to llama-swap for multi-model operations. The key value discussed is operational control: backend flexibility, policy filters, and low-friction service management.
A high-traction Hacker News thread highlighted Simon Willison’s "Agentic Engineering Patterns" guide, which organizes practical workflows for coding agents. The focus is operational discipline: testing-first loops, readable change flow, and reusable prompts.
Microsoft Research introduced CORPGEN on February 26, 2026 to evaluate and improve agent performance in realistic multi-task office scenarios. The framework reports up to 3.5x higher task completion than baseline systems under heavy concurrent load.
OpenAI introduced ChatGPT for Excel on March 5, 2026. The feature targets paid ChatGPT users and adds spreadsheet-native analysis and formula generation, plus financial data connectivity for regulated workflows.
Microsoft Research introduced CORPGEN on February 26, 2026 to evaluate and improve agent performance in realistic multi-task office scenarios. The framework reports up to 3.5x higher task completion than baseline systems under heavy concurrent load.
OpenAI introduced ChatGPT for Excel on March 5, 2026. The feature targets paid ChatGPT users and adds spreadsheet-native analysis and formula generation, plus financial data connectivity for regulated workflows.
Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026. According to Google’s official post, the model is launching in preview with low per-token pricing and a speed-focused profile for high-volume developer workloads.