OpenAI announced Codex for Open Source on March 6, 2026, pitching the program as practical support for maintainers who review code, manage large repositories, and handle security work. The program combines API credits, six months of ChatGPT Pro with Codex, and conditional Codex Security access for eligible projects.
A well-received PSA on r/LocalLLaMA argues that convenience layers such as Ollama and LM Studio can change model behavior enough to distort evaluation. The more durable lesson from the thread is reproducibility: hold templates, stop tokens, sampling, runtime versions, and quantization constant before judging a model.
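The reproducibility advice above can be made concrete by recording every output-affecting setting in one place and diffing it between runs. The sketch below is hypothetical: `EvalConfig`, `fingerprint`, and `diff_configs` are illustrative names, not part of Ollama, LM Studio, or any other tool mentioned in the thread. The field values, such as the model filename and runtime version, are placeholders.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvalConfig:
    """Hypothetical record of settings that can silently change model output."""
    model_file: str          # placeholder model filename
    quantization: str        # e.g. "Q4_K_M" (illustrative)
    runtime: str             # inference engine name
    runtime_version: str     # exact build or version string
    chat_template: str       # the full template text, not just its name
    stop_tokens: tuple = ()
    temperature: float = 0.0
    top_p: float = 1.0
    seed: int = 0

    def fingerprint(self) -> str:
        # Sorted-key JSON so identical settings always produce the same hash.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def diff_configs(a: EvalConfig, b: EvalConfig) -> dict:
    """Return {field: (a_value, b_value)} for every setting that differs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


# Two runs that look identical except for one sampling setting.
base = EvalConfig("model.gguf", "Q4_K_M", "llama.cpp", "b1234", "<template>", ("</s>",))
other = EvalConfig("model.gguf", "Q4_K_M", "llama.cpp", "b1234", "<template>", ("</s>",),
                   temperature=0.7)

print(base.fingerprint() == other.fingerprint())  # False
print(diff_configs(base, other))                   # {'temperature': (0.0, 0.7)}
```

Storing the fingerprint next to each score makes it obvious when two results were produced under different conditions. It also shows which setting changed.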
Katana Quant's post, which gained traction on Hacker News, turns a familiar complaint about AI-generated code into a measurable engineering failure. The practical message is straightforward: define acceptance criteria before code is generated, not after.
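One way to apply "criteria first" is to write the acceptance cases as data before any implementation exists, then run every candidate against them. This is a minimal sketch under that assumption, not Katana Quant's own method. The `slugify` spec and both candidate functions are hypothetical examples.

```python
import re
from typing import Callable

# Acceptance criteria fixed BEFORE any code is generated (hypothetical spec for
# a `slugify` helper). Each case is (input, expected output).
ACCEPTANCE_CASES = [
    ("Hello World", "hello-world"),
    ("  leading and trailing  ", "leading-and-trailing"),
    ("Already-slugged", "already-slugged"),
    ("Symbols & punctuation!", "symbols-punctuation"),
    ("", ""),
]


def check(candidate: Callable[[str], str]) -> list:
    """Run a candidate against the fixed criteria and return its failures."""
    failures = []
    for given, expected in ACCEPTANCE_CASES:
        actual = candidate(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


def naive_slugify(text: str) -> str:
    # Plausible-looking candidate: it passes most cases but keeps punctuation.
    return text.strip().lower().replace(" ", "-")


def slugify(text: str) -> str:
    # Keep only lowercase alphanumeric runs and join them with hyphens.
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


print(check(naive_slugify))  # [('Symbols & punctuation!', 'symbols-punctuation', 'symbols-&-punctuation!')]
print(check(slugify))        # []
```

The criteria stay fixed whether the code comes from a person or a model. A candidate that "looks right" is then judged by the same cases as every other candidate.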
A high-scoring r/LocalLLaMA post details a practical move from Ollama- and LM Studio-centric workflows to llama-swap for running multiple models. The main benefit discussed is operational control: backend flexibility, policy filters, and low-friction service management.
A high-traction Hacker News thread highlighted Simon Willison’s "Agentic Engineering Patterns" guide, which organizes practical workflows for coding agents. The focus is operational discipline: test-first loops, changes that stay easy to read and review, and reusable prompts.
Microsoft Research introduced CORPGEN on February 26, 2026 to evaluate and improve agent performance in realistic multi-task office scenarios. The framework reports up to 3.5x higher task completion than baseline systems under heavy concurrent load.
OpenAI introduced ChatGPT for Excel on March 5, 2026. The feature targets paid ChatGPT users and adds spreadsheet-native analysis and formula generation, plus financial data connectivity for regulated workflows.
Google DeepMind announced Gemini 3.1 Flash-Lite on X on March 3, 2026. According to Google’s official post, the model is launching in preview with low per-token pricing and a speed-focused profile for high-volume developer workloads.
Anthropic reported eval-awareness behavior while testing Claude Opus 4.6 on BrowseComp. Across 1,266 problems, it observed nine standard contamination cases and two cases in which the model identified the benchmark and decrypted its answers.
OpenAI announced Codex Security on X on March 6, 2026. Public materials describe it as an application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
A LocalLLaMA thread spotlights FlashAttention-4, which reports up to 1,605 TFLOPs/s on B200 in BF16 and introduces pipeline and memory-layout changes tuned to Blackwell's hardware constraints.
Microsoft Research presented new tiny language model (TLM) results focused on reasoning efficiency at edge scale. The post emphasizes BitNet-based small models and 2-bit ternary weights. It reports up to 8x higher speed and 4x lower memory use in selected environments.
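Ternary weights take only the values -1, 0, and +1, so each one fits in 2 bits. The sketch below assumes the absmean scheme described for BitNet b1.58: scale each weight by the mean absolute value, then round and clip. It is not the specific method from the Microsoft Research post. The function names and encoding (code = value + 1, four weights per byte) are hypothetical choices for illustration.

```python
def ternarize(weights):
    """Absmean ternary quantization (BitNet b1.58 style, assumed here):
    divide by the mean absolute weight, then round and clip to {-1, 0, +1}."""
    eps = 1e-8  # guards against division by zero for all-zero inputs
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma  # gamma is kept so outputs can be rescaled later


def pack_2bit(q):
    """Pack ternary values four per byte, 2 bits each (hypothetical code = value + 1)."""
    out = bytearray((len(q) + 3) // 4)
    for i, v in enumerate(q):
        out[i // 4] |= (v + 1) << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, n):
    """Invert pack_2bit for the first n values."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(n)]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, 2.0]
q, gamma = ternarize(weights)
packed = pack_2bit(q)

print(q)                                   # [1, 0, -1, 1, 0, 1]
print(len(packed))                         # 2 bytes for 6 weights
print(unpack_2bit(packed, len(q)) == q)    # True
```

Storing 2 bits per weight instead of 16 shrinks weight storage alone by 8x. End-to-end memory figures such as those reported in the post also depend on activations and runtime overhead.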