OpenAI Developers said on March 6, 2026 that Codex Security is now in research preview. The product connects to GitHub repositories, builds a threat model, validates potential issues in isolation, and proposes patches for human review.
A Hacker News thread surfaced OBLITERATUS, an open-source project that studies and alters refusal behavior in open-weight LLMs without retraining. The interesting part is not just the capability claim but the project’s framing as a shared telemetry-backed research pipeline for comparing safety-editing methods across models and hardware.
A well-received HN post highlighted Sarvam AI’s decision to open-source Sarvam 30B and 105B, two reasoning-focused MoE models trained in India under the IndiaAI mission. The announcement matters because it pairs open weights with concrete product deployment, inference optimization, and unusually strong Indian-language benchmarks.
GitHub said on March 5, 2026 that GPT-5.4 is now generally available and rolling out in GitHub Copilot. The company claims early testing showed higher success rates and stronger logical reasoning and task execution on complex, tool-dependent developer workflows.
Anthropic said on March 6, 2026 that Claude Opus 4.6 uncovered 22 Firefox vulnerabilities in two weeks, including 14 high-severity issues, during a collaboration with Mozilla. The accompanying write-up argues that frontier models are becoming materially useful for real vulnerability discovery, not just benchmark performance.
OpenAI announced Codex for Open Source on March 6, 2026, pitching the program as practical support for maintainers who review code, manage large repositories, and handle security work. It combines API credits, six months of ChatGPT Pro with Codex, and conditional Codex Security access for eligible projects.
A well-received PSA on r/LocalLLaMA argues that convenience layers such as Ollama and LM Studio can change model behavior enough to distort evaluation. The more durable lesson from the thread is reproducibility: hold templates, stop tokens, sampling, runtime versions, and quantization constant before judging a model.
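The thread's reproducibility checklist can be sketched in code. This is a minimal illustration, not any particular harness's API: the field names (`chat_template`, `stop_tokens`, and so on) are assumptions standing in for whatever a given runtime exposes, and the version and quantization strings are made-up examples.

```python
# Sketch of pinning evaluation settings before comparing two models, so
# that differences come from the weights rather than the harness.
# All field names and example values are illustrative assumptions.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class EvalConfig:
    chat_template: str       # exact template string, not a runtime default
    stop_tokens: tuple       # e.g. ("<|im_end|>",)
    temperature: float
    top_p: float
    seed: int
    runtime_version: str     # e.g. a specific llama.cpp build tag
    quantization: str        # e.g. "Q4_K_M"

def same_harness(a: EvalConfig, b: EvalConfig) -> bool:
    """Two runs are comparable only if every single setting matches."""
    return asdict(a) == asdict(b)

base = EvalConfig("chatml", ("<|im_end|>",), 0.0, 1.0, 42, "b4567", "Q4_K_M")
candidate = EvalConfig("chatml", ("<|im_end|>",), 0.7, 1.0, 42, "b4567", "Q4_K_M")
print(same_harness(base, candidate))  # temperature differs -> False
```

The point of the frozen dataclass is that a "small" convenience-layer default (here, a different temperature) fails the comparability check loudly instead of silently skewing the verdict on a model.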
Katana Quant's post, which gained traction on Hacker News, reframes a familiar complaint about AI-generated code as a measurable engineering failure. The practical message is straightforward: define acceptance criteria before code generation, not after.
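One way to make "criteria before generation" concrete is to freeze the acceptance cases first and judge any candidate implementation against them. The `slugify` task below is a hypothetical example, not from the post; the pattern is what matters.

```python
# Minimal sketch of acceptance criteria fixed before code generation.
# The cases are agreed up front; a generated implementation either
# passes all of them or is rejected. slugify is a hypothetical target.
import re

def meets_acceptance(fn) -> bool:
    """Return True only if fn satisfies the pre-agreed cases."""
    cases = {
        "Hello, World!": "hello-world",
        "  spaced  out  ": "spaced-out",
        "already-slugged": "already-slugged",
    }
    return all(fn(text) == want for text, want in cases.items())

# A candidate implementation (e.g. AI-generated) is accepted against
# the fixed cases, not against a reviewer's after-the-fact impression.
def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

print(meets_acceptance(slugify))  # True
```

Writing `meets_acceptance` before prompting a model turns "the AI code looks wrong" into a binary, repeatable verdict.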
A high-scoring r/LocalLLaMA post details a practical move from Ollama/LM Studio-centric flows to llama-swap for multi-model operations. The key value discussed is operational control: backend flexibility, policy filters, and low-friction service management.
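The pattern behind that move can be sketched as a small router: one endpoint, many registered models, only one backend resident at a time. This is not llama-swap's actual implementation, just an illustration of the swap-on-demand idea; the model names and commands are invented.

```python
# Illustrative sketch of a model-swapping proxy: requests name a model,
# the router loads that backend if it is not already active, and the
# previous one is torn down. Not llama-swap's code, just the pattern.

class SwappingRouter:
    def __init__(self, launch_cmds: dict):
        self.launch_cmds = launch_cmds   # model name -> backend command
        self.active = None               # the one currently loaded model

    def route(self, model: str) -> str:
        if model not in self.launch_cmds:
            raise KeyError(f"unknown model: {model}")
        if model != self.active:
            # A real proxy would stop the old server process and start
            # the new one here; this sketch only records the swap.
            self.active = model
        return self.launch_cmds[model]

router = SwappingRouter({
    "qwen-7b": "llama-server -m qwen-7b.gguf --port 9001",
    "mistral-7b": "llama-server -m mistral-7b.gguf --port 9002",
})
print(router.route("qwen-7b"))
```

The operational win described in the thread lives in that swap step: policy filters, backend choice, and lifecycle management all hang off one chokepoint instead of being scattered across per-app model settings.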
A high-traction Hacker News thread highlighted Simon Willison’s "Agentic Engineering Patterns" guide, which organizes practical workflows for coding agents. The focus is operational discipline: testing-first loops, readable change flow, and reusable prompts.
Microsoft Research introduced CORPGEN on February 26, 2026 to evaluate and improve agent performance in realistic multi-task office scenarios. The framework reports up to 3.5x higher task completion than baseline systems under heavy concurrent load.
OpenAI introduced ChatGPT for Excel on March 5, 2026. The feature targets paid ChatGPT users and adds spreadsheet-native analysis and formula generation, plus financial data connectivity for regulated workflows.