OmniCoder-9B packages agent-style coding behavior into a smaller open model by training on more than 425,000 curated trajectories from real tool-using workflows.
#coding-agents
METR's March 10, 2026 note argues that about half of test-passing SWE-bench Verified PRs from recent agents would still be rejected by maintainers. HN treated it as a warning that benchmark wins do not yet measure scope control, code quality, or repo fit.
A Hacker News thread elevated Bassim Eledath’s eight-level framework, drawn from an article that explains coding-agent performance gaps through workflow maturity rather than model benchmarks.
A front-page Hacker News thread drew attention to SWE-CI, an arXiv benchmark that evaluates coding agents on 100 real repository evolution tasks rather than one-shot bug fixes. The paper frames software maintainability as a CI-loop problem and reports that even strong models still struggle to avoid regressions over long development arcs.
A high-traction Hacker News thread highlighted Simon Willison’s "Agentic Engineering Patterns" guide, which organizes practical workflows for coding agents. The focus is operational discipline: testing-first loops, readable change flow, and reusable prompts.
A LocalLLaMA post reports that a simple “verify after every edit” loop raised Qwen3.5-35B-A3B from 22.2% to 37.8% on SWE-bench Verified Hard, approaching a cited 40% reference for Claude Opus 4.6.
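The reported loop can be sketched minimally. A hedged sketch, not the poster's actual harness: the function names and the toy syntax-check verifier below are assumptions; a real setup would run the repo's test suite and roll back via git.

```python
import pathlib

def verified_edit(path: pathlib.Path, new_text: str, check) -> bool:
    """Apply one edit, verify it, and roll back on failure.

    Minimal sketch of a "verify after every edit" loop; `check` stands in
    for whatever test command the actual agent harness would run.
    """
    original = path.read_text()
    path.write_text(new_text)
    if check(path):
        return True            # verification passed: keep the edit
    path.write_text(original)  # verification failed: roll back
    return False

def parses_as_python(p: pathlib.Path) -> bool:
    # Toy verifier: does the edited file still compile as Python?
    try:
        compile(p.read_text(), str(p), "exec")
        return True
    except SyntaxError:
        return False
```

In a real harness `check` would shell out to the test suite and the rollback would be a `git checkout`, but the shape of the loop is the same: no edit survives unless verification passes.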
Software developer Manuel Schipper shares a practical workflow for running 4-8 AI coding agents in parallel using tmux, Markdown Feature Design files, and slash commands — no orchestrators required.
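The tmux side of such a workflow can be sketched as a helper that launches one detached session per feature. This is a hypothetical sketch, not Schipper's actual commands: the `agent` CLI name, the `--plan` flag, and the `docs/` layout are all placeholders.

```python
import subprocess

def tmux_launch_cmd(feature: str, agent_cmd: str = "agent") -> list[str]:
    # Build the argv for a detached tmux session running one agent
    # against a per-feature Markdown design file (all names are placeholders).
    return [
        "tmux", "new-session",
        "-d",                      # detached: the session runs in the background
        "-s", f"agent-{feature}",  # one named session per feature
        f"{agent_cmd} --plan docs/{feature}.md",
    ]

def launch_agents(features: list[str]) -> None:
    # Spawn one session per feature; inspect any of them later
    # with `tmux attach -t agent-<feature>`.
    for feature in features:
        subprocess.run(tmux_launch_cmd(feature), check=True)
```

Because each agent lives in its own named session, no orchestrator process is needed: tmux itself is the process manager, and attaching to a session is how you review or redirect that agent.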
AI researcher Andrej Karpathy argues that programming has fundamentally changed over the last two months, particularly since December, when coding agents started actually working. Developers are shifting from writing code to directing and managing AI agents in parallel.
OpenAIDevs posted on 2026-02-24 that GPT-5.3-Codex is now available for all developers in the Responses API. The announcement moves API access from a staged rollout to general developer availability.
A Hacker News discussion highlights arXiv:2602.11988, which finds that repository context files like AGENTS.md often reduced coding-agent task success while increasing inference cost by more than 20%.
A LocalLLaMA discussion of the SWE-rebench January runs reports a tight race at the top, with Claude Code leading on pass@1 and pass@5 while open models narrow the gap.