
Dirac’s 65.2% TerminalBench run turned HN toward the harness, not just the model

Original: Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

LLM | Apr 29, 2026 | By Insights AI (HN)

Hacker News did not treat this as a simple brag post. The thread immediately turned into a sharper question: did Dirac win because the model got better, or because the harness wasted less context? The Show HN post said Dirac hit 65.2% on TerminalBench 2 with gemini-3-flash-preview, ahead of Google’s own baseline at 47.6% and Junie CLI at 64.3%. It also stressed that no benchmark-specific AGENTS.md files or other leaderboard tricks were inserted, which is exactly why the discussion got traction.

The Dirac repo frames the project around context discipline. Its README highlights hash-anchored edits, AST-guided scoping, batched file operations, and opportunistic context updates that try to fetch the next needed material before the model asks. The pitch is blunt: if coding agents degrade as context grows, then better curation is not a nice extra. It is the product.
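As a rough illustration of what "hash-anchored edits" could mean, here is a minimal Python sketch. It is not Dirac's implementation: the function names, the 8-character SHA-256 prefix, and the `line:hash|` display format are all assumptions made for this example. The idea shown is that each line carries a short content hash, and an edit is applied only if the hash the model quoted still matches the file, so a stale edit fails loudly instead of silently corrupting code.

```python
import hashlib


def line_hash(text: str) -> str:
    """Short content hash used as an anchor for one line (hypothetical scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def render_with_anchors(lines):
    """What the model might be shown: line number, hash anchor, then the text."""
    return [f"{i + 1}:{line_hash(line)}| {line}" for i, line in enumerate(lines)]


def apply_edit(lines, line_no, anchor, new_text):
    """Replace line `line_no` only if its current content still matches `anchor`."""
    current = lines[line_no - 1]
    if line_hash(current) != anchor:
        raise ValueError(f"stale anchor at line {line_no}: content changed since it was read")
    updated = list(lines)  # leave the caller's list untouched
    updated[line_no - 1] = new_text
    return updated


source = ["def add(a, b):", "    return a - b"]
anchor = line_hash(source[1])  # the hash the model saw when it read line 2
fixed = apply_edit(source, 2, anchor, "    return a + b")
```

Reusing the same anchor against `fixed` raises `ValueError`, because line 2 no longer has the content that the anchor was computed from. Under these assumptions, the agent can target an edit with a few characters of hash rather than re-sending the surrounding code, which is one way a harness might spend less context per change.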

That matched the HN discussion almost perfectly. Early commenters asked whether this was really a new model story or just a new wrapper. The author answered that the model was still the default Gemini 3 Flash Preview and that the gains came from the tool chain. Other commenters dug into why AST-based search might beat plain grep on large repositories, especially when common symbol names and bundled files pollute search results. Once code search gets noisy, they noted, the agent can burn context long before it makes a useful change.
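The grep-versus-AST point can be made concrete with a small Python sketch. It uses only the standard `ast` and `re` modules and a made-up snippet; the helper names `grep_hits` and `ast_definitions` are hypothetical and say nothing about how Dirac's own search works. Text search returns every line that mentions a name, including comments and string literals, while an AST walk can return only the line where the symbol is defined.

```python
import ast
import re

# Made-up source file in which the word "process" appears in four places.
SOURCE = '''
def process(data):
    return [item * 2 for item in data]

def main():
    # process the input and print it
    message = "process finished"
    print(process([1, 2, 3]), message)
'''


def grep_hits(source, name):
    """Plain-text search: every line number whose text mentions the name."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if pattern.search(line)]


def ast_definitions(source, name):
    """AST search: only line numbers where the name is defined."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [node.lineno for node in ast.walk(ast.parse(source))
            if isinstance(node, kinds) and node.name == name]


text_matches = grep_hits(SOURCE, "process")          # definition, comment, string, call
definitions = ast_definitions(SOURCE, "process")     # definition only
```

On this snippet the text search reports four lines and the AST search reports one. In a large repository with common symbol names and bundled files, that gap is the noise the commenters were describing: every extra hit is context the agent has to read before it can act.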

The interesting part is not only that Dirac posted a high number on TerminalBench. It is that HN treated the number as evidence in a larger argument about coding agents. The thread reads like a reminder that model progress and harness design are now entangled. Same base model, different search strategy, different edit strategy, different outcome. That is exactly the kind of argument Hacker News likes to keep alive.
