LLM Coding Performance: Harness Design, Not Models, Is the Key
Original: Improving 15 LLMs at Coding in One Afternoon: Only the Harness Changed
Overview
Can Bölük demonstrated that the design of the edit tool (the harness), rather than model selection, is the primary bottleneck in LLM coding performance. Testing 16 models on 180 tasks against a React codebase showed that changing only the edit mechanism, with everything else held constant, produces dramatic improvements.
Problems with Existing Edit Approaches
Patch format (OpenAI/Codex): Expresses edits as diff-style strings, but fails badly for models outside the GPT family; Grok 4's edit-failure rate reached 50.7%.
String replacement (Claude Code): Requires the model to reproduce the old text exactly, whitespace included, which generates frequent "String to replace not found" errors.
Neural merging (Cursor): Cursor fine-tuned a separate model solely to repair failed edits, an implicit acknowledgment of how severe the problem is.
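To see why exact string replacement is brittle, consider a file indented with a tab while the model reproduces the line from memory using spaces. This minimal Python sketch (an illustration of the failure mode, not any tool's actual implementation) shows the mismatch:

```python
# A file indented with a tab, as stored on disk.
file_text = "function add(a, b) {\n\treturn a + b;\n}\n"

# The model reproduces the line with four spaces instead of
# the tab it saw, so the bytes no longer match the file.
old = "    return a + b;"
new = "    return a - b;"

if old in file_text:
    file_text = file_text.replace(old, new)
else:
    # This branch fires: the exact-match lookup fails and the
    # edit is rejected with an error like the one quoted above.
    print("String to replace not found")
```

A single invisible character difference is enough to reject an otherwise correct edit, which is exactly the class of failure the hashline approach below is designed to remove.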
The Hashline Solution
The author proposes tagging each line with content hashes. Models reference hash tags rather than reproducing text. This approach:
- Prevents corruption if files change between reads
- Eliminates whitespace reproduction requirements
- Shows that models aren't flaky at understanding tasks, but at expressing themselves
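The mechanism can be sketched in a few lines of Python. This is a minimal illustration of the hashline idea, not the author's implementation; the helper names `tag_lines` and `apply_edit` are hypothetical. Each line is shown to the model with a short content hash, and edits address lines by hash, so a stale hash is detected instead of silently corrupting the file:

```python
import hashlib


def line_hash(index: int, line: str) -> str:
    """Short content hash tied to both line position and text."""
    return hashlib.sha1(f"{index}:{line}".encode()).hexdigest()[:6]


def tag_lines(text: str) -> str:
    """Render a file for the model, one hash tag per line."""
    return "\n".join(
        f"{line_hash(i, line)}› {line}"
        for i, line in enumerate(text.splitlines())
    )


def apply_edit(text: str, anchor: str, replacement: str) -> str:
    """Replace the line whose hash matches `anchor`.

    If the file changed since the model last read it, no hash
    matches, and we fail loudly instead of corrupting the file.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line_hash(i, line) == anchor:
            lines[i] = replacement
            return "\n".join(lines)
    raise ValueError(f"stale hash {anchor!r}: re-read the file")
```

Because the model emits a six-character tag instead of reproducing the old line verbatim, whitespace reproduction errors disappear, and a file that changed between reads invalidates the tag rather than matching the wrong text.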
Benchmark Results
Grok Code Fast improved from a 6.7% to a 68.3% success rate, a roughly tenfold gain. As the author puts it, "the model isn't flaky at understanding the task. It's flaky at expressing itself."
Key Takeaway
Open-source harness development benefits all models, while vendor-specific optimization creates isolated silos, ultimately hindering ecosystem progress. The highest-leverage innovation point right now is not model improvement, but harness design.