HN Examines llm-circuit-finder: Layer Duplication as Capability Steering, Not a Free LLM Upgrade
Original: "Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training"
What the source material claims
llm-circuit-finder argues that some transformer capabilities live in small contiguous reasoning circuits. Instead of changing weights or training adapters, the project duplicates selected layers in the forward path so hidden states traverse the same block twice. The Show HN post says the author replicated David Ng's RYS method on consumer AMD GPUs, specifically RX 7900 XT + RX 6950 XT, and found strong effects in Devstral-24B and Qwen2.5-Coder-32B.
- Devstral-24B with layers 12-14 duplicated once: BBH Logical Deduction 0.22 → 0.76, GSM8K strict 0.48 → 0.64, MBPP 0.72 → 0.78, per the HN summary.
- Qwen2.5-Coder-32B with layers 7-9 duplicated once: reasoning probe 76% → 94%.
- The repo ships `sweep.py`, `layer_path.py`, `gguf_surgery.py`, `compare_eval.py`, and `visualize.py` for layer search, GGUF editing, evaluation comparison, and visualization.
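The core routing idea is simple enough to sketch. The snippet below is a hypothetical illustration, not the repo's `gguf_surgery.py` (which rewrites GGUF files on disk): it treats a model's decoder blocks as an ordered sequence and re-inserts a chosen span immediately after itself, so hidden states traverse the same block twice without any new weights.

```python
def duplicate_block(layers, start, end):
    """Return a layer sequence where blocks [start, end] (inclusive) run twice.

    Hypothetical helper for illustration only. The duplicated entries are the
    *same* objects as the originals, so no weights are changed or added --
    the forward path simply revisits the block a second time.
    """
    layers = list(layers)
    return layers[: end + 1] + layers[start : end + 1] + layers[end + 1 :]

# A nominal 40-layer model with layers 12-14 duplicated, as in the
# Devstral-24B result quoted above, yields a 43-layer forward path.
routed = duplicate_block(list(range(40)), 12, 14)
```

Because the duplicated entries alias the originals, this is routing control rather than fine-tuning; the on-disk GGUF surgery, by contrast, materializes the copies, which is where the memory cost discussed below comes from.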
On its face, this is a compelling idea. The README says different duplication patterns can create different cognitive modes from the same weights, and that circuit boundaries are sharp enough that moving the duplicated block by one layer can erase or invert the effect. That frames the project less as ordinary fine-tuning and more as explicit routing control over a fixed model.
Where the skepticism starts
The important nuance is that the README is more careful than the HN headline. What the project appears to show is capability steering, not a universal across-the-board improvement. Some reasoning-heavy tasks improve, but other capabilities can weaken. That distinction matters because the HN submission says "Nothing degraded," while the repo's broader evidence does not support reading the result as a free win.
The clearest example is the full benchmark table for Devstral surgery. The highlighted HN metrics emphasize reasoning gains, but the README's broader comparison shows weaker IFEval/MBPP and a lower average across all listed metrics, moving from 0.7610 to 0.7488. In other words, the project may make one capability profile better while making another worse. For practitioners, that is a meaningful tradeoff, not a detail to bury below the headline.
There is also a practical cost. The conceptual pitch is same weights, no training, different routing, but the current implementation physically duplicates layers inside GGUF files. The README says 3 extra layers on a 24B model cost about 1.5 GiB extra VRAM and about 7.5% slower inference. So even if the mechanism avoids weight updates, it is not operationally free in memory or latency.
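The README's ~1.5 GiB figure is consistent with a quick back-of-envelope estimate. All inputs below are rough assumptions rather than numbers from the repo: the layer count, the share of parameters in decoder blocks, and the bytes-per-parameter of the GGUF quantization all vary by model and quant level.

```python
def extra_vram_gib(total_params, n_layers, dup_layers,
                   bytes_per_param, non_embedding_frac=0.9):
    """Rough extra VRAM from physically duplicating decoder layers in a GGUF.

    Back-of-envelope only: assumes decoder blocks hold `non_embedding_frac`
    of all parameters, spread evenly across `n_layers`.
    """
    per_layer_params = total_params * non_embedding_frac / n_layers
    return dup_layers * per_layer_params * bytes_per_param / 2**30

# A 24B model with ~40 layers at ~1 byte/param (roughly Q8) puts 3 extra
# layers in the neighborhood of the README's ~1.5 GiB figure.
estimate = extra_vram_gib(24e9, 40, 3, 1.0)
```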
Why the HN thread mattered
This was a community-sourced story in the best sense: the GitHub repo provided the raw claim, and the Hacker News thread pressure-tested it. The post drew 257 points and 82 comments, which turned the discussion into more than a link share. Commenters challenged the novelty of layer duplication, pointed to prior art in layer replay, and asked what was genuinely new beyond David Ng's earlier work. The author's answer was that the new contribution is a sweep-and-validation toolkit plus benchmark evidence that exact 3-layer boundaries can matter for specific models.
That exchange is what makes the story useful for practitioners. If the real takeaway is not all models get better, but certain routes steer certain behaviors, then evaluation discipline becomes the core issue. Teams would need to ask whether the same effect holds across seeds, prompts, quantizations, runtimes, and downstream tuning, and whether the capability gained is worth the average performance lost elsewhere.
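That evaluation discipline can be made concrete with a small comparison harness in the spirit of the repo's `compare_eval.py` (the actual script's interface is not documented here, so this is an independent sketch). It reports per-benchmark deltas, the change in the unweighted average, and any regressions, so a reasoning gain can be weighed against losses elsewhere. The scores in the usage example are illustrative placeholders, not the repo's full table.

```python
def summarize_surgery(base, patched):
    """Compare per-benchmark scores before and after layer surgery.

    `base` and `patched` map benchmark names to scores over the same keys.
    Returns (per-task deltas, change in unweighted average, regressed tasks).
    """
    deltas = {k: round(patched[k] - base[k], 4) for k in base}
    avg_delta = round(sum(patched.values()) / len(patched)
                      - sum(base.values()) / len(base), 4)
    regressions = [k for k, d in deltas.items() if d < 0]
    return deltas, avg_delta, regressions

# Illustrative scores only (the ifeval regression here is hypothetical):
before = {"bbh_logical": 0.22, "gsm8k": 0.48, "mbpp": 0.72, "ifeval": 0.80}
after = {"bbh_logical": 0.76, "gsm8k": 0.64, "mbpp": 0.78, "ifeval": 0.70}
deltas, avg_delta, regressions = summarize_surgery(before, after)
```

A report like this surfaces exactly the tradeoff the README's full table shows for Devstral: headline reasoning metrics can rise while the overall average falls.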
The project is therefore best read as an interesting model-surgery experiment with reproducible scripts and concrete deltas, not as proof that duplicated layers universally upgrade LLMs. The evidence so far points to capability steering via layer routing, with measurable tradeoffs in benchmark coverage and runtime cost.
Source: the original repository is https://github.com/alainnothere/llm-circuit-finder and the Hacker News discussion is https://news.ycombinator.com/item?id=47431671.