Hacker News debates a no-training LLM trick that duplicates layers to improve reasoning
Original: Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training
A Hacker News thread that reached 262 points and 81 comments focused on an unusual claim: reasoning gains from duplicating a small block of LLM layers without training. The linked repository, llm-circuit-finder, says it builds on David Ng’s RYS method and searches for contiguous transformer layers that behave like reusable reasoning circuits. Instead of changing weights, the toolkit modifies the execution path so hidden states pass through the same block twice.
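The core idea can be sketched in a few lines. This is a toy illustration, not the project's actual code: each "layer" here is a plain function, and the modified execution path simply visits a contiguous block of layer indices twice while every weight stays fixed.

```python
# Toy sketch of execution-path duplication: no weights change, only the
# order in which layers are visited. `dup_range` is (start, end), with an
# inclusive start and exclusive end, mirroring Python slicing.

def run_layers(layers, hidden, dup_range=None):
    """Run the hidden state through layers, optionally repeating one block."""
    order = list(range(len(layers)))
    if dup_range is not None:
        start, end = dup_range
        # Insert a second pass through layers[start:end] right after the first.
        order = order[:end] + order[start:end] + order[end:]
    for i in order:
        hidden = layers[i](hidden)
    return hidden
```

With real transformer layers, the second pass feeds the block its own output rather than the original residual stream, which is why the behavior can differ even though no parameter is modified.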
The headline result in the repo is for Devstral-Small-2-24B. The author says duplicating layers 12-14 raises BBH logical deduction from 0.22 to 0.76. The same write-up says causal judgement and GSM8K also improve, while instruction following and MBPP decline. That tradeoff is important: the project is not claiming a universal quality increase. It is claiming that one narrow architectural intervention can strengthen some reasoning tasks while weakening precision on others.
The repo also reports results for Qwen2.5-Coder-32B. In that case, duplicating layers 7-9 is said to lift a custom reasoning suite from 76.5% to 94.1%, while EQ rises from 92.1 to 93.6. The toolkit includes sweep.py, layer_path.py, compare_eval.py, and related utilities so users can search for useful layer blocks, build modified GGUF files, and compare evaluation runs. That makes the project more reproducible than a one-off demo, but the benchmark numbers still come from the project author rather than from an external lab.
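The search the toolkit performs can be pictured as a brute-force sweep over contiguous blocks. The sketch below is hypothetical, it does not reproduce sweep.py, which operates on GGUF files via llama.cpp; here `evaluate` stands in for any benchmark harness that scores the model with a given duplicated block.

```python
# Hypothetical sweep over contiguous layer blocks. `evaluate(model, dup)`
# is assumed to return a benchmark score for the model with the block
# `dup = (start, end)` duplicated, or the unmodified baseline when dup=None.

def sweep_blocks(model, evaluate, block_len=3, n_layers=40):
    """Return the (start, end) block whose duplication scores best."""
    best_block, best_score = None, evaluate(model, dup=None)
    for start in range(n_layers - block_len + 1):
        block = (start, start + block_len)
        score = evaluate(model, dup=block)
        if score > best_score:
            best_block, best_score = block, score
    return best_block, best_score
```

A full sweep costs one evaluation run per candidate block, which is why the project ships separate search and comparison utilities rather than folding everything into one script.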
The costs are also spelled out clearly. Because the duplicated layers are physical GGUF copies, the model needs more memory. The repo says 3 extra layers on a 24B model can add about 1.5 GiB of VRAM, and that 3 extra layers on a 40-layer model can slow inference by roughly 7.5%. In other words, this is not a free reasoning boost. It trades additional memory and latency for a second pass through a candidate circuit.
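The arithmetic behind those costs is straightforward. The sketch below uses assumed numbers (per-layer parameter count and bits per weight are not stated in the article) and only illustrates why physically copying layers into the GGUF costs both memory and latency.

```python
# Back-of-envelope cost model for duplicated layers. The parameter count
# per layer and the quantization width are assumptions for illustration;
# exact figures depend on the model's shape and GGUF quantization.

def extra_vram_gib(params_per_layer, extra_layers, bits_per_weight):
    """Added memory from physically copying `extra_layers` into the file."""
    return extra_layers * params_per_layer * bits_per_weight / 8 / 2**30

def slowdown(extra_layers, total_layers):
    """If per-layer cost is roughly uniform, latency scales linearly."""
    return extra_layers / total_layers

# 3 extra layers on a 40-layer model: 3/40, i.e. the ~7.5% the repo cites.
print(slowdown(3, 40))  # 0.075

# With ~0.6B parameters per layer at ~8 bits per weight, three copied
# layers land in the same ballpark as the repo's ~1.5 GiB figure.
print(round(extra_vram_gib(6e8, 3, 8), 2))
```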
What makes the experiment interesting is its framing. The author argues that some transformer blocks act as indivisible cognitive units, and that repeating the right block changes model behavior even when the weights stay fixed. If that interpretation holds up across more architectures, execution-path surgery could become a distinct optimization axis alongside fine-tuning, quantization, and decoding changes. It is a different way of asking what useful behavior is already latent in a pretrained model.
For now, though, the safest reading is experimental rather than definitive. Numbers like 0.22→0.76 on logical deduction are striking, but they remain the repo author's own measurements, without broad independent verification. That is exactly the kind of high-variance result that Hacker News likes to surface early. Readers who want to judge the idea seriously should look at both the HN thread and the repository, then treat the project as a reproducible research experiment rather than a settled breakthrough.