Hacker News debates a no-training LLM trick that duplicates layers to improve reasoning
Original: Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training
A Hacker News thread that reached 262 points and 81 comments focused on an unusual claim: reasoning gains from duplicating a small block of LLM layers without training. The linked repository, llm-circuit-finder, says it builds on David Ng’s RYS method and searches for contiguous transformer layers that behave like reusable reasoning circuits. Instead of changing weights, the toolkit modifies the execution path so hidden states pass through the same block twice.
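The core idea can be sketched in a few lines. This is a toy illustration, not the project's actual code: each "layer" here is a plain function, and the modified execution path simply visits a contiguous block of layer indices twice while every weight stays fixed.

```python
# Toy sketch of execution-path duplication: no weights change, only the
# order in which layers are visited. `dup_range` is (start, end), with an
# inclusive start and exclusive end, mirroring Python slicing.

def run_layers(layers, hidden, dup_range=None):
    """Run the hidden state through layers, optionally repeating one block."""
    order = list(range(len(layers)))
    if dup_range is not None:
        start, end = dup_range
        # Insert a second pass through layers[start:end] right after the first.
        order = order[:end] + order[start:end] + order[end:]
    for i in order:
        hidden = layers[i](hidden)
    return hidden
```

With real transformer layers, the second pass feeds the block its own output rather than the original residual stream, which is why the behavior can differ even though no parameter is modified.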
The headline result in the repo is for Devstral-Small-2-24B. The author says duplicating layers 12-14 raises BBH logical deduction from 0.22 to 0.76. The same write-up says causal judgement and GSM8K also improve, while instruction following and MBPP decline. That tradeoff is important: the project is not claiming a universal quality increase. It is claiming that one narrow architectural intervention can strengthen some reasoning tasks while weakening precision on others.
The repo also reports results for Qwen2.5-Coder-32B. In that case, duplicating layers 7-9 is said to lift a custom reasoning suite from 76.5% to 94.1%, while EQ rises from 92.1 to 93.6. The toolkit includes sweep.py, layer_path.py, compare_eval.py, and related utilities so users can search for useful layer blocks, build modified GGUF files, and compare evaluation runs. That makes the project more reproducible than a one-off demo, but the benchmark numbers still come from the project author rather than from an external lab.
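The search the toolkit performs can be pictured as a brute-force sweep over contiguous blocks. The sketch below is hypothetical, it does not reproduce sweep.py, which operates on GGUF files via llama.cpp; here `evaluate` stands in for any benchmark harness that scores the model with a given duplicated block.

```python
# Hypothetical sweep over contiguous layer blocks. `evaluate(model, dup)`
# is assumed to return a benchmark score for the model with the block
# `dup = (start, end)` duplicated, or the unmodified baseline when dup=None.

def sweep_blocks(model, evaluate, block_len=3, n_layers=40):
    """Return the (start, end) block whose duplication scores best."""
    best_block, best_score = None, evaluate(model, dup=None)
    for start in range(n_layers - block_len + 1):
        block = (start, start + block_len)
        score = evaluate(model, dup=block)
        if score > best_score:
            best_block, best_score = block, score
    return best_block, best_score
```

A full sweep costs one evaluation run per candidate block, which is why the project ships separate search and comparison utilities rather than folding everything into one script.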
The costs are also spelled out clearly. Because the duplicated layers are physical GGUF copies, the model needs more memory. The repo says 3 extra layers on a 24B model can add about 1.5 GiB of VRAM, and that 3 extra layers on a 40-layer model can slow inference by roughly 7.5%. In other words, this is not a free reasoning boost. It trades additional memory and latency for a second pass through a candidate circuit.
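The arithmetic behind those costs is straightforward. The sketch below uses assumed numbers (per-layer parameter count and bits per weight are not stated in the article) and only illustrates why physically copying layers into the GGUF costs both memory and latency.

```python
# Back-of-envelope cost model for duplicated layers. The parameter count
# per layer and the quantization width are assumptions for illustration;
# exact figures depend on the model's shape and GGUF quantization.

def extra_vram_gib(params_per_layer, extra_layers, bits_per_weight):
    """Added memory from physically copying `extra_layers` into the file."""
    return extra_layers * params_per_layer * bits_per_weight / 8 / 2**30

def slowdown(extra_layers, total_layers):
    """If per-layer cost is roughly uniform, latency scales linearly."""
    return extra_layers / total_layers

# 3 extra layers on a 40-layer model: 3/40, i.e. the ~7.5% the repo cites.
print(slowdown(3, 40))  # 0.075

# With ~0.6B parameters per layer at ~8 bits per weight, three copied
# layers land in the same ballpark as the repo's ~1.5 GiB figure.
print(round(extra_vram_gib(6e8, 3, 8), 2))
```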
What makes the experiment interesting is its framing. The author argues that some transformer blocks act as indivisible cognitive units, and that repeating the right block changes model behavior even when the weights stay fixed. If that interpretation holds up across more architectures, execution-path surgery could become a distinct optimization axis alongside fine-tuning, quantization, and decoding changes. It is a different way of asking what useful behavior is already latent in a pretrained model.
For now, though, the safest reading is experimental rather than definitive. Numbers like 0.22→0.76 on logical deduction are striking, but they remain the repo author's own measurements, without broad independent verification. That is exactly the kind of high-variance result that Hacker News likes to surface early. Readers who want to judge the idea seriously should look at both the HN thread and the repository, then treat the project as a reproducible research experiment rather than a settled breakthrough.