Hacker News on no-training LLM surgery: the claim that duplicating three layers boosts reasoning

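This story, which drew 262 points and 81 comments on Hacker News, covers an experiment claiming that duplicating a specific block of layers in an LLM can improve reasoning without any fine-tuning. The linked repository is llm-circuit-finder, and its author describes the work as a reproduction and extension of David Ng's RYS method. The core idea is to build a modified GGUF in which a contiguous block of layers is executed a second time, so that the hidden states re-enter the same circuit.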
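To make the routing idea concrete, here is a minimal sketch of the layer execution order when one contiguous block runs twice. It is an illustration only, not the repo's layer_path.py: the function name `duplicated_layer_path`, the inclusive block bounds, and the 40-layer model size are all assumptions.

```python
def duplicated_layer_path(n_layers: int, start: int, end: int) -> list[int]:
    """Return the layer execution order when block [start, end] (inclusive) runs twice.

    Hypothetical helper for illustration; not taken from llm-circuit-finder.
    """
    if not 0 <= start <= end < n_layers:
        raise ValueError("block must lie inside the model")
    first_pass = list(range(0, end + 1))      # layers 0..end, as in the original model
    repeat = list(range(start, end + 1))      # the same block, visited a second time
    rest = list(range(end + 1, n_layers))     # remaining layers, unchanged
    return first_pass + repeat + rest


# Assumed example: a 40-layer model with layers 12-14 duplicated.
path = duplicated_layer_path(40, 12, 14)
print(len(path))     # 43
print(path[10:20])   # [10, 11, 12, 13, 14, 12, 13, 14, 15, 16]
```

The weights are unchanged in this picture; only the order in which layers are visited differs, which is what the "same weights, different routing" framing refers to.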

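The result the repo pushes hardest is an experiment on Devstral-Small-2-24B. The author claims that routing layers 12-14 through a duplicated path one more time raised BBH logical deduction from 0.22 to 0.76. The same write-up says causal judgement and GSM8K also improved, while instruction following and MBPP declined. The narrative, in other words, is not that every capability got better but that reasoning-heavy behavior shifted at a specific cost.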

The second example is Qwen2.5-Coder-32B. The repo states that duplicating layers 7-9 raised a custom reasoning suite from 76.5% to 94.1% and the EQ score from 92.1 to 93.6. It also ships tools such as sweep.py, layer_path.py, and compare_eval.py so that the layer-block search and the benchmark comparison can be reproduced. That tooling goes a step beyond a bare claims page, but the headline benchmark results remain self-reported measurements from the repo author and should be read as such.
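A layer-block search of this kind can be pictured as a sweep over every contiguous block of a fixed size, scoring each candidate and ranking the results. The sketch below is a guess at the general shape, not the repo's sweep.py: `candidate_blocks`, `sweep`, and especially `toy_score` are hypothetical, and the toy score is a made-up stand-in for a real benchmark run.

```python
from typing import Callable


def candidate_blocks(n_layers: int, block_size: int) -> list[tuple[int, int]]:
    """All contiguous (start, end) blocks of the given size, with inclusive bounds."""
    return [(s, s + block_size - 1) for s in range(n_layers - block_size + 1)]


def sweep(n_layers: int, block_size: int,
          score: Callable[[int, int], float]) -> list[tuple[tuple[int, int], float]]:
    """Score every candidate block and return them best-first."""
    results = [((s, e), score(s, e)) for s, e in candidate_blocks(n_layers, block_size)]
    return sorted(results, key=lambda r: r[1], reverse=True)


def toy_score(start: int, end: int) -> float:
    # Placeholder only: pretends the best block starts at layer 12.
    # A real sweep would build the modified model and run a benchmark here.
    return -abs(start - 12)


ranked = sweep(40, 3, toy_score)
print(len(ranked))   # 38
print(ranked[0])     # ((12, 14), 0)
```

Even in this toy form the cost is visible: a 40-layer model has 38 candidate 3-layer blocks, and each one would need its own benchmark run.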

The trade-offs are disclosed plainly. The repo FAQ says that duplicated layers are stored in the GGUF as physical copies and therefore need extra VRAM. As an example, it notes that adding 3 layers to a 24B model can add roughly 1.5 GiB, and that 3 extra layers on a 40-layer model can slow inference by about 7.5%. This approach is not a free lunch: it trades additional memory and latency for a reasoning gain.
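The 7.5% figure matches a simple back-of-the-envelope estimate, sketched below. The assumptions are that per-token latency scales linearly with the number of layers executed (ignoring embeddings and the output head) and that each duplicated layer costs the same memory; the 0.5 GiB per-layer value is simply the article's 1.5 GiB divided by 3, not a measured number. Function names are hypothetical.

```python
def latency_overhead(extra_layers: int, base_layers: int) -> float:
    """Relative slowdown, assuming latency is proportional to layers executed."""
    return extra_layers / base_layers


def extra_vram_gib(extra_layers: int, per_layer_gib: float) -> float:
    """Extra memory, assuming every duplicated layer is a full physical copy."""
    return extra_layers * per_layer_gib


overhead = latency_overhead(3, 40)     # 3 extra layers on a 40-layer model
per_layer = 1.5 / 3                    # assumption: implied by the quoted 1.5 GiB for 3 layers
vram = extra_vram_gib(3, per_layer)

print(f"{overhead:.1%}")   # 7.5%
print(f"{vram:.1f} GiB")   # 1.5 GiB
```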

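What makes the experiment interesting is that it focuses on modifying the execution path rather than on fine-tuning or weight merging. Under the framing "same weights, different routing," the author interprets the result as evidence that functional circuits exist inside the transformer. The claim is that, when the layer boundaries are chosen well, the model effectively passes through its reasoning pipeline a second time. If the idea replicates across a wider range of model families, architecture surgery could become an optimization axis distinct from quantization and fine-tuning.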

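At this stage, though, it should be read without hype. A figure like 0.22→0.76 on logical deduction is striking, but it comes with neither independent lab evaluation nor broad cross-model replication. The story is therefore closer to a high-risk, high-interest open source experiment that caught HN's attention than to a verified breakthrough. Interested readers should check the HN discussion and the repo directly and weigh the claims alongside how reproducible they are.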

Related Articles

Mamba-3 in the LocalLLaMA spotlight: a state space model designed around inference efficiency

llm-circuit-finder in the HN spotlight: is layer duplication a shortcut to better LLMs, or capability steering?

HN spotlight: Sarvam releases 30B and 105B models with a full-stack strategy built on IndiaAI

LLM Hacker News Mar 7, 2026 1 min read