r/MachineLearning Elevates a 2x 4090 LLM Layer-Duplication Experiment

Original: "How I topped the Open LLM Leaderboard using 2x 4090 GPUs" (research notes in blog form)

LLM · Mar 11, 2026 · By Insights AI (Reddit) · 1 min read

Why Reddit pushed this upward

The r/MachineLearning post sends readers to David Noel Ng's detailed blog entry on what he calls LLM Neuroanatomy. The headline claim is unusual enough to stand out immediately: he says he reached the top of the Open LLM Leaderboard by duplicating a specific seven-layer middle block inside Qwen2-72B, without changing a single weight and without running gradient descent. That makes the story less about ordinary fine-tuning and more about structural intervention inside an already-trained model.

The most interesting part is the claimed granularity of the effect. According to the post, duplicating one layer did nothing, too few layers did nothing, and too many layers made performance worse. Only a circuit-sized block of roughly seven layers seemed to help. Ng interprets that as evidence that pretraining may carve out discrete functional circuits within the transformer stack. That is not a settled result, and the post does not present a peer-reviewed paper. But it is exactly the sort of strong, testable hypothesis that gets researchers and practitioners arguing in a useful way.
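The intervention described above is simple to sketch. Treating the transformer stack as an ordered list of decoder layers, duplicating a contiguous middle block is pure list surgery with no weight updates. The indices and helper name below are hypothetical, since the post as summarized here does not publish the exact block boundaries:

```python
def duplicate_block(layers, start, end):
    """Return a new layer ordering with layers[start:end] repeated once,
    immediately after the original block. No weights are modified."""
    return layers[:end] + layers[start:end] + layers[end:]

# Toy stand-in: integers in place of transformer decoder layers.
stack = list(range(80))                   # Qwen2-72B has 80 decoder layers
edited = duplicate_block(stack, 40, 47)   # hypothetical 7-layer middle block
assert len(edited) == 87                  # depth grows by exactly the block size
```

The same index arithmetic explains the claimed granularity test: varying `end - start` from 1 up to the full stack is a one-line sweep, which is what makes the seven-layer result easy for others to probe.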

Why practitioners are interested

Reddit also responded to the compute story. The work is framed as something that started on two RTX 4090 GPUs rather than a hyperscale cluster. That matters because it suggests architecture-level experimentation is not reserved for large labs. If the effect replicates across newer model families, it could influence how people think about depth scaling, model editing, and benchmark-oriented open-model research.

  • The intervention is layer-block duplication, not weight merging or finetuning.
  • The proposed lesson is that useful capability may live in reusable middle-layer circuits.
  • The biggest open issue is replication across models, tasks, and evaluation setups.
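On the first bullet, it is worth noting that duplication need not copy any parameters at all: the repeated block can simply alias the original layer objects, so the edited model applies the same weights twice per forward pass. A minimal sketch of that shared-weight property (the implementation details are an assumption, not taken from the post):

```python
class Layer:
    """Stand-in for a transformer decoder layer that owns its weights."""
    def __init__(self, idx):
        self.idx = idx

layers = [Layer(i) for i in range(10)]
edited = layers[:6] + layers[3:6] + layers[6:]   # duplicate the block [3, 6)

# The duplicated entries are the *same* objects: weights are reused, not cloned,
# which is consistent with "without changing a single weight".
assert edited[6] is edited[3]
assert len(edited) == 13
```

This also distinguishes the technique from weight merging: nothing is averaged or interpolated, and the only change is how many times each block is applied.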

That is why the thread landed well on r/MachineLearning. It combines an audacious empirical claim with a mechanism people can actually probe, challenge, and reproduce.

© 2026 Insights. All rights reserved.