LLM Reddit Mar 13, 2026 2 min read
A post in r/MachineLearning argues that duplicating a specific seven-layer block inside Qwen2-72B improved benchmark performance without changing any weights.
A fast-rising LocalLLaMA post resurfaced David Noel Ng's write-up on duplicating a seven-layer block inside Qwen2-72B, a no-training architecture tweak that reportedly lifted multiple Open LLM Leaderboard benchmarks.
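As a rough illustration of what "duplicating a block without changing any weights" could mean mechanically, the sketch below builds an execution order in which a contiguous run of layers is visited twice, reusing the same layer objects. This is not the write-up's code: the function names, the toy layers, and the block position are all hypothetical, and the summaries above do not say which seven layers were duplicated.

```python
def duplicated_layer_order(num_layers, block_start, block_len=7):
    """Return layer indices in execution order, with one block run twice.

    block_start is a hypothetical parameter; block_len defaults to 7 only
    because the summaries describe a seven-layer block.
    """
    block_end = block_start + block_len
    if block_start < 0 or block_len <= 0 or block_end > num_layers:
        raise ValueError("block must lie inside the layer stack")
    before = list(range(block_end))               # layers up to and including the block
    repeat = list(range(block_start, block_end))  # the same block, visited again
    after = list(range(block_end, num_layers))    # the remaining layers
    return before + repeat + after


def run_stack(layers, order, x):
    """Apply layers in the given order; repeated indices reuse the same weights."""
    for i in order:
        x = layers[i](x)
    return x
```

Because the repeated indices point at the same layer objects, the stack gets deeper at inference time while the parameter set stays exactly as it was, which is the sense in which no training or weight change is involved.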
A popular r/MachineLearning discussion examines an unofficial theorem-style claim that Attention’s core optimization geometry is d^2, not n^2. Community response is mixed: strong curiosity, but equally strong calls for peer review and reproducible evidence.