#attention
LLM Reddit Apr 1, 2026 1 min read
A project post on r/MachineLearning stood out because it did not just propose an alternative attention score; it documented the engineering breakage that follows when dot products disappear.
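The blurb doesn't name the post's replacement score, so as a sketch only: one common dot-product-free choice is a negative squared L2 distance, which already shows where the breakage starts, since fused kernels such as FlashAttention are written around QK^T tiles.

```python
import torch

def l2_attention(q, k, v):
    """Distance-scored attention: a sketch, not the post's actual proposal.

    q, k, v: (batch, seq, dim). Replacing the dot product q_i . k_j with
    -||q_i - k_j||^2 keeps the softmax intact but no longer matches
    kernels that fuse the bare QK^T matmul.
    """
    # ||q_i - k_j||^2 expands to ||q_i||^2 + ||k_j||^2 - 2 q_i . k_j,
    # so the score is still matmul-computable, just not as a plain QK^T.
    scores = -(torch.cdist(q, k, p=2) ** 2) / q.shape[-1] ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```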
LLM Hacker News Mar 21, 2026 2 min read
The March 20, 2026 HN discussion around Attention Residuals focused on a simple claim with large implications: replace fixed residual addition with learned depth-wise attention and recover performance with modest overhead.
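The blurb compresses the mechanism, so here is a minimal sketch of one way to read "learned depth-wise attention over residuals": replace x_{l+1} = x_l + F_l(x_l) with a learned softmax mixture over all earlier layer outputs. The per-layer logits `w` are this sketch's assumption, not necessarily the paper's parameterization.

```python
import torch

def depthwise_residual(h_prev, w, f_out):
    """Sketch: a block output whose residual attends over depth.

    h_prev: list of (batch, seq, dim) states from layers 0..l
    w:      (l+1,) learnable logits over those layers (assumed form)
    f_out:  (batch, seq, dim) output of this block's sublayer F_l
    """
    stacked = torch.stack(h_prev)            # (l+1, batch, seq, dim)
    alpha = torch.softmax(w, dim=0)          # mixture weights over depth
    mixed = (alpha[:, None, None, None] * stacked).sum(dim=0)
    return mixed + f_out                     # replaces x_l + F_l(x_l)
```

The "modest overhead" framing fits this reading: the extra parameters are O(L) logits per block, though keeping all L states live is exactly the memory concern the next item raises.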
LLM Reddit Mar 18, 2026 2 min read
A Reddit thread surfaced Kimi's AttnRes paper, which argues that fixed residual accumulation in PreNorm LLMs dilutes deeper layers. Its proposed fix is an attention-based residual path; a block variant aims to keep the resulting gains without exploding memory cost.
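The memory concern follows directly if every earlier layer's output must stay resident for the residual attention; under that assumed reading, the rough activation footprints compare as

```latex
\underbrace{O(nd)}_{\text{fixed residual}}
\quad\text{vs.}\quad
\underbrace{O(Lnd)}_{\text{attend over all } L \text{ layers}}
\quad\text{vs.}\quad
\underbrace{O(Bnd)}_{\text{block variant, window } B \le L}
```

for sequence length n, width d, and depth L; the block variant caps the live window at B layers.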
LLM Reddit Mar 6, 2026 1 min read
A popular r/MachineLearning discussion examines an unofficial theorem-style claim that attention's core optimization geometry is d^2, not n^2. Community response is mixed: strong curiosity, but equally strong calls for peer review and reproducible evidence.
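The thread's theorem isn't reproduced in the post, so the following is only one plausible reading of the d^2-versus-n^2 split: the learnable part of the score map is a d-by-d bilinear form, while the n-by-n score matrix is its data-dependent image.

```latex
S \;=\; QK^{\top} \;=\; (XW_Q)(XW_K)^{\top} \;=\; X\,M\,X^{\top},
\qquad M := W_Q W_K^{\top} \in \mathbb{R}^{d\times d}.
```

On this reading, gradients touch the d^2 entries of M, while the O(n^2 d) cost of materializing S is a compute fact rather than an optimization one, which may be what the claim formalizes.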