#attention
LLM Reddit Apr 1, 2026 1 min read
A project post on r/MachineLearning stood out because it did not just propose an alternative attention score; it documented the engineering breakage that follows when dot products disappear.
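The blurb doesn't name the post's replacement score, so as a sketch only: one common dot-product-free choice is a negative squared L2 distance, which already shows where the breakage starts, since fused kernels such as FlashAttention are written around QK^T tiles.

```python
import torch

def l2_attention(q, k, v):
    """Distance-scored attention: a sketch, not the post's actual proposal.

    q, k, v: (batch, seq, dim). Replacing the dot product q_i . k_j with
    -||q_i - k_j||^2 keeps the softmax intact but no longer matches
    kernels that fuse the bare QK^T matmul.
    """
    # ||q_i - k_j||^2 expands to ||q_i||^2 + ||k_j||^2 - 2 q_i . k_j,
    # so the score is still matmul-computable, just not as a plain QK^T.
    scores = -(torch.cdist(q, k, p=2) ** 2) / q.shape[-1] ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```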
LLM Hacker News Mar 21, 2026 2 min read
The March 20, 2026 HN discussion around Attention Residuals focused on a simple claim with large implications: replace fixed residual addition with learned depth-wise attention and recover performance with modest overhead.
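The blurb compresses the mechanism, so here is a minimal sketch of one way to read "learned depth-wise attention over residuals": replace x_{l+1} = x_l + F_l(x_l) with a learned softmax mixture over all earlier layer outputs. The per-layer logits `w` are this sketch's assumption, not necessarily the paper's parameterization.

```python
import torch

def depthwise_residual(h_prev, w, f_out):
    """Sketch: a block output whose residual attends over depth.

    h_prev: list of (batch, seq, dim) states from layers 0..l
    w:      (l+1,) learnable logits over those layers (assumed form)
    f_out:  (batch, seq, dim) output of this block's sublayer F_l
    """
    stacked = torch.stack(h_prev)            # (l+1, batch, seq, dim)
    alpha = torch.softmax(w, dim=0)          # mixture weights over depth
    mixed = (alpha[:, None, None, None] * stacked).sum(dim=0)
    return mixed + f_out                     # replaces x_l + F_l(x_l)
```

The "modest overhead" framing fits this reading: the extra parameters are O(L) logits per block, though keeping all L states live is exactly the memory concern the next item raises.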
LLM Reddit Mar 18, 2026 2 min read
A Reddit thread surfaced Kimi's AttnRes paper, which argues that fixed residual accumulation in PreNorm LLMs dilutes deeper layers. Its proposed fix is an attention-based residual path; a block variant aims to keep the resulting gains without exploding memory cost.
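The memory concern follows directly if every earlier layer's output must stay resident for the residual attention; under that assumed reading, the rough activation footprints compare as

```latex
\underbrace{O(nd)}_{\text{fixed residual}}
\quad\text{vs.}\quad
\underbrace{O(Lnd)}_{\text{attend over all } L \text{ layers}}
\quad\text{vs.}\quad
\underbrace{O(Bnd)}_{\text{block variant, window } B \le L}
```

for sequence length n, width d, and depth L; the block variant caps the live window at B layers.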
LLM Reddit Mar 6, 2026 1 min read
A popular r/MachineLearning discussion examines an unofficial theorem-style claim that attention's core optimization geometry is d^2, not n^2. Community response is mixed: strong curiosity, but equally strong calls for peer review and reproducible evidence.
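The thread's theorem isn't reproduced in the post, so the following is only one plausible reading of the d^2-versus-n^2 split: the learnable part of the score map is a d-by-d bilinear form, while the n-by-n score matrix is its data-dependent image.

```latex
S \;=\; QK^{\top} \;=\; (XW_Q)(XW_K)^{\top} \;=\; X\,M\,X^{\top},
\qquad M := W_Q W_K^{\top} \in \mathbb{R}^{d\times d}.
```

On this reading, gradients touch the d^2 entries of M, while the O(n^2 d) cost of materializing S is a compute fact rather than an optimization one, which may be what the claim formalizes.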