Reddit Debate: Is Attention fundamentally a d^2 problem rather than n^2?
Original post: [D] A mathematical proof from an anonymous Korean forum: The essence of Attention is fundamentally a d^2 problem, not n^2. (PDF included)
What was posted
A high-engagement post on r/MachineLearning shared an anonymously authored PDF from a Korean community and framed it as a mathematical argument about Attention complexity. The post argues that when forward and backward dynamics are considered together, the optimization landscape explored by the parameters is fundamentally d^2-dimensional rather than n^2-dimensional. It also suggests this perspective could motivate alternatives to standard softmax attention.
The source thread is the r/MachineLearning discussion linked above. At crawl time the post had high visibility and a substantial comment count, making it relevant as a community signal even though the claim itself remains unverified.
Main claims discussed in the thread
- Attention optimization geometry should be interpreted through a d^2 lens.
- Softmax may preserve matching behavior but contributes to an expensive scaling pattern.
- A polynomial-style alternative might keep useful structure while changing complexity tradeoffs.
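The PDF's exact construction is not reproduced in the thread, but the general idea behind the claims above can be sketched with a standard contrast: softmax attention materializes an n x n score matrix, while a kernelized (polynomial-style) attention reassociates the matrix product so the central object is a d x d summary. This is an illustrative sketch, not the post's derivation; the feature map below is an assumption chosen only to keep values positive.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax attention: builds an n x n score matrix,
    so time scales like O(n^2 * d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                  # (n, d)

def kernel_attention(Q, K, V, feature=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized attention: computes (phi(Q) @ (phi(K)^T V)) so the
    n x n matrix is never formed. The d x d summary phi(K)^T V is the
    'd^2 object'; time scales like O(n * d^2)."""
    Qf, Kf = feature(Q), feature(K)                    # (n, d) each
    KV = Kf.T @ V                                      # (d, d) summary
    Z = Qf @ Kf.sum(axis=0)                            # (n,) normalizers
    return (Qf @ KV) / Z[:, None]                      # (n, d)
```

The two functions are not numerically equivalent, which is exactly the objection raised by commenters: matching optimization dimensionality does not make the kernels functionally interchangeable.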
Commenters quickly moved from hype to critique. Several top responses said the derivation may be interesting as theory framing, but that equal optimization dimensionality does not prove functional equivalence between kernels. Others noted that complexity comparisons such as O(nd^3) versus O(n^2d) depend heavily on practical ranges of d, sequence lengths, and hardware behavior.
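The dependence on practical ranges of d and n can be made concrete with back-of-envelope FLOP counts. Treating the two scalings quoted in the thread as bare leading terms (constants and hardware effects deliberately ignored), the O(nd^3) method only beats O(n^2 d) once n exceeds d^2:

```python
def flops_standard(n, d):
    """Leading term for standard attention: forming and applying
    the n x n score matrix costs about n^2 * d operations."""
    return n * n * d

def flops_cubic_d(n, d):
    """Leading term for the O(n * d^3) figure quoted by commenters."""
    return n * d ** 3

# Setting n^2 * d = n * d^3 gives the crossover n = d^2.
for d in (64, 128, 256):
    print(f"d={d}: the n*d^3 method wins only for n > d^2 = {d * d}")
```

For d = 128, the crossover sits at sequence length 16384, which is why commenters insist asymptotic notation alone cannot settle which method is cheaper in deployment.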
Why this still matters
Even if the theorem-level claim does not hold under review, the thread highlights a valuable pattern: ML communities are actively stress-testing how we describe Attention bottlenecks. That matters for model design, inference engineering, and benchmark interpretation. In practice, the right takeaway is not “replace Transformers now,” but “separate geometric insight from deployment evidence.”
For practitioners, a sensible evaluation checklist is: verify reproducible code, compare against established linear/hybrid attention baselines, track wall-clock and memory behavior in addition to asymptotic notation, and require independent peer validation before architectural conclusions.
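The "track wall-clock and memory behavior" item in the checklist can start from something as small as the harness below. This is a minimal CPU-side sketch: real attention benchmarks need device-aware timing, warmup, and synchronization (e.g. for GPU kernels), which this deliberately omits.

```python
import time
import tracemalloc

def profile(fn, *args, repeats=5):
    """Return (mean wall-clock seconds, peak Python-heap bytes) for
    fn(*args). A minimal sketch of the checklist's measurement step;
    GPU workloads require CUDA-aware timers instead."""
    fn(*args)                                  # warmup call
    tracemalloc.start()
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    elapsed = (time.perf_counter() - t0) / repeats
    _, peak = tracemalloc.get_traced_memory()  # peak since start()
    tracemalloc.stop()
    return elapsed, peak
```

Usage would be `profile(softmax_attention_impl, Q, K, V)` versus the candidate replacement on identical inputs, at several (n, d) points spanning the deployment range.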
Source: Reddit post