Reddit ML Spotlight: AdderBoard Pushes Tiny Transformer Addition Challenge Below 100 Parameters
Original: [R] Tiny transformers (<100 params) can add two 10-digit numbers to 100% accuracy
Community signal from r/MachineLearning
On February 28, 2026, an r/MachineLearning post titled “[R] Tiny transformers (<100 params) can add two 10-digit numbers to 100% accuracy” reached 74 upvotes and 33 comments at crawl time. The thread points to AdderBoard, an open challenge focused on one precise target: find the smallest autoregressive transformer that can reliably add two 10-digit integers.
What the benchmark asks for
According to the repository, participants must achieve at least 99% accuracy on a held-out 10,000-sample test set. The problem is intentionally narrow, but it stresses core transformer behavior: digit alignment, per-digit arithmetic, and carry propagation across positions. This makes it a compact testbed for architecture efficiency and inductive bias, not just a leaderboard game.
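The evaluation protocol can be sketched as a small harness. This is an illustration, not the repository's actual test code: it assumes a submission is exposed as a callable from two integers to an answer string, and that test operands are uniformly sampled 10-digit integers, both of which are assumptions about the format.

```python
import random

def make_test_set(n=10_000, digits=10, seed=0):
    """Sample n held-out addition problems over two `digits`-digit integers."""
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]

def accuracy(predict, test_set):
    """Fraction of problems where the full output string is exactly right."""
    correct = sum(predict(a, b) == str(a + b) for a, b in test_set)
    return correct / len(test_set)

# Usage with a stand-in oracle; a real submission would wrap its transformer here.
oracle = lambda a, b: str(a + b)
print(accuracy(oracle, make_test_set()))  # a passing entry needs >= 0.99
```

Exact-string matching is the strict criterion here: a single wrong digit, often at the end of a long carry chain, fails the whole sample.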
Why the current results are notable
The project tracks two categories: trained weights and hand-coded weights. In trained settings, the top reported entry is 311 parameters at 99.999% accuracy, while several sub-1,000-parameter models exceed 99%. In hand-coded constructive setups, the board lists solutions as low as 36 parameters with 100% reported accuracy. The README also documents earlier baseline outputs from coding-agent workflows, where one run produced a 6,080-parameter model and another a 1,644-parameter model, before community optimization pushed counts much lower.
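Part of why such small parameter counts are plausible is that the underlying algorithm is tiny: per position, addition reduces to a digit sum, a mod 10, and a one-bit carry. A minimal plain-Python sketch of that ripple-carry procedure (not the attention-based constructions from the board) makes the point:

```python
def ripple_add(a: str, b: str) -> str:
    """Add two equal-length digit strings the way a tiny model must:
    walk right to left, emit (sum mod 10), and propagate the carry bit."""
    carry, out = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        s = int(da) + int(db) + carry
        out.append(str(s % 10))   # the emitted digit
        carry = s // 10           # 0 or 1, the only cross-position state
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))

print(ripple_add("9999999999", "9999999999"))  # → 19999999998, a 10-step carry chain
```

The one-bit carry is the only state that crosses positions, which is exactly what makes long all-9s inputs the stress test for a transformer's attention pattern.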
Methodology discipline is part of the story
AdderBoard explicitly requires genuine autoregressive transformer behavior with self-attention in the model and generic decoding outside the model. The rules reject solution patterns where task-specific arithmetic logic is hidden in inference code. Verification guidance includes edge cases, random evaluation with fixed seed, and transparent parameter counting conventions. That framing is useful for teams that care about reproducibility, not only impressive numbers.
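"Generic decoding outside the model" means the inference loop knows nothing about arithmetic: it just feeds tokens back in and takes an argmax. A minimal sketch of such a loop, where `next_token_logits` is a hypothetical stand-in for the model's forward pass:

```python
def greedy_decode(next_token_logits, prompt, eos="$", max_len=12):
    """Generic autoregressive loop: no carries, no digit logic, no task knowledge.
    `next_token_logits(seq) -> dict[token, score]` stands in for the model."""
    seq = list(prompt)
    out = []
    for _ in range(max_len):
        scores = next_token_logits(seq)
        tok = max(scores, key=scores.get)  # plain argmax, nothing task-specific
        if tok == eos:
            break
        out.append(tok)
        seq.append(tok)
    return "".join(out)

# Usage with a stub model that always scores the next char of a fixed answer highest.
prompt, answer = "12+34=", "46"
def stub(seq):
    k = len(seq) - len(prompt)  # how many answer tokens were already emitted
    nxt = answer[k] if k < len(answer) else "$"
    return {t: (1.0 if t == nxt else 0.0) for t in "0123456789$"}

print(greedy_decode(stub, prompt))  # → 46
```

Under rules like these, any task-specific logic (carry handling, digit alignment) must live in the model's weights and attention, since the decode loop above would work unchanged for any sequence task.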
Why engineers should care
The broader implication is that carefully constrained tasks can expose where transformer capacity is truly needed and where it is over-provisioned. Even if this challenge does not map directly to production LLM workloads, it offers practical insight into compression, architecture search, and evaluation rigor. The Reddit thread surfaced it as a high-signal research toy problem with clear rules and reusable test infrastructure.
Related Articles
An r/MachineLearning post surfaced AdderBoard, where community submissions report 100% 10-digit addition with extremely small transformer designs, including hand-coded models under 100 parameters.
A fast-rising LocalLLaMA post resurfaced David Noel Ng's write-up on duplicating a seven-layer block inside Qwen2-72B, a no-training architecture tweak that reportedly lifted multiple Open LLM Leaderboard benchmarks.
Google AI Developers has released Android Bench, an official leaderboard for LLMs on Android development tasks. In the first results, Gemini 3.1 Pro ranks first, and Google is also publishing the benchmark, dataset, and test harness.