Hacker News surfaced ATTN/11, a project that trains a single-layer, single-head Transformer in PDP-11 assembly on a PDP-11/34A. The README says careful fixed-point math, per-layer learning rates, and a 32KB memory budget cut training from multi-hour estimates to a 5.5-minute run that reaches 10/10 accuracy on digit reversal.
#transformer
Researchers have demonstrated that transformer models with fewer than 100 parameters can add two 10-digit numbers with 100% accuracy using digit tokenization, challenging assumptions about the minimum complexity needed for arithmetic reasoning.
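For readers unfamiliar with the term, digit tokenization can be sketched roughly as follows. This is an illustrative sketch only, not the researchers' code: the vocabulary, the zero-padding to a fixed width, and the helper names `encode`, `decode`, and `make_example` are all assumptions made for the example.

```python
# Illustrative sketch of digit-level tokenization for addition.
# The vocabulary and helper names are hypothetical, not from the cited work.
VOCAB = list("0123456789+=")
TOKEN_ID = {ch: i for i, ch in enumerate(VOCAB)}


def encode(text):
    """Map each character (one digit or symbol) to its own token id."""
    return [TOKEN_ID[ch] for ch in text]


def decode(ids):
    """Inverse of encode: turn token ids back into a string."""
    return "".join(VOCAB[i] for i in ids)


def make_example(a, b, width=10):
    """Build one (prompt, answer) pair with zero-padded operands.

    The answer gets width + 1 digits so a final carry always fits.
    """
    prompt = f"{a:0{width}d}+{b:0{width}d}="
    answer = f"{a + b:0{width + 1}d}"
    return encode(prompt), encode(answer)


prompt_ids, answer_ids = make_example(1234567890, 9876543210)
print(decode(prompt_ids), decode(answer_ids))
```

With one token per digit, the model only ever sees a 12-symbol vocabulary, which is part of why such a task can fit in a very small parameter budget.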
growingSWE has created an interactive walkthrough of Andrej Karpathy's 200-line pure Python GPT implementation, letting you tokenize names, watch softmax convert scores to probabilities, step through backpropagation, and explore attention heatmaps.
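The softmax step mentioned above can be sketched in a few lines of pure Python. This is a generic, minimal version written for illustration, not code taken from Karpathy's script or the walkthrough; the function name `softmax` and the max-subtraction trick for numerical stability are choices made here.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1.

    Subtracting the largest score first avoids overflow in math.exp
    and does not change the result.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

Higher scores get larger probabilities, and the ordering of the inputs is preserved in the outputs.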
A Hacker News thread with a score of 732 and 120 comments highlighted microgpt, Andrej Karpathy’s single-file educational implementation of a GPT-style model. The project packages dataset handling, tokenization, autograd, Transformer layers, Adam optimization, and sampling into one compact Python script.
Google DeepMind introduced D4RT, a single-model framework for dynamic 4D scene reconstruction and tracking. The company reports up to 300x efficiency gains versus prior methods, highlighting real-time potential for robotics and AR workloads.