r/MachineLearning Didn't Buy the Hype Around Rose, but It Did Find the Idea Interesting
Original: [New Optimizer] 🌹 Rose: low VRAM, easy to use, great results, Apache 2.0 [P]
r/MachineLearning did not hand out easy applause, which is exactly why the Rose thread mattered. The post introduced Rose, short for Range-Of-Slice Equilibration, as a new PyTorch optimizer built around range-normalized gradient updates. The pitch was simple and attractive: zero optimizer state, lower VRAM use than Adam-style methods, Apache 2.0 licensing, and enough practical performance to be worth trying outside toy demos.
The project itself gives the idea more substance than the Reddit title suggests. Rose normalizes gradient tensors by per-slice range instead of keeping running first- and second-moment buffers, and adds optional gradient centralization plus a coefficient-of-variation trust gate. The README argues that this makes the method easier to reason about and cheaper to store, because it avoids momentum, variance estimates, and even step counters.
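The description above can be turned into a rough sketch in plain Python. This is a hypothetical reconstruction from the prose, not the project's actual code: the function name `rose_step`, the `cv_gate` default, and the choice to treat each parameter list as one "slice" are all assumptions made here for illustration.

```python
def rose_step(params, grads, lr=0.01, eps=1e-8, cv_gate=10.0):
    """One stateless Rose-style update (illustrative sketch, not the real API).

    Each (param, grad) pair is treated as one slice. The gradient is
    mean-centered (gradient centralization), normalized by its range,
    and the update is skipped when the coefficient of variation of the
    gradient magnitudes exceeds a trust threshold. No momentum, variance
    buffers, or step counters are kept between calls.
    """
    new_params = []
    for p, g in zip(params, grads):
        n = len(g)
        mean = sum(g) / n
        centered = [x - mean for x in g]        # gradient centralization
        rng = max(centered) - min(centered)      # per-slice range

        # Coefficient-of-variation trust gate on gradient magnitudes.
        abs_mean = sum(abs(x) for x in g) / n
        var = sum((abs(x) - abs_mean) ** 2 for x in g) / n
        cv = (var ** 0.5) / (abs_mean + eps)
        if cv > cv_gate:
            new_params.append(list(p))           # gate tripped: skip this slice
            continue

        scaled = [x / (rng + eps) for x in centered]
        new_params.append([pi - lr * si for pi, si in zip(p, scaled)])
    return new_params


# Example: one slice of three parameters.
updated = rose_step([[1.0, 2.0, 3.0]], [[0.1, 0.5, 0.9]], lr=0.1)
```

The appeal the README claims falls out of the structure: the only per-step quantities are the slice's range, mean, and CV, all recomputed from the current gradient, so there is nothing to store between steps.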
But the subreddit did what it always does when someone arrives with bold optimizer claims: it asked for evidence before vibes. One of the top comments called out the absence of the update rule in the post. Others pushed on the benchmark choice, noting that an MNIST comparison against AdamW on a single seed says very little about whether a new optimizer is broadly useful. Questions about harder tasks, multi-seed significance, comparisons with Muon, and the need for a cleaner paper-style evaluation came quickly.
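The multi-seed ask is concrete and cheap to satisfy. A minimal version of what commenters wanted, sketched with hypothetical accuracy numbers (the variable names and values here are illustrative, not from the thread or the repo):

```python
from statistics import mean, stdev

def summarize(name, accs):
    """Report mean +/- sample std of final accuracies across seeded runs."""
    return f"{name}: {mean(accs):.3f} +/- {stdev(accs):.3f} over {len(accs)} seeds"

# Hypothetical final accuracies from five seeds each; real numbers
# would come from rerunning the benchmark with different seeds.
rose_accs  = [0.981, 0.979, 0.983, 0.980, 0.982]
adamw_accs = [0.982, 0.981, 0.980, 0.983, 0.981]

print(summarize("Rose", rose_accs))
print(summarize("AdamW", adamw_accs))
```

When the spread across seeds is comparable to the gap between optimizers, as in these placeholder numbers, a single-seed win is indistinguishable from noise, which is exactly the objection raised in the thread.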
That skepticism did not kill the discussion; it gave the thread shape. Rose looks interesting precisely because it is not just another renamed Adam variant, and a stateless adaptive optimizer is a real enough idea to make researchers pause. But r/MachineLearning is unwilling to let low VRAM and a wall of logs stand in for methodology. The community response was basically an invitation with conditions: bring clearer theory, stronger experiments, and tasks that stretch beyond MNIST, then people will take the claim seriously. The original thread is on r/MachineLearning, and the project README is on GitHub.
Related Articles
HN did not read Google’s TorchTPU post as another cloud pitch. The real question in the thread was whether a PyTorch user can really switch to `tpu` without falling back into the old PyTorch/XLA pain cave.
Why it matters: model launches live or die on serving and training support, not just weights. LMSYS says its Day-0 stack reached 199 tok/s on B200 and 266 tok/s on H200, while staying strong out to 900K context.
A March 15, 2026 r/MachineLearning post introduced preflight, a lightweight PyTorch validator that reached 56 points and 13 comments by promising a fast pre-training gate for label leakage, NaNs, channel order, dead gradients, class imbalance, and VRAM risk.