#reinforcement-learning

LLM X/Twitter Jul 25, 2026 1 min read

Nemotron 3 Nano RL Run Raises Math Accuracy From 22% to 91%

NVIDIA says a hosted RL loop lifted Nemotron 3 Nano from 22% to 91% accuracy on a math task for under $5, ending with a downloadable LoRA adapter.

#nvidia #nemotron #reinforcement-learning

AI Jul 24, 2026 1 min read

Google uses RL to make Willow quantum error correction 3.5x steadier

Google Quantum AI used reinforcement learning to steer thousands of control parameters while a quantum system kept running. On Willow, the method improved logical stability 3.5x under injected drift and reduced logical error rate another 20% after expert calibration.

#google-quantum-ai #reinforcement-learning #quantum

LLM X/Twitter Jun 20, 2026 1 min read

OpenAI tests alignment training that survives adversarial pressure

OpenAI’s new alignment work targets durability, not just benchmark behavior. The study trains beneficial traits across 12 domains and tests whether they persist under adversarial prompts and harmful fine-tuning.

#openai #alignment #reinforcement-learning

AI Apr 28, 2026 2 min read

David Silver’s new lab lands a $1.1B seed to chase ‘superlearners’

Investors just placed another billion-dollar bet on an AI path that tries to move beyond human-written data. David Silver’s new lab, Ineffable Intelligence, raised $1.1 billion to pursue reinforcement-learning systems it calls “superlearners.”

#ineffable-intelligence #david-silver #funding

AI X/Twitter Apr 27, 2026 2 min read

David Silver’s Ineffable opens with $1.1B to build “superlearners”

The round matters because one of reinforcement learning’s best-known researchers has launched a lab with one of Europe’s biggest seed rounds rather than another incremental model demo. Reuters says Ineffable opened with $1.1 billion at a $5.1 billion valuation, while the company frames the mission as building “superlearners” from experience rather than human data.

#ineffable-intelligence #david-silver #reinforcement-learning

LLM X/Twitter Apr 22, 2026 1 min read

NVIDIA NeMo RL uses FP8 to speed Qwen3-8B training by 1.48x

Why it matters: post-training agents increasingly depend on reinforcement learning throughput, not only inference speed. NVIDIA says NeMo RL’s FP8 path speeds RL workloads by 1.48x on Qwen3-8B-Base while tracking BF16 accuracy.

#nvidia #nemo-rl #fp8

AI Apr 18, 2026 2 min read

RAD-2 cuts autonomous-driving collisions by 56% in closed-loop tests

RAD-2 reframes diffusion-based driving planners as a generator-discriminator system, then adds reinforcement learning feedback where imitation-only training is weakest. The headline number is a 56% collision-rate drop versus strong diffusion planners, plus reported real-world deployment in complex urban traffic.

#autonomous-driving #reinforcement-learning #diffusion

LLM X/Twitter Apr 5, 2026 2 min read

Cursor details Composer 2’s training stack, from continued pretraining to real-world RL

Cursor said on March 26, 2026 that real-time reinforcement learning lets it ship improved Composer 2 checkpoints every five hours. Cursor’s March 27 technical report says the model combines continued pretraining on Kimi K2.5 with large-scale RL in realistic Cursor sessions, scores 61.3 on CursorBench, and runs on an asynchronous multi-region RL stack with large sandbox fleets.

#cursor #composer-2 #reinforcement-learning

LLM X/Twitter Apr 1, 2026 2 min read

Together Research releases Aurora for RL-based adaptive speculative decoding

Together Research said on March 31, 2026 that Aurora is an open-source framework for adaptive speculative decoding that learns from live inference traces and updates the speculator asynchronously without interrupting serving. Together’s blog and paper say Aurora reframes the problem as asynchronous RL and can deliver 1.25x additional speedup over a strong static speculator as traffic shifts.

#together-ai #aurora #speculative-decoding

Sciences Hacker News Mar 31, 2026 2 min read

Hacker News Highlights a Continuous-Time Route from RL to Diffusion Models

A March 28 essay on the Hamilton-Jacobi-Bellman equation drew Hacker News attention by showing how continuous-time control theory connects reinforcement learning, optimal control, and diffusion models.

#reinforcement-learning #diffusion-models #optimal-control

AI Reddit Mar 30, 2026 2 min read

r/singularity Highlights Cursor’s Five-Hour Real-Time RL Loop for Composer

A March 29 r/singularity thread amplified Cursor’s claim that Composer checkpoints can now be trained from live user interactions and shipped every five hours, with reward-hacking fixes treated as part of the story rather than an afterthought.

#cursor #reinforcement-learning #coding-agents

Humanoid Robots Reddit Mar 18, 2026 2 min read

r/singularity Pushes LATENT as Humanoid Tennis Learns From Five Hours of Imperfect Motion Data

A March 15, 2026 r/singularity post with 3,150 points and 376 comments drew attention to LATENT, a humanoid tennis system trained from five hours of imperfect human motion fragments instead of full match-grade capture.

#humanoid-robots #robotics #reinforcement-learning