David Silver, creator of AlphaGo and AlphaZero, has raised a record $1.1 billion seed round for Ineffable Intelligence — the largest ever in Europe. The startup aims to build superintelligence using reinforcement learning alone, with no human-generated data.
#reinforcement-learning
r/singularity upvoted the round less because of venture spectacle and more because David Silver’s name still means AlphaZero-era reinforcement learning. The discussion centered on whether a “superlearner” trained without human data could become a genuinely different path from today’s web-trained LLM stack.
Investors just placed another billion-dollar bet on an AI path that tries to move beyond human-written data. David Silver’s new lab, Ineffable Intelligence, raised $1.1 billion to pursue reinforcement-learning systems it calls “superlearners.”
This is material because one of reinforcement learning’s best-known researchers has launched with one of Europe’s biggest seed rounds instead of another incremental model demo. Reuters reports that Ineffable opened with $1.1 billion at a $5.1 billion valuation, while the company frames the mission as building “superlearners” from experience rather than human data.
Why it matters: post-training agents increasingly depend on reinforcement-learning throughput, not only inference speed. NVIDIA says NeMo RL’s FP8 path delivers a 1.48x speedup on RL workloads with Qwen3-8B-Base while tracking BF16 accuracy.
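NeMo RL’s actual FP8 kernels are not shown here, but the core idea behind the speedup claim, squeezing tensors into the FP8 (E4M3) range with per-tensor scaling and a 4-bit significand, can be sketched in plain NumPy. This is an illustrative round-trip only, not NVIDIA’s implementation:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def fp8_e4m3_roundtrip(x):
    """Simulate a per-tensor FP8 (E4M3) quantize/dequantize:
    scale into the FP8 range, round to a 4-bit significand
    (1 implicit + 3 explicit mantissa bits), then scale back."""
    scale = FP8_E4M3_MAX / max(float(np.abs(x).max()), 1e-12)
    y = np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    m, e = np.frexp(y)          # y = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16) / 16   # keep 4 significand bits
    return np.ldexp(m, e) / scale

x = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
xq = fp8_e4m3_roundtrip(x)
max_rel_err = float(np.abs(x - xq).max() / np.abs(x).max())
```

The point of the sketch is that the precision loss is bounded (a few percent relative error here), which is why an FP8 path can track BF16 accuracy while moving half the bytes.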
RAD-2 reframes diffusion-based driving planners as a generator-discriminator system, then adds reinforcement learning feedback where imitation-only training is weakest. The headline number is a 56% collision-rate drop versus strong diffusion planners, plus reported real-world deployment in complex urban traffic.
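RAD-2’s exact recipe is not spelled out in this blurb, but the pattern it gestures at, a generator that also receives RL reward from a discriminator where imitation alone is weak, is the classic adversarial-imitation setup. A toy one-dimensional sketch, in which every piece (the fixed analytic discriminator, the Gaussian policy, the REINFORCE update) is an assumption for illustration and not RAD-2’s method:

```python
import numpy as np

def adversarial_reward(d_prob, eps=1e-3):
    """GAIL-style reward: high when the discriminator believes the
    planner's action looks expert-like (d_prob near 1)."""
    return -np.log(1.0 - np.clip(d_prob, eps, 1.0 - eps))

def discriminator(a):
    """Toy fixed discriminator: expert actions center on 0, so
    actions near 0 score as more expert-like."""
    return 1.0 / (1.0 + a**2)

rng = np.random.default_rng(0)
mu = 2.0                      # planner's action mean, starts off-expert
for _ in range(500):
    a = mu + rng.standard_normal(256)          # sample actions
    r = adversarial_reward(discriminator(a))   # RL feedback signal
    mu += 0.1 * np.mean(r * (a - mu))          # REINFORCE update
```

The policy mean drifts toward the expert’s behavior because the discriminator-derived reward keeps supplying gradient signal even in states where no imitation target exists, which is the failure mode the RAD-2 blurb says RL feedback is patching.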
Cursor said on March 26, 2026 that real-time reinforcement learning lets it ship improved Composer 2 checkpoints every five hours. Cursor’s March 27 technical report says the model combines continued pretraining on Kimi K2.5 with large-scale RL in realistic Cursor sessions, scores 61.3 on CursorBench, and runs on an asynchronous multi-region RL stack with large sandbox fleets.
Together Research said on March 31, 2026 that Aurora is an open-source framework for adaptive speculative decoding that learns from live inference traces and updates the speculator asynchronously without interrupting serving. Together’s blog and paper say Aurora reframes the problem as asynchronous RL and can deliver 1.25x additional speedup over a strong static speculator as traffic shifts.
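Aurora’s contribution is adapting the speculator online with asynchronous RL; the draft-and-verify loop it accelerates is standard. A toy greedy-acceptance sketch with made-up deterministic token models (illustrative only, and omitting the real method’s batched verification and bonus token):

```python
def generate_greedy(model, prompt, n):
    """Plain autoregressive greedy decoding: n tokens, one at a time."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq

def speculative_generate(target, draft, prompt, n, k=4):
    """Draft-and-verify decoding: a cheap draft model proposes up to k
    tokens; the target accepts while it agrees and substitutes its own
    token at the first mismatch. Output matches pure greedy decoding."""
    seq = list(prompt)
    while len(seq) < len(prompt) + n:
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        for t in proposal:
            want = target(seq)   # target's greedy choice at this position
            seq.append(want)
            if want != t:        # reject the rest of the draft
                break
    return seq[:len(prompt) + n]
```

Because rejected drafts are replaced by the target’s own token, the loop is lossless; the speedup depends entirely on how often the draft agrees with the target, which is exactly the acceptance rate Aurora’s asynchronous updates chase as traffic shifts.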
A March 28 essay on the Hamilton-Jacobi-Bellman equation drew Hacker News attention by showing how continuous-time control theory connects reinforcement learning, optimal control, and diffusion models.
A March 2026 Hacker News thread with 120 points and 33 comments surfaced a deep technical explainer on the Hamilton-Jacobi-Bellman equation. The post argues that continuous-time reinforcement learning and diffusion models can be understood through the same control-theory structure rather than as separate ML tricks.
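For reference, one standard form of the equation at the center of that essay, the discounted infinite-horizon HJB for deterministic dynamics (notation assumed here, not taken from the post):

```latex
% Hamilton-Jacobi-Bellman equation, discounted infinite horizon,
% for dynamics \dot{x} = f(x, u), reward r(x, u), discount rate \rho:
\rho \, V(x) = \max_{u} \Big[ \, r(x, u) + \nabla V(x) \cdot f(x, u) \, \Big]
```

Discretizing time recovers the familiar Bellman equation of RL, and one standard bridge to diffusion models is the stochastic version: with noise $dx = f\,dt + \sigma\,dW$, a second-order term $\tfrac{\sigma^2}{2}\,\Delta V(x)$ appears on the right-hand side, which is the kind of shared control-theory structure the post describes.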
Cursor has published the Composer 2 technical report, outlining its code-focused continued pretraining, large-scale reinforcement learning pipeline, and CursorBench-led evaluation strategy. The report offers an unusually detailed first-party look at how a production coding agent is trained and measured.
A March 29 r/singularity thread amplified Cursor's claim that Composer checkpoints can now be trained from live user interactions and shipped every five hours, with reward-hacking fixes treated as part of the story rather than an afterthought.