Hacker News Highlights a Continuous-Time Route from RL to Diffusion Models
Original: Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models
A Hacker News discussion on March 30, 2026 drew attention to Daniel López Montero’s March 28 essay on the Hamilton-Jacobi-Bellman (HJB) equation, the partial differential equation that characterizes the value function in optimal control and, by extension, underlies a large part of reinforcement learning. The post argues that continuous-time control is not just historical background; it provides a useful lens for understanding how modern AI systems are trained and optimized. That framing stood out in a feed that often concentrates on products and launches rather than on mathematical structure.
The essay starts from Bellman’s discrete-time dynamic programming and then shows what changes when the time step shrinks toward zero. In that limit, the Bellman equation becomes the HJB partial differential equation. From there, the author moves into controlled diffusions, Itô processes, and the infinitesimal generator that governs state evolution under noise. For readers who mostly encounter reinforcement learning through Markov decision processes and policy gradients, the piece offers a more structural explanation of why these methods exist in the first place.
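The limiting argument can be sketched in a few lines. The notation below is an assumption chosen for illustration and is not taken from the essay: a finite-horizon, cost-minimizing problem with state X_t, control u, drift f, diffusion coefficient σ, running cost ℓ, and terminal cost g. The essay's own conventions (discounting, reward maximization) may differ.

```latex
% Assumed notation, for illustration only: finite horizon T, no discounting,
% cost minimization, Brownian motion W_t.
\begin{align*}
% Controlled Ito diffusion
dX_t &= f(X_t,u_t)\,dt + \sigma(X_t,u_t)\,dW_t \\
% Dynamic programming over a short step \Delta t
V(x,t) &= \min_{u}\Big\{ \ell(x,u)\,\Delta t
          + \mathbb{E}\big[V(X_{t+\Delta t},\,t+\Delta t)\mid X_t=x\big] \Big\} + o(\Delta t) \\
% Ito's formula applied to the expectation
\mathbb{E}\big[V(X_{t+\Delta t},\,t+\Delta t)\mid X_t=x\big]
       &= V(x,t) + \big(\partial_t V + \mathcal{L}^{u}V\big)(x,t)\,\Delta t + o(\Delta t) \\
% Infinitesimal generator of the controlled diffusion
\mathcal{L}^{u}V &= f(x,u)\cdot\nabla_x V
          + \tfrac12\operatorname{tr}\!\big(\sigma\sigma^{\top}(x,u)\,\nabla_x^{2}V\big) \\
% Subtract V, divide by \Delta t, let \Delta t \to 0: the HJB equation
0 &= \partial_t V(x,t) + \min_{u}\big\{ \ell(x,u) + \mathcal{L}^{u}V(x,t) \big\},
     \qquad V(x,T)=g(x)
\end{align*}
```

The second-order trace term is what noise contributes; with σ = 0 the equation reduces to the deterministic HJB equation.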
The most interesting bridge is the one to diffusion models. Rather than treating generative diffusion as a separate toolkit, the article frames it as another problem in stochastic optimal control. That perspective connects sampling, denoising, and control-theoretic objectives, and it helps explain why tools from PDEs, policy iteration, and Monte Carlo evaluation continue to reappear in generative modeling research. The post also includes concrete examples such as stochastic LQR and the Merton portfolio problem, which ground the theory in recognizable control settings.
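To make the stochastic LQR example concrete, here is a minimal Python sketch of the scalar case. It is an illustration under stated assumptions, not the essay's code: the dynamics dx = (a x + b u) dt + sigma dW, the cost E[∫ (q x² + r u²) dt + q_T x_T²], and the function names `solve_scalar_lqr` and `stationary_P` are all hypothetical choices. Substituting the ansatz V(x, t) = P(t) x² + c(t) into the HJB equation gives a Riccati ODE for P and a separate ODE for c, which the code integrates backward in time with explicit Euler steps.

```python
import math


def solve_scalar_lqr(a, b, q, r, sigma, q_T, T, n_steps=10_000):
    """Integrate the scalar Riccati ODE backward from t = T to t = 0.

    Assumed model (hypothetical, for illustration):
        dx = (a*x + b*u) dt + sigma dW
        cost = E[ integral of (q*x^2 + r*u^2) dt + q_T * x_T^2 ]
    With V(x, t) = P(t)*x^2 + c(t), the HJB equation reduces to
        -dP/dt = q + 2*a*P - (b^2 / r) * P^2,   P(T) = q_T
        -dc/dt = sigma^2 * P,                   c(T) = 0
    Returns (P(0), c(0), k(0)), where the optimal control is u = -k(t) * x.
    """
    dt = T / n_steps
    P, c = q_T, 0.0
    for _ in range(n_steps):
        # Explicit Euler step taken backward in time.
        dP = q + 2.0 * a * P - (b * b / r) * P * P
        dc = sigma * sigma * P
        P, c = P + dt * dP, c + dt * dc
    gain = (b / r) * P
    return P, c, gain


def stationary_P(a, b, q, r):
    """Positive root of q + 2*a*P - (b^2 / r) * P^2 = 0 (long-horizon limit)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Example parameters (arbitrary illustrative values).
P0, c0, k0 = solve_scalar_lqr(a=0.0, b=1.0, q=1.0, r=1.0,
                              sigma=0.5, q_T=0.0, T=10.0)
print("P(0) =", P0, "stationary P =", stationary_P(0.0, 1.0, 1.0, 1.0))
print("noise-induced cost c(0) =", c0, "feedback gain k(0) =", k0)
```

In this additive-noise setting the noise level sigma does not enter the Riccati equation, so the feedback gain matches the deterministic problem; noise only adds the state-independent cost c(t).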
Why did this resonate on Hacker News? Because it pushes back against the idea that current AI progress is only about bigger models and more compute. The essay makes a case that old mathematics still structures new systems, and that understanding those foundations can improve how researchers reason about both reinforcement learning and generative models. For engineers, it is a useful reminder that the gap between theory and practice is often smaller than the tooling stack makes it look.
- Original source: Daniel López Montero’s March 28, 2026 essay
- Core theme: HJB links optimal control, continuous-time RL, and diffusion models
- Main takeaway: classical mathematics still explains much of modern AI behavior
Related Articles
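An article that drew 120 points and 33 comments on Hacker News in March 2026 put a technical explanation of the Hamilton-Jacobi-Bellman equation front and center. Its claim is that continuous-time reinforcement learning and diffusion models can be understood not as separate ML techniques but as instances of the same control-theoretic structure.
Google DeepMind released "AI Co-Mathematician," a Gemini-based multi-agent system. It reached 48% on FrontierMath Tier 4, the highest score recorded for an AI, and AlphaEvolve improved the lower bounds for five Ramsey numbers that had not been updated in 11 to 20 years.
OpenAI's general-purpose reasoning model autonomously disproved a central conjecture in discrete geometry that Erdős posed in 1946. It is the first time an AI has resolved a prominent open mathematical problem on its own, and several mathematicians, including Noga Alon of Princeton University, verified the proof.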