HY-World 2.0 releases code and weights for an explorable 3D world model

HY-World 2.0 is new research showing that world models are moving beyond video clip generation toward 3D spaces that can be explored. In an arXiv paper submitted on April 15, 2026 at 17:59:17 UTC, Team HY-World describes a multimodal framework that reconstructs, generates, and simulates 3D worlds from text prompts, single-view images, multi-view images, and video.

The output is not just 2D generation. According to the paper, HY-World 2.0 builds a 3D world representation and synthesizes high-fidelity, navigable 3D Gaussian Splatting scenes from text or a single image. The pipeline has four stages: Panorama Generation with HY-Pano 2.0, Trajectory Planning with WorldNav, World Expansion with WorldStereo 2.0, and World Composition with WorldMirror 2.0.
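To make the stage order concrete, here is a minimal Python sketch of a four-stage sequence. Only the stage and component names come from the paper as reported here. The `Stage` class, `run_pipeline`, `describe`, and the handler functions are hypothetical scaffolding for illustration, not the released HY-World 2.0 API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Stage:
    step: str       # stage name as reported in the article
    component: str  # component the paper names for that stage


# The order follows the article's description. Everything else is illustrative.
PIPELINE: List[Stage] = [
    Stage("Panorama Generation", "HY-Pano 2.0"),
    Stage("Trajectory Planning", "WorldNav"),
    Stage("World Expansion", "WorldStereo 2.0"),
    Stage("World Composition", "WorldMirror 2.0"),
]


def run_pipeline(initial: Any, handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Pass an artifact through each stage in order.

    `handlers` maps a stage name to a caller-supplied function. These are
    placeholders (an assumption), standing in for whatever the real code does.
    """
    artifact = initial
    for stage in PIPELINE:
        artifact = handlers[stage.step](artifact)
    return artifact


def describe(pipeline: List[Stage]) -> str:
    """Return a one-line summary of the stage order."""
    return " -> ".join(f"{s.step} ({s.component})" for s in pipeline)


if __name__ == "__main__":
    print(describe(PIPELINE))
```

Treating each stage as a function from one artifact to the next is a design assumption. It makes it easy to swap in a stub for a single stage when probing where quality degrades.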

The paper also presents WorldLens, a rendering platform. The authors list an engine-agnostic architecture, automatic IBL lighting, efficient collision detection, training-rendering co-design, and character exploration support as its features. For world models to approach practical use, looking at a generated scene is not enough. Users, simulators, and embodied agents need to be able to move through it.

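What stands out in this release is that it is open. The authors say they are releasing model weights, code, and technical details, and they report the strongest results among open-source approaches on multiple benchmarks, close to those of the closed-source model Marble. These claims still need external verification. The open question is how well the system holds up away from the paper's polished examples, on unusual geometry, and in downstream simulation tasks.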

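For developers, code and weights change the evaluation conversation. Rather than inferring quality from curated videos, they can directly test camera paths, lighting assumptions, memory consistency, and collision behavior. That is the difference between an impressive media model and a tool that can be stress-tested.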
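One way to probe camera-path robustness is to render the same scene along a smooth orbit and along a perturbed copy of it, then compare the results. The sketch below generates only the camera poses, using the Python standard library. The function names, default radius and height, and the jitter model are all assumptions for illustration. It does not call any HY-World 2.0 or WorldLens API, and feeding the poses to a renderer is left to the reader.

```python
import math
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Pose = Tuple[Vec3, Vec3]  # (camera position, unit forward vector)


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def orbit_path(
    n_views: int,
    radius: float = 2.0,
    height: float = 1.5,
    jitter: float = 0.0,
    seed: int = 0,
    target: Vec3 = (0.0, 0.0, 0.0),
) -> List[Pose]:
    """Return cameras on a circle around the y-axis, each looking at `target`.

    `jitter` adds a uniform random offset per axis, giving an "unexpected"
    path. A fixed `seed` keeps the perturbed path reproducible.
    """
    rng = random.Random(seed)
    path: List[Pose] = []
    for i in range(n_views):
        angle = 2.0 * math.pi * i / n_views
        pos = (
            radius * math.cos(angle) + rng.uniform(-jitter, jitter),
            height + rng.uniform(-jitter, jitter),
            radius * math.sin(angle) + rng.uniform(-jitter, jitter),
        )
        forward = normalize(
            (target[0] - pos[0], target[1] - pos[1], target[2] - pos[2])
        )
        path.append((pos, forward))
    return path


if __name__ == "__main__":
    smooth = orbit_path(8)
    shaky = orbit_path(8, jitter=0.3, seed=1)
    for (p, _), (q, _) in zip(smooth, shaky):
        print(f"offset from smooth orbit: {math.dist(p, q):.3f}")
```

Fixing the seed matters here: a failure seen on a perturbed path is only useful if the same path can be rendered again.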

The near-term uses are not limited to generative media. Explorable 3D world models span game prototyping, synthetic data, robotics simulation, spatial reasoning research, and interactive scene editing. The next thing to watch is whether the model preserves geometry, physics cues, and object consistency under unexpected camera paths and user interaction.
