OpenAI Images 2.0 system card puts numbers on deepfake risk

OpenAI's ChatGPT Images 2.0 System Card, published April 21, 2026, does more than describe performance gains. It shows how safety is measured and controlled for complex scene generation that involves reasoning, tool use, live web search data, and dense text.

The capability change is easy to see. Images 2.0 strengthens world knowledge, instruction following, and detail generation. But the same strengths also raise the risk. OpenAI explains that, because of heightened realism, deepfakes involving real people, political events, sexual content, and sensitive places could become more convincing without safeguards. The deployment challenge is therefore not simple prompt filtering but image-specific layered controls.

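The safety stack consists of several gates. A text classifier rejects prompts that may violate policy before the request is sent to the image model. A safety-focused multimodal reasoning model reviews the text and image inputs before generation, then reviews the generated output again before it is shown to the user. OpenAI says it has also shifted evaluation away from raw taxonomy matching and toward measuring harmful-output risk under conditions closer to the real product environment.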
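The gate ordering described above can be sketched as a small pipeline. This is only an illustration of the control flow, not OpenAI's implementation: every name here (`run_layered_pipeline`, `Decision`, the gate callables and their labels) is hypothetical, and the gates are passed in as plain functions so that any classifier could stand behind them.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Decision:
    """Outcome of one request passing through the hypothetical gates."""
    allowed: bool
    blocked_by: Optional[str] = None
    image: Optional[str] = None


def run_layered_pipeline(
    prompt: str,
    input_images: Sequence[str],
    text_gate: Callable[[str], bool],
    input_gate: Callable[[str, Sequence[str]], bool],
    generate: Callable[[str, Sequence[str]], str],
    output_gate: Callable[[str, str], bool],
) -> Decision:
    """Apply the gates in order; each gate returns True when the item is acceptable."""
    # Gate 1: text classifier runs before the request reaches the image model.
    if not text_gate(prompt):
        return Decision(False, blocked_by="text_classifier")
    # Gate 2: multimodal review of text and image inputs before generation.
    if not input_gate(prompt, input_images):
        return Decision(False, blocked_by="pre_generation_review")
    image = generate(prompt, input_images)
    # Gate 3: review of the generated output before it is shown to the user.
    if not output_gate(prompt, image):
        return Decision(False, blocked_by="post_generation_review")
    return Decision(True, image=image)
```

The point of the layering is that an image that slips past the prompt-side gates can still be stopped at the output gate, which is why the article reports recall for the combined stack and not for any single gate.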

The numbers matter most. In adversarial testing, the final thinking mode checkpoint, measured before the full production stack was applied, generated 464 policy-violating images in 6,944 attempts, a rate of 6.7%. Instant mode produced 685 in 3,112 attempts, or 22.0%. Of the 685 violating images from instant mode, the downstream monitor detected 598. The prompt and image stacks together detected 658, for a combined recall of 96.1%, and 99.1% of adversarial prompts ended in a safe output. Thinking mode reached 99.2% safe outputs after the combined stack.
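The reported percentages can be rechecked from the raw counts. The short sketch below uses only the figures quoted above. Two points are assumptions on my part and do not come from the system card: the 87.3% recall for the downstream monitor alone is derived here from 598 of 685, and the safe-output rate is reconstructed by treating every attempt as safe unless it produced a violating image that the combined stack missed.

```python
# Counts as quoted in the article; variable names are illustrative only.
THINKING_ATTEMPTS = 6944
THINKING_VIOLATIONS = 464
INSTANT_ATTEMPTS = 3112
INSTANT_VIOLATIONS = 685
INSTANT_CAUGHT_BY_DOWNSTREAM_MONITOR = 598
INSTANT_CAUGHT_BY_COMBINED_STACK = 658


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def summarize() -> dict:
    # Violating images that neither the prompt stack nor the image stack caught.
    missed = INSTANT_VIOLATIONS - INSTANT_CAUGHT_BY_COMBINED_STACK
    return {
        "thinking_violation_rate": pct(THINKING_VIOLATIONS, THINKING_ATTEMPTS),
        "instant_violation_rate": pct(INSTANT_VIOLATIONS, INSTANT_ATTEMPTS),
        # Derived here, not stated in the article.
        "instant_monitor_only_recall": pct(
            INSTANT_CAUGHT_BY_DOWNSTREAM_MONITOR, INSTANT_VIOLATIONS
        ),
        "instant_combined_recall": pct(
            INSTANT_CAUGHT_BY_COMBINED_STACK, INSTANT_VIOLATIONS
        ),
        # Assumption: safe output = attempt that did not end in a missed violation.
        "instant_safe_output_rate": pct(INSTANT_ATTEMPTS - missed, INSTANT_ATTEMPTS),
        "instant_missed_images": missed,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under that assumption the arithmetic reproduces the quoted 6.7%, 22.0%, 96.1%, and 99.1%, and it shows that 27 violating instant mode images passed the combined stack in this test set.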

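Biorisk is also treated seriously. OpenAI says there were cases in which bioweapons experts judged some image outputs accurate enough to help a novice with harmful tasks involving dangerous substances. For that reason, it treats the model as high capability for the purposes of biology mitigations and applies an image-specific biological risk policy to both inputs and outputs.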

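For users, the takeaway is that image model safety has entered a stage where realism, instruction following, external information, and provenance have to be handled together. OpenAI says Images 2.0 continues to carry C2PA metadata and adds an imperceptible, content-specific watermark along with internal detection tooling. The next thing to watch lies outside lab evaluation: whether these controls hold up at the same level in real use, where image editing, web-grounded prompts, and multi-image workflows are mixed.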
