OpenAI says GPT-5.4 Thinking shows low chain-of-thought controllability in new safety study
Original post: "We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning, suggesting CoT monitoring remains a useful safety tool." https://openai.com/index/reasoning-models-chain-of-thought-controllability/
On March 5, 2026, OpenAI used X to introduce a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. The company said GPT-5.4 Thinking showed low ability to obscure its reasoning, and the linked research summary framed that result positively because it suggests CoT monitoring still provides a useful safety signal.
The card linked from the post described the work as CoT-Control, an evaluation setup for testing whether reasoning models can intentionally change or hide the content of their intermediate reasoning traces. OpenAI's claim is not that reasoning models are fully transparent in every circumstance. Instead, the company is reporting that current frontier reasoning systems still struggle to reliably manipulate those traces on demand. From a safety perspective, that matters because some monitoring approaches depend on the assumption that internal reasoning remains at least partially inspectable.
- What OpenAI announced: a new evaluation suite plus a research paper.
- Research framing: low controllability of visible reasoning can be beneficial for monitoring.
- Model named in the post: GPT-5.4 Thinking.
The announcement stands out because OpenAI treated a limitation as a safety-relevant property. In many model launches, more control is presented as strictly better. Here, the company argued that a reduced ability to hide or rewrite reasoning traces may close off one class of audit blind spot. That does not settle broader alignment or reliability questions, but it does strengthen the case for keeping reasoning-monitoring techniques in the evaluation toolbox as models become more capable.
Primary sources are OpenAI's March 5, 2026 X post and the linked research summary. Because the company described the result in terms of low ability to obscure reasoning, the most careful reading is that OpenAI is reporting an empirical finding about current model behavior, not making a universal claim that all future reasoning models will remain similarly monitorable. The linked summary page is titled "Reasoning models struggle to control their chains of thought, and that's good."