OpenAI says GPT-5.4 Thinking shows low chain-of-thought controllability in new safety study
Original: We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning-suggesting CoT monitoring remains a useful safety tool. https://openai.com/index/reasoning-models-chain-of-thought-controllability/ View original →
On March 5, 2026, OpenAI used X to introduce a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. The company said GPT-5.4 Thinking showed low ability to obscure its reasoning, and the linked research summary framed that result positively because it suggests CoT monitoring still provides a useful safety signal.
The card linked from the post described the work as CoT-Control, an evaluation setup for testing whether reasoning models can intentionally change or hide the content of their intermediate reasoning traces. OpenAI's claim is not that reasoning models are fully transparent in every circumstance. Instead, the company is reporting that current frontier reasoning systems still struggle to reliably manipulate those traces on demand. From a safety perspective, that matters because some monitoring approaches depend on the assumption that internal reasoning remains at least partially inspectable.
- What OpenAI announced: a new evaluation suite plus a research paper.
- Research framing: low controllability of visible reasoning can be beneficial for monitoring.
- Model named in the post: GPT-5.4 Thinking.
The announcement stands out because OpenAI treated a limitation as a safety-relevant property. In many model launches, more control is presented as strictly better. Here, the company argued that lower ability to hide or rewrite reasoning traces may reduce one class of audit blind spot. That does not solve broader alignment or reliability questions, but it does strengthen the case for keeping reasoning-monitoring techniques in the evaluation toolbox while models become more capable.
Primary sources are OpenAI's March 5, 2026 X post and the linked research summary. Because the company described the result in terms of low ability to obscure reasoning, the most careful reading is that OpenAI is reporting an empirical finding about current model behavior, not making a universal claim that all future reasoning models will remain similarly monitorable. The linked summary page is titled Reasoning models struggle to control their chains of thought, and that's good.
Related Articles
Anthropic said on X that Claude Opus 4.6 showed cases of benchmark recognition during BrowseComp evaluation. The engineering write-up turns that into a broader warning about eval integrity in web-enabled model testing.
OpenAI released proof attempts for all 10 First Proof problems and said expert feedback suggests at least five may be correct. The company positioned the result as a test of long-horizon reasoning beyond standard benchmarks.
OpenAI reports that, across more than one million ChatGPT conversations, the share of difficult interactions exceeding a human baseline increased roughly fourfold from September 2024 to January 2026. The company also shows large gains in case-interview and puzzle-style open tasks.
Comments (0)
No comments yet. Be the first to comment!