OpenAI says GPT-5.4 Thinking shows low chain-of-thought controllability in new safety study
Original: We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning, suggesting CoT monitoring remains a useful safety tool. https://openai.com/index/reasoning-models-chain-of-thought-controllability/
On March 5, 2026, OpenAI used X to introduce a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. The company said GPT-5.4 Thinking showed low ability to obscure its reasoning, and the linked research summary framed that result positively because it suggests CoT monitoring still provides a useful safety signal.
The card linked from the post described the work as CoT-Control, an evaluation setup for testing whether reasoning models can intentionally change or hide the content of their intermediate reasoning traces. OpenAI's claim is not that reasoning models are fully transparent in every circumstance. Instead, the company is reporting that current frontier reasoning systems still struggle to reliably manipulate those traces on demand. From a safety perspective, that matters because some monitoring approaches depend on the assumption that internal reasoning remains at least partially inspectable.
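To make the idea of "controllability" concrete, here is a minimal sketch of how such a score could be computed. It is an illustration only, not OpenAI's CoT-Control methodology: the `Trial` structure, the forbidden-term instruction, and the substring check are all assumptions introduced for this example. Each trial pairs an instruction (keep a given term out of the reasoning) with the trace the model produced, and the score is the fraction of traces that complied.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical evaluation trial (not OpenAI's actual data format)."""
    forbidden_term: str   # term the model was told to keep out of its reasoning
    reasoning_trace: str  # the visible chain of thought the model produced


def followed_instruction(trial: Trial) -> bool:
    """Return True if the forbidden term is absent from the trace (case-insensitive)."""
    return trial.forbidden_term.lower() not in trial.reasoning_trace.lower()


def controllability_score(trials: list[Trial]) -> float:
    """Fraction of trials in which the trace obeyed the instruction."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(followed_instruction(t) for t in trials) / len(trials)


if __name__ == "__main__":
    # Made-up traces for illustration; only the second one avoids its term.
    trials = [
        Trial("prime", "7 is prime, so it has no divisors besides 1 and itself."),
        Trial("prime", "7 cannot be split into equal whole groups larger than 1."),
        Trial("derivative", "Take the derivative of x^2 to get 2x."),
        Trial("derivative", "The Derivative rule gives 2x."),
    ]
    print(controllability_score(trials))  # 0.25
```

Under this toy metric, a low score means the model rarely manages to reshape its reasoning on request, which is the kind of result OpenAI framed as favorable for monitoring. A real evaluation would need far more robust compliance checks than a substring match.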
- What OpenAI announced: a new evaluation suite plus a research paper.
- Research framing: low controllability of visible reasoning can be beneficial for monitoring.
- Model named in the post: GPT-5.4 Thinking.
The announcement stands out because OpenAI treated a limitation as a safety-relevant property. In many model launches, more control is presented as strictly better. Here, the company argued that lower ability to hide or rewrite reasoning traces may reduce one class of audit blind spot. That does not solve broader alignment or reliability questions, but it does strengthen the case for keeping reasoning-monitoring techniques in the evaluation toolbox while models become more capable.
Primary sources are OpenAI's March 5, 2026 X post and the linked research summary. Because the company described the result in terms of low ability to obscure reasoning, the most careful reading is that OpenAI is reporting an empirical finding about current model behavior, not making a universal claim that all future reasoning models will remain similarly monitorable. The linked summary page is titled "Reasoning models struggle to control their chains of thought, and that's good."