OpenAI opens applications for a Safety Fellowship focused on alignment and misuse research
In an April 6 post on X, OpenAI announced a new Safety Fellowship aimed at external researchers, engineers, and practitioners working on safety and alignment. The linked official post describes the effort as a pilot program for rigorous, high-impact research on advanced AI systems. This is not a product launch, but it is still strategically important: OpenAI is formalizing an external pathway for safety talent and research output instead of keeping that work purely internal.
The program details are unusually specific. OpenAI says the fellowship will run from September 14, 2026 through February 5, 2027, with applications closing on May 3 and decisions going out by July 25. Priority areas include safety evaluation, ethics, robustness, scalable mitigations, privacy-preserving safety methods, agentic oversight, and high-severity misuse domains. That list suggests the company is looking for work that connects directly to current and near-term deployment risks rather than purely abstract safety discussion.
OpenAI is turning external safety work into a structured program
OpenAI says fellows will work closely with mentors, join a cohort, and have access to a shared workspace in Berkeley through Constellation while remaining free to work remotely. The expected outputs are substantial research artifacts such as papers, benchmarks, or datasets. The company also says the program includes a monthly stipend, compute support, and ongoing mentorship, while clarifying that fellows will not receive internal system access and will instead work with API credits and related resources.
The significance is broader than one cohort. Frontier labs have been under pressure to show that safety investment is not limited to public principles or internal evaluation checklists. A structured external fellowship is one way to expand both the talent pipeline and the amount of practical research happening around model oversight and misuse prevention. If more labs adopt similar programs, this could become an important part of how AI safety capacity is built outside core company walls. Sources: the X announcement and OpenAI’s fellowship post.
Related Articles
Anthropic said on March 31, 2026 that it signed an MOU with the Australian government to collaborate on AI safety research and support Australia’s National AI Plan. Anthropic says the agreement includes work with Australia’s AI Safety Institute, Economic Index data sharing, and AUD 3 million in partnerships with Australian research institutions.
Google DeepMind said on March 26, 2026 that it is releasing research on how conversational AI might exploit emotions or manipulate people into harmful choices. The company says it built the first empirically validated toolkit to measure harmful AI manipulation, based on nine studies with more than 10,000 participants across the UK, the US, and India.
Google DeepMind says it has built a harmful manipulation evaluation toolkit from nine studies spanning more than 10,000 participants. The work argues that manipulation risk is domain-specific, with finance and health producing very different outcomes.