Google DeepMind publishes a harmful-manipulation eval toolkit after nine multi-country studies
Original: Protecting people from harmful manipulation
What happened
Google DeepMind said on March 26, 2026, that it is releasing new research and a public eval toolkit for harmful manipulation in human-AI interactions. The company frames the risk as the difference between beneficial persuasion and deceptive pressure that pushes users toward harmful choices. Instead of treating the issue as a vague long-term concern, Google DeepMind says it now has a concrete methodology for measuring whether a model can alter human beliefs or behaviour in negative ways under controlled conditions.
The announcement matters because frontier-model safety work is usually discussed in terms of banned outputs, cyber misuse, or biological risks. Harmful manipulation is harder to measure because it can be subtle, domain-specific, and spread across long conversations rather than concentrated in a single answer. Google DeepMind describes the release as the first empirically validated toolkit for this risk category, and it is publishing the study materials so outside teams can run comparable human-participant experiments instead of relying only on internal claims.
Key details
According to the company, the research program covered nine studies with more than 10,000 participants across the UK, the US, and India. The tests focused on high-stakes domains including finance and health, where the team simulated scenarios such as investment choices and dietary-supplement recommendations. One of the more useful findings is that performance in one domain did not reliably predict performance in another, which argues against the idea that one broad safety score is enough to characterize manipulation risk.
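To make that aggregation point concrete, here is a minimal sketch with invented per-domain numbers (these are illustrative only, not figures from the studies): a single averaged score can look moderate while one domain, such as health, carries most of the risk.

```python
# Hypothetical illustration only: the per-domain rates below are invented,
# not results from DeepMind's research. The point is that an aggregate
# "manipulation safety" score can look acceptable while one domain does not.

domain_harm_rates = {
    "finance": 0.04,       # e.g. rate of harmful investment nudges
    "health": 0.21,        # e.g. rate of harmful supplement recommendations
    "general_chat": 0.03,
}

# A single averaged score hides the domain-level picture.
aggregate = sum(domain_harm_rates.values()) / len(domain_harm_rates)
print(f"aggregate harm rate: {aggregate:.1%}")  # ~9.3%, looks moderate

# Reporting per domain surfaces the outlier immediately.
for domain, rate in sorted(domain_harm_rates.items(), key=lambda kv: -kv[1]):
    flag = "needs attention" if rate > 0.10 else "ok"
    print(f"{domain:>12}: {rate:.1%} ({flag})")
```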
Google DeepMind also says the framework now feeds into its own model-safety work, including evaluations for Gemini 3 Pro. That makes the release more than a research note. It is effectively a signal that harmful-manipulation testing is moving closer to the standard battery of frontier-model checks, alongside existing measures for other severe harms.
Why it matters next
The company is careful to note that these results come from controlled lab settings and should not be treated as a direct map of real-world misuse. Even so, publishing the toolkit raises the bar for the wider industry. Regulators, academic labs, and competing model providers now have a clearer starting point for comparing results, challenging claims, and expanding the benchmark to audio, video, image, and agentic settings where persuasion risk could become harder to see.
Related Articles
On March 25, 2026, OpenAI launched a public Safety Bug Bounty focused on AI abuse and safety risks. The new track complements its security program by accepting AI-specific failures such as prompt injection, data exfiltration, and harmful agent behavior.
Google said on March 25, 2026, that it is now targeting 2029 for post-quantum cryptography migration. The company argues that recent progress in quantum hardware, error correction, and factoring estimates makes authentication and signature upgrades more urgent.