
Google’s unlearning audit catches privacy failures with thousands of samples

Original: New framework for auditing machine unlearning

June 11, 2026, by Insights AI

“The model forgot it” is becoming a claim that needs evidence. On June 10, 2026, Google Research introduced a framework for auditing machine unlearning and differential privacy in settings where auditors can only query a model and inspect its output samples, without access to the training run itself.

The work, described in a Google Research post, focuses on Regularized f-Divergence Kernel Tests, presented at AISTATS 2026. Machine unlearning aims to remove the influence of specific training data from a model without retraining it from scratch, a capability tied to privacy law, safety, and model quality. The hard part is proving that the removal actually happened.

Traditional two-sample tests compare two output distributions, for example those of two models, and ask whether they differ. That can be useful, but Google argues the approach becomes statistically weak and computationally expensive at model scale: subtle or localized failures can be missed, while harmless distribution shifts can be flagged as unsafe. Auditors may need very large numbers of samples to separate real privacy leakage from random noise.
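
For readers unfamiliar with the baseline, a conventional two-sample test fits in a few lines. This is a minimal sketch under stated assumptions, not Google's method: it uses the maximum mean discrepancy (MMD) with a Gaussian kernel plus a permutation test, treats model outputs as one-dimensional numbers, and the names `mmd2` and `permutation_test` are hypothetical. It only asks whether two output samples differ, which is the yes/no question the article says becomes weak at model scale.

```python
import math
import random


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on scalar outputs."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth**2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


def permutation_test(xs, ys, n_permutations=99, bandwidth=1.0, seed=0):
    """p-value for H0: xs and ys are drawn from the same distribution."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    n = len(xs)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under H0 the sample labels are exchangeable
        if mmd2(pooled[:n], pooled[n:], bandwidth) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, keeping the p-value valid.
    return (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    reference = [rng.gauss(0.0, 1.0) for _ in range(60)]  # stand-in for model A outputs
    shifted = [rng.gauss(1.5, 1.0) for _ in range(60)]    # stand-in for model B outputs
    print(permutation_test(reference, shifted))  # small p-value: the samples differ
```

Every permutation recomputes the statistic over all sample pairs, which hints at why this style of test gets expensive as sample counts grow.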

Google’s framework uses f-divergences, including the chi-squared, Kullback-Leibler (KL), and hockey-stick divergences, and adds kernel regularization to make the tests tractable. The adaptive approach also reduces manual hyperparameter tuning and avoids sample splitting. The hockey-stick divergence matters for privacy auditing because it directly characterizes (ε, δ)-differential privacy. Google says its hockey-stick-based tester detected violations in SVT3, a variant of the sparse vector technique, using only a few thousand samples, whereas previously studied DP-Auditorium techniques required millions of samples to reach a comparable detection rate.
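
The divergences named above have simple closed forms when a mechanism's output space is small and both output distributions are known exactly. The sketch below assumes exactly that, which is a simplification: Google's auditor only sees samples and uses regularized kernel estimators, neither of which is reproduced here. It relies on the standard result that a mechanism is (ε, δ)-differentially private if and only if, for every pair of neighbouring inputs, the hockey-stick divergence at γ = e^ε between the two output distributions is at most δ. The function names, including `violates_dp`, are hypothetical, and the example uses textbook randomized response rather than SVT3.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) = sum p * log(p / q); terms with p = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi_squared_divergence(p, q):
    """chi^2(P || Q) = sum (p - q)^2 / q; assumes q > 0 everywhere."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def hockey_stick_divergence(p, q, gamma):
    """H_gamma(P || Q) = sum max(0, p - gamma * q)."""
    return sum(max(0.0, pi - gamma * qi) for pi, qi in zip(p, q))


def violates_dp(p, q, epsilon, delta, tol=1e-12):
    """Hypothetical helper: True if output distributions P and Q, produced on one
    pair of neighbouring inputs, break an (epsilon, delta)-DP claim.
    Both directions are checked; tol absorbs floating-point rounding in exp(epsilon)."""
    gamma = math.exp(epsilon)
    worst = max(hockey_stick_divergence(p, q, gamma), hockey_stick_divergence(q, p, gamma))
    return worst > delta + tol


if __name__ == "__main__":
    # Randomized response on one secret bit with epsilon = ln 3:
    # the true bit is reported with probability 3/4.
    p = [0.75, 0.25]  # output distribution when the secret bit is 0
    q = [0.25, 0.75]  # output distribution when the secret bit is 1
    print(kl_divergence(p, q))                  # 0.5 * ln 3, about 0.549
    print(chi_squared_divergence(p, q))         # about 1.333 (4/3)
    print(violates_dp(p, q, math.log(3), 0.0))  # False: the ln 3 claim holds
    print(violates_dp(p, q, math.log(2), 0.0))  # True: a ln 2 claim is too strong
```

A sample-based auditor would replace `p` and `q` with estimates from observed outputs, which is where sample efficiency becomes the whole game.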

The unlearning result is just as pointed. Instead of asking whether an unlearned model exactly matches a retrained “safe” model, Google proposes a three-sample relative test: is the unlearned model closer to the safe retrained model, or to the compromised model that memorized the sensitive data? In simplified evaluations, only the random-label technique passed. Fine-tuning, pruning, and Selective Synaptic Dampening failed to make the model truly forget the target data.
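
The relative question can be sketched with ordinary kernel tools. This is a minimal illustration under stated assumptions, not Google's test: it swaps in a plain MMD comparison, in the spirit of earlier kernel relative-similarity tests, instead of the paper's regularized f-divergence statistic. It also treats model outputs as one-dimensional numbers and uses a rough bootstrap in place of a calibrated significance level. `relative_mmd` and `closer_to_retrained_fraction` are hypothetical names, and the synthetic Gaussians only stand in for outputs of the retrained, compromised, and unlearned models.

```python
import math
import random


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on scalar outputs."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth**2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    return sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))


def relative_mmd(unlearned, retrained, compromised, bandwidth=1.0):
    """Biased MMD^2(U, R) - MMD^2(U, C).

    Negative means the unlearned samples U sit closer to the retrained (safe)
    samples R than to the compromised samples C. The U-U kernel term appears
    in both MMD^2 estimates and cancels, so it is never computed.
    """
    return (
        mean_kernel(retrained, retrained, bandwidth)
        - mean_kernel(compromised, compromised, bandwidth)
        - 2 * mean_kernel(unlearned, retrained, bandwidth)
        + 2 * mean_kernel(unlearned, compromised, bandwidth)
    )


def closer_to_retrained_fraction(unlearned, retrained, compromised, n_boot=100, bandwidth=1.0, seed=0):
    """Share of bootstrap resamples of U for which U looks closer to R than to C.

    Rough heuristic: only U is resampled, so variability in R and C is ignored.
    """
    rng = random.Random(seed)
    # The R-R and C-C terms do not depend on U, so compute them once.
    offset = mean_kernel(retrained, retrained, bandwidth) - mean_kernel(compromised, compromised, bandwidth)
    hits = 0
    for _ in range(n_boot):
        resample = [rng.choice(unlearned) for _ in unlearned]
        stat = (offset
                - 2 * mean_kernel(resample, retrained, bandwidth)
                + 2 * mean_kernel(resample, compromised, bandwidth))
        if stat < 0:
            hits += 1
    return hits / n_boot


if __name__ == "__main__":
    rng = random.Random(2)
    retrained = [rng.gauss(0.0, 1.0) for _ in range(50)]    # never saw the target data
    compromised = [rng.gauss(2.0, 1.0) for _ in range(50)]  # memorized the target data
    good = [rng.gauss(0.1, 1.0) for _ in range(50)]         # unlearning that worked
    bad = [rng.gauss(1.9, 1.0) for _ in range(50)]          # unlearning that did not
    print(closer_to_retrained_fraction(good, retrained, compromised))  # close to 1.0
    print(closer_to_retrained_fraction(bad, retrained, compromised))   # close to 0.0
```

Framing the check relative to two reference models avoids demanding an exact match with the retrained model, which sampling noise alone would make hard to certify.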

This is research, not a production certification scheme. Still, it raises the bar for AI systems trained on sensitive data. Privacy promises will increasingly need audit methods that are statistically grounded, sample-efficient, and usable without privileged access to the full training pipeline.
