Anthropic details large-scale distillation attacks against Claude
Original: Detecting and preventing distillation attacks
Anthropic said on February 23, 2026, that it had detected industrial-scale efforts to extract Claude's capabilities through distillation attacks. In the post, the company named DeepSeek, Moonshot, and MiniMax, and said the campaigns generated more than 16 million exchanges with Claude through roughly 24,000 fraudulent accounts, in violation of Anthropic's terms of service and regional access rules.
The company drew a sharp distinction between ordinary distillation and the behavior it says it observed. Distillation itself is a standard technique for training smaller or cheaper models from stronger ones, including within the same lab. Anthropic's allegation is that competitors used fraudulent access and repeated high-volume prompting to transfer Claude's capabilities into their own systems instead of developing them independently.
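For context on the underlying technique, here is a minimal sketch of ordinary distillation as it might look in PyTorch: a student model is trained to match a teacher's temperature-softened output distribution. This is the classic in-lab form (Hinton et al., 2015); an API-based attacker would see only sampled text rather than logits, so extraction in that setting reduces to fine-tuning on harvested prompt/response pairs. Everything below is illustrative, not code from any party named in the article.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard knowledge-distillation objective: soften both output
    distributions with a temperature and penalize the student's
    divergence from the teacher. Purely illustrative.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2
```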
Anthropic said the campaigns relied on proxy services and what it called "hydra cluster" architectures: large networks of accounts that spread traffic across Anthropic's API and third-party cloud platforms. One proxy network, according to the company, managed more than 20,000 fraudulent accounts simultaneously. Anthropic also said one targeted campaign pivoted within 24 hours of a new model release, suggesting the operators were closely tracking changes in Claude's capabilities.
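Anthropic did not describe how it maps these clusters, but a common defensive pattern is to group accounts by shared infrastructure signals and flag any signal fronting an implausible number of accounts. A hypothetical sketch in Python, assuming request records carry an account ID plus network-level fingerprints (the schema, field names, and threshold are all invented for illustration, not Anthropic's actual telemetry):

```python
from collections import defaultdict

def find_account_clusters(requests, min_size=10):
    """Flag network-level signals shared by suspiciously many accounts.

    `requests` is an iterable of dicts such as
    {"account_id": "acct_123", "proxy_asn": 64512, "tls_fingerprint": "..."}
    -- a hypothetical schema for the sketch.
    """
    accounts_by_signal = defaultdict(set)
    for r in requests:
        for key in ("proxy_asn", "tls_fingerprint", "payment_hash"):
            if key in r:
                accounts_by_signal[(key, r[key])].add(r["account_id"])
    # Report every signal that fronts at least `min_size` distinct accounts.
    return {sig: accts for sig, accts in accounts_by_signal.items()
            if len(accts) >= min_size}
```

Even this naive grouping would immediately surface a single proxy exit or TLS fingerprint shared by thousands of accounts.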
The security argument goes beyond commercial competition. Anthropic said illicit distillation can strip away safety behavior and reduce the visibility other labs have into how powerful model capabilities spread, especially in areas such as cyber misuse or bioweapon-related knowledge. The company also argued that these campaigns complicate debates around export controls because apparent capability gains may partly reflect extraction from existing American frontier models rather than entirely independent research progress.
In response, Anthropic said it has built classifiers and behavioral fingerprinting systems to detect distillation patterns, including chain-of-thought elicitation, and that it is sharing technical indicators with other AI labs, cloud providers, and relevant authorities. Because the post is Anthropic's own account, its claims should be read as company allegations rather than independent findings. Even so, the disclosure is one of the clearest public looks yet at how model extraction has become a frontline security issue for frontier AI providers.
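Anthropic has not published details of those classifiers. As a rough illustration of what behavioral fingerprinting for chain-of-thought elicitation could look like, the toy heuristic below flags accounts that combine training-scale volume with a high fraction of reasoning-eliciting prompts. The phrases, thresholds, and function names are assumptions, and a production system would use a trained model rather than a regex list:

```python
import re

# Illustrative phrases associated with chain-of-thought elicitation;
# invented for the sketch, not Anthropic's actual detection signals.
COT_PATTERNS = [
    r"think step[- ]by[- ]step",
    r"show (all )?your (reasoning|work)",
    r"explain your chain of thought",
]
COT_RE = re.compile("|".join(COT_PATTERNS), re.IGNORECASE)

def flag_account(prompts, volume_threshold=5000, cot_ratio_threshold=0.5):
    """Flag accounts combining high volume with CoT-heavy prompting.

    Both thresholds are invented; a real system would also weigh prompt
    diversity, timing, and cross-account correlation.
    """
    if len(prompts) < volume_threshold:
        return False
    cot_hits = sum(1 for p in prompts if COT_RE.search(p))
    return cot_hits / len(prompts) >= cot_ratio_threshold
```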
Related Articles
Claude Code Security, announced February 20, uses AI reasoning to scan codebases for vulnerabilities and has found more than 500 previously undetected bugs in production open-source code. Cybersecurity stocks fell sharply on the news.
Mozilla said on March 6, 2026, that Anthropic's AI-assisted red team surfaced more than a dozen verifiable Firefox security bugs, and that engineers validated and fixed most of the issues before Firefox 148 shipped.
Anthropic is putting an initial $100 million behind the Claude Partner Network in 2026 to help consultancies, integrators, and AI services firms move enterprise Claude deployments into production. The program combines funding, certification, technical support, and a new code modernization starter kit.