Anthropic details large-scale distillation attacks against Claude
Original post: Detecting and preventing distillation attacks
On February 23, 2026, Anthropic said it had detected industrial-scale efforts to extract Claude's capabilities through distillation attacks. In the post, the company named DeepSeek, Moonshot, and MiniMax, and said the campaigns generated more than 16 million exchanges with Claude through roughly 24,000 fraudulent accounts, in violation of Anthropic's terms of service and regional access rules.
The company drew a sharp distinction between ordinary distillation and the behavior it says it observed. Distillation itself is a standard technique for training smaller or cheaper models from stronger ones, including within the same lab. Anthropic's allegation is that competitors used fraudulent access and repeated high-volume prompting to transfer Claude's capabilities into their own systems instead of developing them independently.
Anthropic said the campaigns relied on proxy services and what it called "hydra cluster" architectures: large networks of accounts that spread traffic across Anthropic's API and third-party cloud platforms. One proxy network, according to the company, managed more than 20,000 fraudulent accounts at the same time. Anthropic also said one targeted campaign pivoted within 24 hours of a new model release, suggesting that the operators were closely tracking changes in Claude's capabilities.
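Anthropic has not published its detection pipeline, but the core problem it describes, linking thousands of nominally separate accounts back to one operator, is a classic clustering task. As a minimal sketch, accounts that share any infrastructure signal (here, a hypothetical proxy IP or billing fingerprint; both field names and data are invented for illustration) can be grouped with a union-find pass:

```python
from collections import defaultdict

# Hypothetical account records: (account_id, proxy_ip, billing_fingerprint).
# All identifiers and signals here are illustrative, not Anthropic's.
ACCOUNTS = [
    ("acct-001", "203.0.113.7", "card-A"),
    ("acct-002", "203.0.113.7", "card-B"),
    ("acct-003", "198.51.100.4", "card-B"),
    ("acct-004", "192.0.2.99", "card-C"),
]

def cluster_accounts(accounts):
    """Union accounts that share any infrastructure signal: a toy
    version of linking a 'hydra cluster' back to one operator."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Invert the records: signal -> accounts that exhibit it.
    by_signal = defaultdict(list)
    for acct, ip, card in accounts:
        by_signal[("ip", ip)].append(acct)
        by_signal[("card", card)].append(acct)

    # Any two accounts sharing a signal land in the same cluster.
    for members in by_signal.values():
        for other in members[1:]:
            union(members[0], other)

    clusters = defaultdict(set)
    for acct, _, _ in accounts:
        clusters[find(acct)].add(acct)
    return [sorted(c) for c in clusters.values()]

clusters = cluster_accounts(ACCOUNTS)
# acct-001/002 share an IP and acct-002/003 share a card, so all three
# collapse into one cluster; acct-004 stands alone.
```

In a real system the signals would be far richer (request timing, prompt templates, payment graphs), but the transitive-linking structure is the same.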
The security argument goes beyond commercial competition. Anthropic said illicit distillation can strip away safety behavior and reduce the visibility other labs have into how powerful model capabilities spread, especially in areas such as cyber misuse or bioweapon-related knowledge. The company also argued that these campaigns complicate debates around export controls because apparent capability gains may partly reflect extraction from existing American frontier models rather than entirely independent research progress.
In response, Anthropic said it has built classifiers and behavioral fingerprinting systems to detect distillation patterns, including chain-of-thought elicitation, and that it is sharing technical indicators with other AI labs, cloud providers, and relevant authorities. Because the post is Anthropic's own account, its claims should be understood as company allegations rather than an independent adjudication. Even so, the disclosure is one of the clearest public looks yet at how model extraction has become a frontline security issue for frontier AI providers.
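Anthropic's actual classifiers are not public, but one signal the post names, chain-of-thought elicitation at high volume, can be caricatured with a simple heuristic. The patterns and thresholds below are made up for illustration; a production system would use learned classifiers over many more features:

```python
import re

# Illustrative cues that a prompt is trying to elicit step-by-step
# reasoning for training data, one possible distillation signal.
COT_PATTERNS = [
    r"\bexplain (your|the) reasoning\b",
    r"\bstep[- ]by[- ]step\b",
    r"\bshow your (chain of thought|work)\b",
]

def cot_elicitation_score(prompt: str) -> int:
    """Count chain-of-thought elicitation cues in a single prompt."""
    text = prompt.lower()
    return sum(bool(re.search(p, text)) for p in COT_PATTERNS)

def flag_account(prompts, volume_threshold=100):
    """Flag an account whose traffic is both high-volume and dominated
    by reasoning-elicitation prompts (thresholds are invented)."""
    if len(prompts) < volume_threshold:
        return False
    hits = sum(cot_elicitation_score(p) >= 1 for p in prompts)
    return hits / len(prompts) > 0.5

suspicious = flag_account(["Solve this step by step."] * 200)   # True
benign = flag_account(["What's the weather today?"] * 200)      # False
```

The point of combining a per-prompt signal with a volume gate is that chain-of-thought requests are legitimate in isolation; it is the sustained, uniform pattern across millions of exchanges that distinguishes extraction from ordinary use.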