This matters because the fight over model copying is no longer confined to lobbying letters and company blog posts. Reuters reported on April 26 that the U.S. State Department told diplomats worldwide to warn foreign governments about AI models allegedly distilled from U.S. systems, naming DeepSeek and also citing Moonshot AI and MiniMax.
#distillation
Washington is no longer treating model distillation as a lab-level abuse problem. The White House says foreign actors, chiefly China, are using tens of thousands of proxies and jailbreaking techniques to copy US frontier AI systems and ship cheaper models that can look comparable on select benchmarks.
Synthetic-data training has a safety problem that is harder to catch than obviously bad examples. A Nature paper co-authored by Anthropic researchers reports that traits such as a preference for owls, or misalignment, can pass from a teacher model to a student through semantically unrelated number sequences.
Lightning OPD targets a practical bottleneck in on-policy distillation (OPD): keeping a live teacher model running throughout training. The paper reports 69.9% on AIME 2024 for a model trained from Qwen3-8B-Base in 30 GPU hours, a 4.0x speedup over standard OPD.
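Why standard OPD needs a live teacher is easiest to see in the generic objective. The sketch below is not Lightning OPD's method: it assumes one common formulation, a per-token reverse KL over the full next-token distribution, where the student samples its own continuation and the teacher must then score every prefix of that fresh sample. Because new samples arrive at every training step, the teacher has to stay available throughout. The toy models over a 3-token vocabulary and all function names are hypothetical, and the code computes the objective only, omitting the gradient step a real trainer would take.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(p_student, p_teacher):
    """KL(student || teacher) at one position (mode-seeking for the student)."""
    return sum(ps * (math.log(ps) - math.log(pt))
               for ps, pt in zip(p_student, p_teacher) if ps > 0)

def sample_from_student(student_logits, prompt, max_new_tokens, rng):
    """On-policy step: the student generates its own continuation."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(student_logits(tokens))
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens

def opd_objective(student_logits, teacher_logits, prompt, max_new_tokens, rng):
    """Mean per-token reverse KL on a fresh student sample.

    The teacher scores every prefix the student just produced, which is why a
    standard OPD setup keeps the teacher running throughout training.
    """
    seq = sample_from_student(student_logits, prompt, max_new_tokens, rng)
    kls = []
    for t in range(len(prompt), len(seq)):
        prefix = seq[:t]  # context in which token t was sampled
        p_s = softmax(student_logits(prefix))
        p_t = softmax(teacher_logits(prefix))  # live teacher query
        kls.append(reverse_kl(p_s, p_t))
    return sum(kls) / len(kls), seq

# Hypothetical toy models over a 3-token vocabulary, conditioned on the last token.
def toy_teacher(prefix):
    # Strongly prefers the token after the last one (mod 3).
    return [3.0 if tok == (prefix[-1] + 1) % 3 else 0.0 for tok in range(3)]

def toy_student(prefix):
    # Mildly prefers repeating the last token.
    return [0.5 if tok == prefix[-1] else 0.0 for tok in range(3)]

if __name__ == "__main__":
    loss, seq = opd_objective(toy_student, toy_teacher, [0], 5, random.Random(0))
    print(f"mean reverse KL: {loss:.4f}, sampled sequence: {seq}")
```

If the student matched the teacher exactly, the objective would be zero; any gap yields a positive value that training would push down.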
Anthropic said it detected industrial-scale campaigns by DeepSeek, Moonshot, and MiniMax to extract Claude outputs. The company said the activity involved more than 16 million exchanges through about 24,000 fraudulent accounts and that it is investing in detection and response tooling.
A March 19, 2026 Hacker News post about NanoGPT Slowrun had reached 162 points and 43 comments when captured. Q Labs says an ensemble of 1.8B-parameter models trained on 100M tokens matched a baseline that would normally require 1B tokens.
Q Labs says 100M tokens and an 18B-parameter ensemble can match a 1B-token baseline, and Hacker News immediately focused on whether that gain survives serving and deployment.
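The serving question follows from how ensembles work. Matching a 1B-token baseline with 100M tokens is a 10x reduction in training data, but an ensemble pays at inference time instead. A minimal sketch, assuming probability averaging across members (one common way to combine an ensemble; the items above do not say how Q Labs combines its models): each member produces a next-token distribution, the ensemble averages them, and every generated token therefore costs one forward pass per member. The member count, toy logits, and function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_next_token_probs(member_logits):
    """Average the next-token distributions of all ensemble members."""
    member_probs = [softmax(logits) for logits in member_logits]
    n = len(member_probs)
    vocab = len(member_probs[0])
    return [sum(p[i] for p in member_probs) / n for i in range(vocab)]

def decode_forward_passes(num_members, tokens_generated):
    """Forward passes needed to generate tokens with the ensemble."""
    # Each member runs once per generated token, so cost scales linearly with members.
    return num_members * tokens_generated

if __name__ == "__main__":
    # Hypothetical logits from three members over a 4-token vocabulary.
    members = [
        [2.0, 0.5, 0.1, -1.0],
        [1.5, 1.0, 0.0, -0.5],
        [0.5, 2.0, 0.2, -1.0],
    ]
    probs = ensemble_next_token_probs(members)
    print([round(p, 3) for p in probs])
    print(decode_forward_passes(num_members=3, tokens_generated=100))
```

Unless the ensemble is later distilled into a single model, the data savings during training come with a roughly proportional increase in serving compute.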
A popular r/LocalLLaMA post highlighted a community merge of uncensored and reasoning-distilled Qwen 3.5 9B checkpoints, underscoring the appetite for behavior-tuned small local models.
Anthropic says distillation attacks against Claude are increasing and calls for coordinated industry and policy action. In an accompanying post, the company reports campaign-level abuse patterns and outlines technical and operational countermeasures.
Anthropic has accused three Chinese AI companies — DeepSeek, Moonshot AI (Kimi), and MiniMax — of creating over 24,000 fraudulent Claude accounts to extract training data from 16 million conversations, marking a major escalation in AI intellectual property disputes.
Anthropic has accused Chinese AI firms of creating over 24,000 fraudulent accounts to extract 16 million training exchanges from Claude for model distillation.
A Reddit thread amplified an Ars Technica report that Google detected an extraction campaign of more than 100,000 prompts against Gemini, reopening questions about distillation, defense, and IP boundaries.