Washington is no longer treating model distillation as a lab-level abuse problem. The White House says foreign actors, chiefly China, are using tens of thousands of proxies and jailbreaking techniques to copy US frontier AI systems and ship cheaper models that can look comparable on select benchmarks.
Synthetic-data training has a subtler safety problem than obviously bad examples slipping through. A Nature paper co-authored by Anthropic researchers reports that traits such as a preference for owls, or misalignment, can transfer through semantically unrelated number sequences.
Lightning OPD attacks a practical bottleneck in on-policy distillation: keeping a live teacher model running throughout training. The paper reports 69.9% on AIME 2024 from Qwen3-8B-Base in 30 GPU hours, a 4.0x speedup over standard OPD.
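The paper's specific shortcut isn't reproduced here, but the bottleneck is visible in the generic on-policy distillation objective: the student samples a trajectory, and a live teacher must score every sampled token with its own forward pass. A minimal NumPy sketch of that per-token reverse-KL scoring step (all names, shapes, and values are illustrative, not from the paper):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the vocab axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reverse_kl_per_token(student_logits, teacher_logits):
    """Reverse KL D(student || teacher) per token position.

    In on-policy distillation the student generates the tokens and the
    teacher scores them -- producing teacher_logits requires a teacher
    forward pass on every training batch.
    """
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return (p_s * (np.log(p_s) - np.log(p_t))).sum(axis=-1)

# toy example: a 3-token student rollout over a 5-token vocab
rng = np.random.default_rng(0)
student = rng.normal(size=(3, 5))
teacher = rng.normal(size=(3, 5))
kl = reverse_kl_per_token(student, teacher)
assert kl.shape == (3,) and (kl >= 0).all()
```

Every training batch pays that teacher forward pass; that recurring cost is the bottleneck Lightning OPD targets.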
Anthropic said it detected industrial-scale campaigns by DeepSeek, Moonshot, and MiniMax to extract Claude outputs. The company said the activity involved more than 16 million exchanges through about 24,000 fraudulent accounts and that it is investing in detection and response tooling.
A March 19, 2026 Hacker News post about NanoGPT Slowrun reached 162 points and 43 comments at crawl time. Q Labs says an ensemble of 1.8B-parameter models trained on 100M tokens matched a baseline that would normally require 1B tokens.
In the discussion, commenters immediately focused on whether Q Labs' claimed data-efficiency gain survives serving and deployment.
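Q Labs has not published serving details, but the commenters' concern is concrete in any probability-averaging ensemble: every member runs its own forward pass at inference time, so training-compute savings can be repaid as deployment cost. A toy sketch (names and shapes are illustrative assumptions, not Q Labs' implementation):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the vocab axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_next_token_probs(member_logits):
    """Average the members' next-token distributions.

    Each entry in member_logits is one member's forward-pass output,
    so serving cost scales linearly with ensemble size.
    """
    probs = softmax(np.stack(member_logits))
    return probs.mean(axis=0)

# toy: 10 members over a 4-token vocab -- roughly 10x the inference
# FLOPs of a single member for one prediction
rng = np.random.default_rng(1)
members = [rng.normal(size=4) for _ in range(10)]
p = ensemble_next_token_probs(members)
assert np.isclose(p.sum(), 1.0)
```

The gap the thread circles is exactly this one: matching a baseline's quality at training time is not the same as matching its cost per token in production.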
A popular r/LocalLLaMA post highlighted a community merge of uncensored and reasoning-distilled Qwen 3.5 9B checkpoints, underscoring the appetite for behavior-tuned small local models.
Anthropic says distillation attacks against Claude are increasing and calls for coordinated industry and policy action. In an accompanying post, the company reports campaign-level abuse patterns and outlines technical and operational countermeasures.
On February 23, 2026, Anthropic said it detected large-scale distillation abuse tied to roughly 24,000 fraudulent accounts and more than 16 million Claude exchanges. The company framed the issue as both a model security and policy challenge.
Anthropic has accused three Chinese AI companies, DeepSeek, Moonshot AI (Kimi), and MiniMax, of creating over 24,000 fraudulent Claude accounts to extract roughly 16 million conversations for model distillation, a major escalation in AI intellectual property disputes.
A Reddit thread amplified an Ars Technica report that Google detected a 100,000+ prompt extraction campaign against Gemini, reopening questions about distillation, defense, and IP boundaries.