LLM X/Twitter 4h ago 2 min read
Why it matters: auditing fine-tuned models is still mostly guesswork once hidden behaviors have been implanted. Anthropic reports that a single shared LoRA adapter can get fine-tuned models to verbalize what they learned, and in Qwen3-family tests the verbalization rate rose from 37.7% at 0.6B parameters to 77.3% at 14B.
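The scaling trend above can be made concrete with a quick calculation. Only the two endpoint numbers (37.7% at 0.6B, 77.3% at 14B) come from the reported results; the dictionary labels are just illustrative.

```python
# Reported verbalization rates for two Qwen3 model sizes (from the summary above).
rates = {"0.6B": 37.7, "14B": 77.3}

# Absolute improvement in percentage points as model size grows.
gain = round(rates["14B"] - rates["0.6B"], 1)
print(f"Verbalization rose by {gain} percentage points from 0.6B to 14B")
# → Verbalization rose by 39.6 percentage points from 0.6B to 14B
```

In other words, the reported rate roughly doubles across this size range, which is the basis for the claim that larger models verbalize hidden behaviors more readily.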