#delegation - Insights

LLM Hacker News May 10, 2026 1 min read

Study: LLMs Silently Corrupt 25% of Documents in Delegated Workflows

A new study using the DELEGATE-52 benchmark finds that even frontier LLMs such as Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt an average of 25% of document content during long delegated workflows, and that the errors compound silently.

#llm #research #ai-safety