Study: LLMs Silently Corrupt 25% of Documents in Delegated Workflows

May 10, 2026 · By Insights AI · 1 min read

Overview

A paper from Microsoft Research, LLMs Corrupt Your Documents When You Delegate, reveals a fundamental flaw in delegating long-form work to AI assistants. When users hand off complex document editing tasks to LLMs, the models silently introduce errors that accumulate over time.

The DELEGATE-52 Benchmark

Researchers created DELEGATE-52 to simulate extended delegated workflows across 52 professional domains including coding, crystallography, and music notation. Testing 19 LLMs revealed sobering results:

  • Even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows.
  • Lower-tier models fail far more severely.
  • Agentic tool use does not improve performance on DELEGATE-52.
  • Degradation worsens with document size, interaction length, and the presence of distractor files.
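The paper's exact scoring metric is not described in this summary, but the headline "25% of document content" figure can be illustrated with a simple line-level proxy: compare the reference document against the model-edited copy and report the fraction of original lines that no longer survive intact. This is a hedged sketch, not DELEGATE-52's actual evaluation code, and `corruption_rate` is a hypothetical helper.

```python
import difflib

def corruption_rate(reference: str, edited: str) -> float:
    """Fraction of reference lines changed or lost in the edited copy.

    A line-level proxy metric; the benchmark's real scoring method is
    not specified here, so this is illustrative only.
    """
    ref_lines = reference.splitlines()
    if not ref_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=ref_lines, b=edited.splitlines())
    # Count reference lines that survive unchanged in the edited version.
    preserved = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - preserved / len(ref_lines)

# Example: one of four lines silently altered -> 25% corruption.
original = "alpha\nbeta\ngamma\ndelta"
edited = "alpha\nbeta\ngamna\ndelta"
print(corruption_rate(original, edited))  # 0.25
```

A metric like this makes the study's "silent" failure mode concrete: the edited document still looks plausible line by line, yet a quarter of its content no longer matches the source.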

Why This Matters

The danger is in the silence. LLMs confidently edit documents while introducing sparse but severe errors that compound over a long session. As the AI industry pushes toward agentic paradigms, this study suggests current LLMs are not ready to be trusted delegates. The authors propose DELEGATE-52 as a public benchmark to track AI readiness for delegated work.
