Microsoft Research Introduces CORPGEN for Multi-Task Enterprise Agents

Microsoft Research published "CORPGEN advances AI agents for real work" on February 26, 2026, arguing that common single-task benchmarks understate the real challenge of enterprise automation: handling many interdependent tasks at once over sessions that last for hours.

To model this, the team built Multi-Horizon Task Environments (MHTEs). In these scenarios, an agent must execute multiple overlapping assignments, with each task requiring roughly 10 to 30 dependent steps and continuous reprioritization. Microsoft reports testing loads up to 46 simultaneous tasks in sessions lasting about six hours.

The baseline result is sobering. Across three independent agent backends, completion rates dropped from 16.7% to 8.7% as concurrency increased. CORPGEN is presented as a system-level response to that failure mode. Its design combines hierarchical planning, isolated subagents to reduce cross-task contamination, tiered memory for selective recall, and adaptive summarization to control context growth.
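To make two of those design ideas concrete, here is a minimal Python sketch of isolated per-task contexts with a two-tier memory and budget-triggered summarization. It is an illustration under stated assumptions, not Microsoft's implementation: the names `SubagentContext`, `Planner`, and `summarize`, the character-based budget, and the truncation-based summarizer are all hypothetical stand-ins (a real system would presumably call a language model to summarize).

```python
from dataclasses import dataclass, field


def summarize(summary: str, step: str, keep: int = 40) -> str:
    """Hypothetical stand-in for an LLM summarizer: keep a clipped trace of the step."""
    clipped = step[:keep]
    return f"{summary}; {clipped}" if summary else clipped


@dataclass
class SubagentContext:
    """Private working memory for one task (assumed structure, for illustration)."""
    task_id: str
    budget: int = 200                                  # max characters kept verbatim (assumed unit)
    summary: str = ""                                  # compressed tier: older history
    recent: list[str] = field(default_factory=list)    # verbatim tier: latest steps

    def record(self, step: str) -> None:
        self.recent.append(step)
        # Adaptive summarization: once the verbatim tier exceeds the budget,
        # fold the oldest steps into the summary until it fits again.
        while sum(len(s) for s in self.recent) > self.budget and len(self.recent) > 1:
            oldest = self.recent.pop(0)
            self.summary = summarize(self.summary, oldest)


class Planner:
    """Top-level planner that hands each task its own isolated context."""

    def __init__(self) -> None:
        self.contexts: dict[str, SubagentContext] = {}

    def context_for(self, task_id: str) -> SubagentContext:
        if task_id not in self.contexts:
            self.contexts[task_id] = SubagentContext(task_id)
        return self.contexts[task_id]


planner = Planner()
report = planner.context_for("quarterly-report")
email = planner.context_for("vendor-email")

for i in range(10):
    report.record(f"step {i}: " + "x" * 50)
email.record("draft reply")
```

Because each task writes only to its own context, the ten long steps recorded for the report never appear in the email task's memory, and the report's verbatim tier stays within its budget while older steps survive only in compressed form.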

Microsoft frames CORPGEN agents as "digital employees" with persistent identities, role structure, and realistic schedules, operating productivity software through GUI automation. Collaboration is modeled through channels such as email and Microsoft Teams without shared internal state. The post emphasizes that this setup allows emergent coordination patterns while keeping each agent modular.

In Microsoft’s reported evaluation, CORPGEN reached 15.2% completion at 46 tasks versus 4.3% for baselines, roughly a 3.5x improvement. The company also highlights experiential learning as the largest single source of gain, citing an increase from 8.7% to 15.2% when agents reused successful prior trajectories. Another notable finding is methodological: judging output artifacts agreed with human assessments roughly 90% of the time, while evaluation based only on screenshots and logs agreed roughly 40% of the time, implying that many current agent benchmarks may miss practical task completion.
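The headline ratios can be checked with a few lines of arithmetic. The Python sketch below uses only the percentages quoted in this article; the variable and function names are illustrative, and nothing here is an independent measurement.

```python
# Completion rates (%) as quoted in the article.
corpgen_at_46 = 15.2       # CORPGEN at 46 tasks
baseline_at_46 = 4.3       # baselines at 46 tasks
before_learning = 8.7      # before reusing prior trajectories
after_learning = 15.2      # after reusing prior trajectories
baseline_low_load = 16.7   # baseline at lower concurrency
baseline_high_load = 8.7   # baseline at higher concurrency


def ratio(new: float, old: float) -> float:
    """Multiplicative improvement of new over old."""
    return new / old


def point_gain(new: float, old: float) -> float:
    """Absolute change in percentage points."""
    return new - old


def relative_drop(start: float, end: float) -> float:
    """Fraction of the starting rate that was lost."""
    return (start - end) / start


improvement = ratio(corpgen_at_46, baseline_at_46)
learning_gain = point_gain(after_learning, before_learning)
baseline_decline = relative_drop(baseline_low_load, baseline_high_load)

print(f"CORPGEN vs baselines: {improvement:.2f}x")
print(f"Experiential learning: +{learning_gain:.1f} points")
print(f"Baseline decline under load: {baseline_decline:.0%}")
```

On these figures the improvement works out to about 3.53x, consistent with the "roughly 3.5x" claim. Experiential learning accounts for a 6.5-point gain, and the baseline loses a little under half of its completion rate as concurrency rises.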

The broader takeaway is that enterprise agent progress may depend less on one stronger base model and more on orchestration, memory design, and evaluation realism. CORPGEN pushes that system-engineering perspective into a measurable benchmark format.
