
Microsoft Memora cuts agent memory context tokens by up to 98%


Jun 30, 2026 | By Insights AI (Twitter)

Long-running agents run into a memory problem before they run out of ambition. Reloading old conversations or stuffing retrieved context into every step becomes less efficient as tasks grow. In a June 29 X post, Microsoft Research introduced Memora as a scalable memory system built around that bottleneck.

"AI agents can't remember past conversations. They must constantly reload or retrieve context, which grows less efficient as tasks get longer and more complex. Memora solves this with a scalable memory system separating what’s stored from how it's retrieved."

Microsoft Research’s blog and publication summary describe Memora as a harmonic memory representation. The design separates rich memory content from lightweight abstractions and cue anchors used for retrieval. That split matters because abstraction helps memory scale, but too much abstraction hides the fine details agents need for reasoning. Full-context inference preserves detail but burns tokens. Memora tries to connect both layers so an agent can retrieve specific memories through compact cues.
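To make the stored-versus-retrieved split concrete, here is a minimal Python sketch. It is not Memora's implementation, and nothing in it comes from Microsoft's code or paper: `MemoryEntry`, `CueMemoryStore`, and the exact-match cue lookup are hypothetical names and simplifications chosen only to illustrate the idea of keeping rich content apart from the lightweight cues that point to it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """One stored memory (hypothetical structure, for illustration only)."""
    content: str      # rich detail, kept out of the retrieval index
    abstraction: str  # short summary of the memory


class CueMemoryStore:
    """Toy store: cues are indexed, full content is only fetched on a hit."""

    def __init__(self) -> None:
        self._entries: list[MemoryEntry] = []
        # Inverted index: cue text -> ids of entries anchored to that cue.
        self._cue_index: dict[str, set[int]] = defaultdict(set)

    def add(self, content: str, abstraction: str, cues: list[str]) -> int:
        entry_id = len(self._entries)
        self._entries.append(MemoryEntry(content, abstraction))
        for cue in cues:
            self._cue_index[cue.lower()].add(entry_id)
        return entry_id

    def retrieve(self, query_cues: list[str], limit: int = 3) -> list[MemoryEntry]:
        # Count how many query cues each entry matches.
        hits: dict[int, int] = defaultdict(int)
        for cue in query_cues:
            for entry_id in self._cue_index.get(cue.lower(), ()):
                hits[entry_id] += 1
        # Most matched cues first; ties go to the older entry.
        ranked = sorted(hits, key=lambda i: (-hits[i], i))
        return [self._entries[i] for i in ranked[:limit]]


store = CueMemoryStore()
store.add(
    content="User asked to move the weekly sync from Tuesday 10:00 to Thursday 14:00.",
    abstraction="Weekly sync rescheduled",
    cues=["weekly sync", "schedule"],
)
store.add(
    content="User prefers replies in plain text without tables.",
    abstraction="Formatting preference",
    cues=["formatting", "preferences"],
)

matches = store.retrieve(["schedule"])
print(matches[0].content)
```

Only the entries whose cues match end up in the agent's context, so the second memory never costs any tokens for a scheduling question. A real system would presumably replace the exact-match lookup with something far more capable; the sketch only shows why indexing compact cues while storing full detail separately can keep context small without discarding the detail.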

The concrete claim is notable: Microsoft says Memora sets new state-of-the-art results on LoCoMo and LongMemEval, outperforming Mem0, RAG, and full-context inference while using up to 98% fewer context tokens. For customer support, software work, operations, and personal assistants, memory infrastructure can become as important as the base model because the task history is part of the job.
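For a sense of scale, the arithmetic behind a 98% reduction is easy to check. The 100,000-token baseline below is a made-up figure for illustration, not a number from Microsoft's benchmarks, and `tokens_after_reduction` is a hypothetical helper.

```python
def tokens_after_reduction(baseline_tokens: int, reduction_percent: int) -> int:
    """Tokens left after cutting `reduction_percent` percent of the baseline."""
    return baseline_tokens * (100 - reduction_percent) // 100


baseline = 100_000  # hypothetical full-context token count
remaining = tokens_after_reduction(baseline, 98)
print(remaining)  # 2000
```

Because the claim is "up to" 98%, this is the best case: a step that would otherwise carry 100,000 tokens of history would carry about 2,000.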

The next thing to watch is reproducibility and implementation. Developers will need to see the benchmark settings behind the 98% token reduction, how Memora plugs into existing vector search or RAG stacks, and how it handles memory correction, expiration, and deletion in real systems.

