Microsoft Memora cuts agent memory context tokens by up to 98%
Original: Microsoft Research Memora cuts agent memory context by up to 98%
Long-running agents run into a memory problem before they run out of ambition. Reloading old conversations or stuffing retrieved context into every step becomes less efficient as tasks grow. In a June 29 X post, Microsoft Research introduced Memora, a scalable memory system designed to address that bottleneck.
"AI agents can't remember past conversations. They must constantly reload or retrieve context, which grows less efficient as tasks get longer and more complex. Memora solves this with a scalable memory system separating what’s stored from how it's retrieved."
Microsoft Research’s blog and publication summary describe Memora as built on a harmonic memory representation. The design separates rich memory content from the lightweight abstractions and cue anchors used for retrieval. The split targets a trade-off: abstraction helps memory scale, but too much of it hides the fine details agents need for reasoning, while full-context inference preserves detail at a steep token cost. Memora connects the two layers so an agent can retrieve specific memories through compact cues.
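To make the content/cue split concrete, here is a toy two-layer store in Python. It is a minimal sketch under stated assumptions, not Memora's design: `TwoLayerMemory`, `MemoryEntry`, the hand-written cue sets, and the keyword-overlap scoring are all hypothetical. Microsoft's summary does not say how Memora builds its abstractions or cue anchors, and a practical system would likely need fuzzier matching (for example, embeddings) than exact keywords.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryEntry:
    content: str        # rich layer: the full, detailed memory kept verbatim
    abstraction: str    # lightweight layer: a short summary of the entry
    cues: set[str]      # cue anchors (hypothetical): keywords pointing back here


class TwoLayerMemory:
    """Toy store that keeps full content separate from its retrieval cues."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []
        self.cue_index: dict[str, set[int]] = {}  # cue -> ids of matching entries

    def add(self, content: str, abstraction: str, cues: set[str]) -> int:
        entry_id = len(self.entries)
        normalized = {c.lower() for c in cues}
        self.entries.append(MemoryEntry(content, abstraction, normalized))
        for cue in normalized:
            self.cue_index.setdefault(cue, set()).add(entry_id)
        return entry_id

    def overview(self) -> list[str]:
        # Compact view an agent could keep in context instead of the full history.
        return [e.abstraction for e in self.entries]

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Match query words against cue anchors only; the rich content is
        # never scanned, just returned for the best-scoring entries.
        words = {w.strip(".,?!").lower() for w in query.split()}
        scores: dict[int, int] = {}
        for word in words:
            for entry_id in self.cue_index.get(word, ()):
                scores[entry_id] = scores.get(entry_id, 0) + 1
        ranked = sorted(scores, key=lambda i: (-scores[i], i))
        return [self.entries[i].content for i in ranked[:k]]


# Hypothetical example memories.
mem = TwoLayerMemory()
mem.add(
    "Staging DB moved to host db-staging-2, port 5433, after the March outage.",
    "staging database location",
    {"staging", "database", "port"},
)
mem.add(
    "User prefers short answers with Python examples.",
    "answer style preference",
    {"style", "python", "short"},
)
print(mem.overview())
print(mem.retrieve("Which port does the staging database use?"))
```

In this sketch the agent would keep only the short `overview()` list in its prompt and call `retrieve()` when it needs specifics, so the detailed staging-database note enters the context only when a query hits its cues.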
The concrete claim is notable: Microsoft says Memora sets new state-of-the-art results on LoCoMo and LongMemEval, outperforming Mem0, RAG, and full-context inference while using up to 98% fewer context tokens. For customer support, software work, operations, and personal assistants, memory infrastructure can become as important as the base model because the task history is part of the job.
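The headline percentage is easier to read as a ratio: a 98% reduction leaves 2% of the original context, so the agent sees about one-fiftieth as many tokens. The helper `token_reduction` and its inputs below are hypothetical and purely illustrative, not Microsoft's benchmark settings; "up to" also marks the best reported case rather than an average.

```python
def token_reduction(full_context_tokens: int, memory_tokens: int) -> float:
    """Fraction of context tokens saved versus full-context inference."""
    if full_context_tokens <= 0:
        raise ValueError("full_context_tokens must be positive")
    return 1 - memory_tokens / full_context_tokens


# Illustrative inputs only: a 100,000-token history vs. 2,000 retrieved tokens.
saved = token_reduction(100_000, 2_000)
print(f"{saved:.0%} fewer tokens, {100_000 // 2_000}x smaller context")
# prints "98% fewer tokens, 50x smaller context"
```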
The next thing to watch is reproducibility and implementation. Developers will need to see the benchmark settings behind the 98% token reduction, how Memora plugs into existing vector search or RAG stacks, and how it handles memory correction, expiration, and deletion in real systems.