Microsoft Memora cuts agent-memory context tokens by up to 98%

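For agents that handle long-running work, memory tends to become the bottleneck before model capability does. If an agent rereads past conversations on every turn, or pours large volumes of search results into its context, it becomes less efficient the longer the task runs. In an X post on June 29, 2026, Microsoft Research introduced Memora, a scalable memory system aimed at this problem.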

"AI agents can't remember past conversations. They must constantly reload or retrieve context, which grows less efficient as tasks get longer and more complex. Memora solves this with a scalable memory system separating what’s stored from how it's retrieved."

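According to Microsoft Research's blog and publication summary, Memora uses a harmonic memory representation. It separates the rich memory content that is stored from the lightweight abstractions and cue anchors used for retrieval. Abstractions make memory easier to scale but hide details; full-context inference, by contrast, preserves details at a high token cost. Memora bridges the two layers, using short cues to recover the specific memories that are needed.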
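The storage/retrieval split described above can be illustrated with a minimal sketch. This is not Memora's implementation or API, which the source does not describe: the class names `TwoLayerMemory` and `MemoryRecord` are hypothetical, and plain keyword cues stand in for Memora's abstractions and cue anchors. Only the small cue layer is searched; only the matching rich records would then be placed in the model's context.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """Rich memory content: stored in full but never searched directly."""
    record_id: int
    content: str


@dataclass
class TwoLayerMemory:
    """Hypothetical two-layer store: lightweight cues point at rich records."""
    records: dict[int, MemoryRecord] = field(default_factory=dict)
    cue_index: dict[str, set[int]] = field(default_factory=dict)

    def add(self, content: str, cues: list[str]) -> int:
        """Store the full content once and register short cue anchors for it."""
        record_id = len(self.records)
        self.records[record_id] = MemoryRecord(record_id, content)
        for cue in cues:
            self.cue_index.setdefault(cue.lower(), set()).add(record_id)
        return record_id

    def recall(self, query: str) -> list[str]:
        """Search only the cue layer, then return the matching rich records."""
        hits: dict[int, int] = {}
        for word in query.lower().split():
            for record_id in self.cue_index.get(word, set()):
                hits[record_id] = hits.get(record_id, 0) + 1
        # Rank by number of matching cues, breaking ties by insertion order.
        ranked = sorted(hits, key=lambda rid: (-hits[rid], rid))
        return [self.records[rid].content for rid in ranked]


memory = TwoLayerMemory()
memory.add("Customer A was double-charged; refund approved.", ["invoice", "refund"])
memory.add("deploy.sh needs the STAGE variable set.", ["deploy", "env"])
print(memory.recall("refund status"))  # → ['Customer A was double-charged; refund approved.']
```

Keyword matching is only a stand-in; a real system would likely derive cues with a model and match them semantically, but the shape of the idea (search something small, return something rich) stays the same.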

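The reported numbers are striking. Microsoft Research says Memora outperformed Mem0, RAG, and full-context inference on LoCoMo and LongMemEval while using up to 98% fewer context tokens. In long-running tasks such as customer support, software work, and operations automation, the memory layer can matter as much as the base model.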
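To put "up to 98% fewer" in perspective: a 98% reduction means the memory path uses about one-fiftieth of the baseline's context tokens. The small helpers below work through that arithmetic. The 120,000-token baseline is a made-up figure for illustration, not a number from the LoCoMo or LongMemEval results.

```python
def reduction_pct(baseline_tokens: int, method_tokens: int) -> float:
    """Percentage of context tokens saved relative to a baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - method_tokens) / baseline_tokens


def compression_factor(baseline_tokens: int, method_tokens: int) -> float:
    """How many times fewer tokens the method uses than the baseline."""
    return baseline_tokens / method_tokens


# Hypothetical figures: a 120,000-token full-context baseline vs. 2,400 tokens.
baseline, with_memory = 120_000, 2_400
print(reduction_pct(baseline, with_memory))       # → 98.0
print(compression_factor(baseline, with_memory))  # → 50.0
```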

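The next things to watch are reproducibility and implementation. For production use, the focal points will be the benchmark conditions behind the 98% reduction, how Memora connects to existing vector-search and RAG stacks, and how it handles correcting, expiring, and deleting stale memories.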
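How Memora handles correction, expiry, and deletion is not described in the source. As a hypothetical sketch of what such operations could look like in a generic memory store (not Memora's design), the class below marks corrected entries as superseded rather than editing them in place, hides entries past an optional time-to-live, and supports hard deletion. `MutableMemory`, `Entry`, and their fields are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Entry:
    content: str
    created_at: datetime
    ttl: Optional[timedelta] = None      # None means the entry never expires
    superseded_by: Optional[int] = None  # id of the correcting entry, if any


class MutableMemory:
    """Hypothetical memory store supporting correction, expiry, and deletion."""

    def __init__(self) -> None:
        self._entries: dict[int, Entry] = {}
        self._next_id = 0

    def add(self, content: str, now: datetime,
            ttl: Optional[timedelta] = None) -> int:
        entry_id = self._next_id
        self._entries[entry_id] = Entry(content, now, ttl)
        self._next_id += 1
        return entry_id

    def correct(self, entry_id: int, new_content: str, now: datetime) -> int:
        """Write a corrected entry and hide the old one instead of editing it."""
        new_id = self.add(new_content, now, self._entries[entry_id].ttl)
        self._entries[entry_id].superseded_by = new_id
        return new_id

    def delete(self, entry_id: int) -> None:
        """Hard-delete an entry (e.g. on a user request to forget it)."""
        self._entries.pop(entry_id, None)

    def live(self, now: datetime) -> list[str]:
        """Entries that are neither superseded nor past their TTL."""
        return [
            e.content
            for e in self._entries.values()
            if e.superseded_by is None
            and (e.ttl is None or now < e.created_at + e.ttl)
        ]


t0 = datetime(2026, 7, 1, tzinfo=timezone.utc)
mem = MutableMemory()
old = mem.add("Staging DB host is db-old", t0)
mem.correct(old, "Staging DB host is db-new", t0 + timedelta(days=1))
mem.add("Temporary incident note", t0, ttl=timedelta(hours=2))
print(mem.live(t0 + timedelta(days=2)))  # → ['Staging DB host is db-new']
```

Keeping superseded entries instead of overwriting them preserves an audit trail. Hard deletion is kept separate because "forget this" requests usually need the data actually removed.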
