#memory

RSS Feed
LLM Reddit Mar 28, 2026 2 min read

A post on r/MachineLearning argues that LoCoMo’s leaderboard is being treated with more confidence than its evaluation setup deserves. The audit claims the benchmark has a 6.4% ground-truth error rate and that its judge accepts intentionally wrong but topically adjacent answers far too often, turning attention from raw scores to benchmark reliability.

© 2026 Insights. All rights reserved.