<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/f67000afe9184892aa08a3bbcb194892&quot; frameborder=&quot;0&quot; width=&quot;1920&quot; height=&quot;1440&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>1440</height><width>1920</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>1440</thumbnail_height><thumbnail_width>1920</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/f67000afe9184892aa08a3bbcb194892-53cae458864e1af6.gif</thumbnail_url><duration>221.555</duration><title>Benchmarking Long-Term Memory Systems for Agents</title><description>Hey everyone, I’m excited to share my work on Memory Bench, a universal benchmarking tool for long-term memory systems in agents. The current landscape is fragmented across numerous benchmarks and providers, making it difficult to answer fundamental questions about memory backends. Our approach focuses on three universal operations, adding, retrieving, and deleting memory, while acknowledging that their semantics differ across providers. I encourage you to take a look at our research snapshot, which highlights the semantic gaps that undermine fair comparisons and explains why we report more than just accuracy. Cheers!</description></oembed>