<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/f67000afe9184892aa08a3bbcb194892&quot; frameborder=&quot;0&quot; width=&quot;1920&quot; height=&quot;1440&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>1440</height><width>1920</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>1440</thumbnail_height><thumbnail_width>1920</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/f67000afe9184892aa08a3bbcb194892-53cae458864e1af6.gif</thumbnail_url><duration>221.555</duration><title>Benchmarking Long-Term Memory Systems for Agents</title><description>Hey everyone, I’m excited to share my work on Memory Bench, a universal benchmarking tool for long-term memory systems in agents. The current landscape is fragmented across numerous benchmarks and providers, making it difficult to answer fundamental questions about memory backends. Our approach focuses on three universal operations, adding, retrieving, and deleting memory, while acknowledging that their semantics differ across providers. I encourage you to take a look at our research snapshot, which highlights the semantic gaps that undermine fair comparisons and explains why we report more than just accuracy. Cheers!</description></oembed>