Video: Backtesting Prompts for Improved Outputs 🔍 (Loom, ~5 min)
https://www.loom.com/embed/6f604ce9c4604c35822debdf8b326ec3

In this video, I walk you through a more complex evaluation process using a new dataset populated with production data from our prompt. We created a backtest evaluation called "backtest chef" to analyze how changes in our prompt affect outputs, specifically focusing on the inclusion of ingredients in a bulleted list. I demonstrate how to compare the old and new responses using a diff column to highlight the differences. I encourage you to explore these changes and consider how this method can help us bootstrap datasets effectively. Please take a moment to review the outputs and think about how we can apply this in our future work.
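
The backtest workflow the video describes (re-running logged production inputs through a revised prompt, then diffing each old response against its new one) can be sketched in a few lines. Below is a minimal sketch, not the tooling shown in the video: `records`, `NEW_PROMPT`, and `generate_response` are hypothetical stand-ins for your logged production data, the revised prompt, and the model call, and only Python's standard `difflib` is used to build the diff column.

```python
import difflib

# Hypothetical production records: each pairs a logged user input with the
# response the old prompt produced in production.
records = [
    {
        "input": "How do I make a simple tomato pasta?",
        "old_response": "Boil the pasta, stir in tomato sauce, and serve.",
    },
]

# Hypothetical revised prompt under test: it asks for a bulleted ingredient
# list, which is the change the backtest is meant to surface.
NEW_PROMPT = (
    "You are a helpful chef. Start every answer with a bulleted list of "
    "ingredients, then give the steps.\n\nUser request: {input}"
)


def generate_response(prompt_template: str, user_input: str) -> str:
    """Hypothetical stand-in for a real model call with the new prompt."""
    # A real implementation would send prompt_template.format(input=user_input)
    # to your model; this canned reply just illustrates the expected change.
    return (
        "Ingredients:\n"
        "- pasta\n"
        "- tomato sauce\n\n"
        "Boil the pasta, stir in tomato sauce, and serve."
    )


for record in records:
    new_response = generate_response(NEW_PROMPT, record["input"])
    # The "diff column": a line-level diff of old vs. new output, so a
    # reviewer can see exactly what the prompt change added or removed.
    diff = "\n".join(
        difflib.unified_diff(
            record["old_response"].splitlines(),
            new_response.splitlines(),
            fromfile="old_response",
            tofile="new_response",
            lineterm="",
        )
    )
    print(diff)
```

In a real backtest, each diff would populate the diff column of the dataset, so reviewers can scan only the rows where the prompt change altered the output (here, the newly added ingredient list).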