Video: Backtesting Prompts for Improved Outputs 🔍 (Loom, ~5 min)
https://www.loom.com/embed/6f604ce9c4604c35822debdf8b326ec3

In this video, I walk you through a more complex evaluation process using a new dataset populated with production data from our prompt. We created a backtest evaluation called "backtest chef" to analyze how changes in our prompt affect outputs, specifically focusing on the inclusion of ingredients in a bulleted list. I demonstrate how to compare the old and new responses using a diff column to highlight the differences. I encourage you to explore these changes and consider how this method can help us bootstrap datasets effectively. Please take a moment to review the outputs and think about how we can apply this in our future work.
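
The backtest workflow the video describes (re-running logged production inputs through a revised prompt, then diffing each old response against its new one) can be sketched in a few lines. Below is a minimal sketch, not the tooling shown in the video: `records`, `NEW_PROMPT`, and `generate_response` are hypothetical stand-ins for your logged production data, the revised prompt, and the model call, and only Python's standard `difflib` is used to build the diff column.

```python
import difflib

# Hypothetical production records: each pairs a logged user input with the
# response the old prompt produced in production.
records = [
    {
        "input": "How do I make a simple tomato pasta?",
        "old_response": "Boil the pasta, stir in tomato sauce, and serve.",
    },
]

# Hypothetical revised prompt under test: it asks for a bulleted ingredient
# list, which is the change the backtest is meant to surface.
NEW_PROMPT = (
    "You are a helpful chef. Start every answer with a bulleted list of "
    "ingredients, then give the steps.\n\nUser request: {input}"
)


def generate_response(prompt_template: str, user_input: str) -> str:
    """Hypothetical stand-in for a real model call with the new prompt."""
    # A real implementation would send prompt_template.format(input=user_input)
    # to your model; this canned reply just illustrates the expected change.
    return (
        "Ingredients:\n"
        "- pasta\n"
        "- tomato sauce\n\n"
        "Boil the pasta, stir in tomato sauce, and serve."
    )


for record in records:
    new_response = generate_response(NEW_PROMPT, record["input"])
    # The "diff column": a line-level diff of old vs. new output, so a
    # reviewer can see exactly what the prompt change added or removed.
    diff = "\n".join(
        difflib.unified_diff(
            record["old_response"].splitlines(),
            new_response.splitlines(),
            fromfile="old_response",
            tofile="new_response",
            lineterm="",
        )
    )
    print(diff)
```

In a real backtest, each diff would populate the diff column of the dataset, so reviewers can scan only the rows where the prompt change altered the output (here, the newly added ingredient list).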