<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/c548c766916f4ecebdfb997d55ab5fa0&quot; frameborder=&quot;0&quot; width=&quot;1108&quot; height=&quot;831&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>831</height><width>1108</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>831</thumbnail_height><thumbnail_width>1108</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/c548c766916f4ecebdfb997d55ab5fa0-a6a9b2300d59a6c0.gif</thumbnail_url><duration>637.378</duration><title>Agent Evals in Crafting</title><description>We demonstrate how to use Crafting to evaluate and iterate on a customer-facing agent. Since agent runs are non-deterministic, we set up multiple parallel agent runs in sandboxes to measure performance changes from prompt improvements. After establishing a baseline, we open a PR with improvements to automatically trigger another round of evals inside Crafting.

https://github.com/crafting-demo/agent-validation</description></oembed>