{"type":"video","version":"1.0","html":"<iframe src=\"https://www.loom.com/embed/c548c766916f4ecebdfb997d55ab5fa0\" frameborder=\"0\" width=\"1108\" height=\"831\" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>","height":831,"width":1108,"provider_name":"Loom","provider_url":"https://www.loom.com","thumbnail_height":831,"thumbnail_width":1108,"thumbnail_url":"https://cdn.loom.com/sessions/thumbnails/c548c766916f4ecebdfb997d55ab5fa0-a6a9b2300d59a6c0.gif","duration":637.378,"title":"Agent Evals in Crafting","description":"We demonstrate how to use Crafting for evaluating and iterating on a customer-facing agent. Since agent runs are non-deterministic, we set up multiple parallel agent runs in sandboxes to measure performance changes based on prompt improvements. After establishing a baseline, we open a PR with improvements to automatically trigger another round of evals inside Crafting.\n\nhttps://github.com/crafting-demo/agent-validation"}