<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/50452061e9394ef28d5001b6c9227631&quot; frameborder=&quot;0&quot; width=&quot;1662&quot; height=&quot;1246&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>1246</height><width>1662</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>1246</thumbnail_height><thumbnail_width>1662</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/50452061e9394ef28d5001b6c9227631-94b36a8604a0c89c.gif</thumbnail_url><duration>230.891</duration><title>AI Engineer Assignment Evaluation Results</title><description>I start by showing that the agent itself was not touched and that all Sourcetree tests and hard metrics passed. I then open the evaluation reports, which show an 82 percent pass rate with three failing cases, all failing for the same Marksteps-related reason. Next, I run the evaluation with the efficiency filter and no exteriors, using API calls with concurrency 2 and a maximum usage of $5; it finishes in about 8 seconds, and everything passes. Finally, I change the judge validation prompts for Judge V1 and V2 and rerun the evaluation, which fails because the answer-rejects pattern does not match. No action was requested from viewers.</description></oembed>