Video: Cloud Build, Mantine, Go, sqlite Benchmarking and Model Evaluation (Loom, 2:27)
https://www.loom.com/embed/0f77fc9251da4c528fa19cd9e81f5d74

Summary: I deployed our OpenAI benchmarking app on GCP using Cloud Run and sorted out the auth issues by refreshing the OpenAI key. I selected a model (GPT 5.4), ran a standard test, and reviewed metrics such as prompt processing rate, time to first token, decode rate, and total time. I also added more benchmarks, made model selection more robust, and switched to Cloud Code for security checks. In full evaluation mode, I ran MMLU Pro with 20 questions and scored 80% accuracy.
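For readers who want a feel for how metrics like time to first token and decode rate can be measured, here is a minimal Go sketch (Go matching the stack named in the title). It is an illustration under stated assumptions, not the app's actual implementation: it streams a chat completion from OpenAI's public SSE endpoint, uses the count of streamed delta chunks as a rough proxy for token count, and the model name and prompt are placeholders rather than the ones used in the video. Prompt processing rate is not measured here, since that requires a known prompt token count.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// Minimal request shape for a streaming chat completion.
type chatRequest struct {
	Model    string    `json:"model"`
	Stream   bool      `json:"stream"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

func main() {
	body, err := json.Marshal(chatRequest{
		Model:  "gpt-4o-mini", // placeholder model name
		Stream: true,
		Messages: []message{
			{Role: "user", Content: "Explain time to first token in one sentence."},
		},
	})
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var firstToken time.Time
	chunks := 0
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		// SSE frames arrive as "data: {json}" lines; skip blanks and keep-alives.
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break
		}
		if firstToken.IsZero() {
			firstToken = time.Now() // first streamed delta marks time to first token
		}
		chunks++ // each chunk carries roughly one token's worth of delta
	}
	total := time.Since(start)

	if firstToken.IsZero() {
		fmt.Println("no tokens received")
		return
	}
	ttft := firstToken.Sub(start)
	decode := total - ttft
	fmt.Printf("time to first token: %v\n", ttft)
	fmt.Printf("decode: %d chunks in %v (~%.1f chunks/s)\n", chunks, decode, float64(chunks)/decode.Seconds())
	fmt.Printf("total time: %v\n", total)
}
```

A real benchmarking harness would parse each delta's JSON and read the usage block for exact token counts; chunk counting is just a cheap approximation that avoids pulling in a tokenizer.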