Video: Cloud Build, Mantine, Go, sqlite Benchmarking and Model Evaluation (Loom, 2:27)
https://www.loom.com/embed/0f77fc9251da4c528fa19cd9e81f5d74

Summary: I deployed our OpenAI benchmarking app on GCP using Cloud Run and sorted out the auth issues by refreshing the OpenAI key. I selected a model (GPT 5.4), ran a standard test, and reviewed metrics such as prompt processing rate, time to first token, decode rate, and total time. I also added more benchmarks, made model selection more robust, and switched to Cloud Code for security checks. In full evaluation mode, I ran MMLU Pro with 20 questions and scored 80% accuracy.
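For readers who want a feel for how metrics like time to first token and decode rate can be measured, here is a minimal Go sketch (Go matching the stack named in the title). It is an illustration under stated assumptions, not the app's actual implementation: it streams a chat completion from OpenAI's public SSE endpoint, uses the count of streamed delta chunks as a rough proxy for token count, and the model name and prompt are placeholders rather than the ones used in the video. Prompt processing rate is not measured here, since that requires a known prompt token count.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// Minimal request shape for a streaming chat completion.
type chatRequest struct {
	Model    string    `json:"model"`
	Stream   bool      `json:"stream"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

func main() {
	body, err := json.Marshal(chatRequest{
		Model:  "gpt-4o-mini", // placeholder model name
		Stream: true,
		Messages: []message{
			{Role: "user", Content: "Explain time to first token in one sentence."},
		},
	})
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var firstToken time.Time
	chunks := 0
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		// SSE frames arrive as "data: {json}" lines; skip blanks and keep-alives.
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break
		}
		if firstToken.IsZero() {
			firstToken = time.Now() // first streamed delta marks time to first token
		}
		chunks++ // each chunk carries roughly one token's worth of delta
	}
	total := time.Since(start)

	if firstToken.IsZero() {
		fmt.Println("no tokens received")
		return
	}
	ttft := firstToken.Sub(start)
	decode := total - ttft
	fmt.Printf("time to first token: %v\n", ttft)
	fmt.Printf("decode: %d chunks in %v (~%.1f chunks/s)\n", chunks, decode, float64(chunks)/decode.Seconds())
	fmt.Printf("total time: %v\n", total)
}
```

A real benchmarking harness would parse each delta's JSON and read the usage block for exact token counts; chunk counting is just a cheap approximation that avoids pulling in a tokenizer.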