{"type":"video","version":"1.0","html":"<iframe src=\"https://www.loom.com/embed/c807fb18d2b046268294acb9fa79f2cb\" frameborder=\"0\" width=\"1920\" height=\"1440\" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>","height":1440,"width":1920,"provider_name":"Loom","provider_url":"https://www.loom.com","thumbnail_height":1440,"thumbnail_width":1920,"thumbnail_url":"https://cdn.loom.com/sessions/thumbnails/c807fb18d2b046268294acb9fa79f2cb-506a22ce6bc2318c.gif","duration":156.396,"title":"Local Voice Agent with Whisper and Gemini","description":"Hi, I am Rohan, and this is my submission for the Mim and Mim Zero ASM. I built a local voice and tool agent using Rock Whisper for speech-to-text, then Gemini Flash for intent classification at over 96 percent confidence. I demoed two intents by recording audio such as \"write the Python code that summarizes if the number is prime or not\". The transcription was on the mark, with intent classification around 18 percent; tool arguments and loop info were then generated and executed in the sandbox. I confirm the interaction to trigger Gemini and then review the output."}