{"type":"video","version":"1.0","html":"<iframe src=\"https://www.loom.com/embed/aa14ca5466e84cf7a381dd44fb9e5695\" frameborder=\"0\" width=\"1280\" height=\"960\" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>","height":960,"width":1280,"provider_name":"Loom","provider_url":"https://www.loom.com","thumbnail_height":960,"thumbnail_width":1280,"thumbnail_url":"https://cdn.loom.com/sessions/thumbnails/aa14ca5466e84cf7a381dd44fb9e5695-814ddfd19ef74ed9.gif","duration":334.208,"title":"Voice-First Conversational AI with FastAPI","description":"This Loom presents a voice-first conversational AI system built with FastAPI, Grok LLM, and Savarm SpeechAPI, focused on low-friction real-time interaction and orchestration. It shows a modular backend architecture with separate route and service layers for speech-to-text, LLM interaction, and text-to-speech, plus a lightweight browser-based frontend. The demo covers interruption handling to make responses feel conversational, and it logs per-stage latencies and total pipeline response time, including an example around 21 seconds end-to-end. Current average latency is about 2 to 3 seconds for HTTP, under 1 second for LLM inference, and about 2 seconds for TTS generation. It concludes with planned improvements such as real-time streaming, WebSockets, persistent memory, and deployment infrastructure, and directs viewers to the README on GitHub."}