<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/aa14ca5466e84cf7a381dd44fb9e5695&quot; frameborder=&quot;0&quot; width=&quot;1280&quot; height=&quot;960&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>960</height><width>1280</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>960</thumbnail_height><thumbnail_width>1280</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/aa14ca5466e84cf7a381dd44fb9e5695-814ddfd19ef74ed9.gif</thumbnail_url><duration>334.208</duration><title>Voice-First Conversational AI with FastAPI</title><description>This Loom presents a voice-first conversational AI system built with FastAPI, Grok LLM, and Sarvam SpeechAPI, focused on low-friction, real-time interaction and orchestration. It shows a modular backend architecture with separate route and service layers for speech-to-text, LLM interaction, and text-to-speech, plus a lightweight browser-based frontend. The demo covers interruption handling, which makes responses feel conversational, and logs per-stage latencies and total pipeline response time, including one example of around 21 seconds end-to-end. Current average latency is about 2 to 3 seconds for the HTTP request, under 1 second for LLM inference, and about 2 seconds for TTS generation. It concludes with planned improvements such as real-time streaming, WebSockets, persistent memory, and deployment infrastructure, and directs viewers to the README on GitHub.</description></oembed>