[Video: "Local AI Query Pipeline with Tools" (Loom, 8:21) — https://www.loom.com/embed/f474a9870635423d9f4b888b11353a7c]

This Loom explains the architecture of a local LLM pipeline for research-style question answering with controlled ambiguity handling. Before retrieval or generation, a query writer normalizes the user query without resolving references or ambiguity; a query analyzer then selects one of five actions: clarify, retrieve, tool, direct answer, or refuse. Retrieval fetches the top 20 candidates, a re-ranker narrows them to the top six, and an evaluator checks whether the extracted information is sufficient to answer; if not, the system refuses rather than produce an unreliable answer. Everything runs locally via Ollama, so no data leaves the system. The speaker notes using different models per stage: a 3–4B model for quality rating, Gemma 3 12B for routing, Llama 3 8B for evaluation, and a 3.5–9B model for response generation, with evaluation effectiveness around 70%.
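
The normalize-then-route step lends itself to a short sketch. The following is a minimal illustration, not the speaker's code: it assumes the `ollama` Python client, and the model tag `gemma3:12b` is an assumption based on the video's mention of Gemma 3 12B for routing; prompts are likewise illustrative.

```python
# Minimal sketch of the query writer + analyzer stages, assuming the
# `ollama` Python client and locally pulled models. Model tags and
# prompts are illustrative assumptions, not the speaker's configuration.
import ollama

ACTIONS = {"clarify", "retrieve", "tool", "direct", "refuse"}

ROUTER_PROMPT = """You are a query analyzer. Given the normalized user query,
reply with exactly one word: clarify, retrieve, tool, direct, or refuse.

Query: {query}"""


def normalize_query(raw_query: str) -> str:
    """Query writer: clean up wording only; references and ambiguity
    are deliberately left unresolved (the analyzer handles those)."""
    resp = ollama.chat(
        model="gemma3:12b",  # assumed tag; video cites Gemma 3 12B for routing
        messages=[{
            "role": "user",
            "content": "Rewrite this query with clean grammar and spelling. "
                       f"Do NOT resolve pronouns or ambiguity:\n{raw_query}",
        }],
    )
    return resp["message"]["content"].strip()


def route(query: str) -> str:
    """Query analyzer: select one of the five pipeline actions."""
    resp = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(query=query)}],
    )
    action = resp["message"]["content"].strip().lower()
    # Fall back to a clarifying question if the model answers off-menu.
    return action if action in ACTIONS else "clarify"
```

Constraining the router to a single word from a closed set, with a safe fallback, is one simple way to keep a small local model's routing decisions machine-parseable.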
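The retrieve path can be sketched the same way. This is a hedged outline of the top-20 retrieve, top-6 re-rank, and sufficiency-check-or-refuse flow the video describes; `vector_search` and `rerank` are hypothetical stand-ins for whatever retriever and re-ranker the pipeline actually uses, and the `llama3:8b` tag is an assumption based on the video's mention of Llama 3 8B for evaluation.

```python
# Sketch of the retrieve -> re-rank -> evaluate path: top 20 candidates,
# re-ranked to the top 6, then a sufficiency check that refuses rather
# than answer from weak evidence. `vector_search` and `rerank` are
# hypothetical stand-ins, not names from the video.
import ollama

TOP_K_RETRIEVE = 20  # candidates from the first-stage retriever
TOP_K_RERANK = 6     # passages kept after re-ranking


def answer(query: str, vector_search, rerank) -> str:
    candidates = vector_search(query, k=TOP_K_RETRIEVE)
    passages = rerank(query, candidates, k=TOP_K_RERANK)
    context = "\n\n".join(passages)

    # Evaluator: is the extracted information sufficient? The video cites
    # Llama 3 8B here, with ~70% evaluation effectiveness; tag assumed.
    verdict = ollama.chat(
        model="llama3:8b",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}\n\n"
                       "Answer 'yes' or 'no': is the context sufficient "
                       "to answer the question?",
        }],
    )
    if "yes" not in verdict["message"]["content"].strip().lower():
        return "I can't answer that reliably from the available sources."

    # Response generation; the video mentions a 3.5-9B model (tag assumed).
    resp = ollama.chat(
        model="llama3:8b",
        messages=[{
            "role": "user",
            "content": f"Using only this context:\n{context}\n\n"
                       f"Answer the question: {query}",
        }],
    )
    return resp["message"]["content"]
```

The key design point the video stresses is the explicit refusal branch: when the evaluator judges the evidence insufficient, the pipeline stops instead of generating from thin context.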