[Video: "Local AI Query Pipeline with Tools" (Loom, 8:21) — https://www.loom.com/embed/f474a9870635423d9f4b888b11353a7c]

This Loom explains the architecture of a local LLM pipeline for research-style question answering with controlled ambiguity handling. Before retrieval or generation, a query writer normalizes the user query without resolving references or ambiguity; a query analyzer then selects one of five actions: clarify, retrieve, tool, direct answer, or refuse. Retrieval fetches the top 20 candidates, a re-ranker narrows them to the top six, and an evaluator checks whether the extracted information is sufficient to answer; if not, the system refuses rather than produce an unreliable answer. Everything runs locally via Ollama, so no data leaves the system. The speaker notes using different models per stage: a 3–4B model for quality rating, Gemma 3 12B for routing, Llama 3 8B for evaluation, and a 3.5–9B model for response generation, with evaluation effectiveness around 70%.
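
The normalize-then-route step lends itself to a short sketch. The following is a minimal illustration, not the speaker's code: it assumes the `ollama` Python client, and the model tag `gemma3:12b` is an assumption based on the video's mention of Gemma 3 12B for routing; prompts are likewise illustrative.

```python
# Minimal sketch of the query writer + analyzer stages, assuming the
# `ollama` Python client and locally pulled models. Model tags and
# prompts are illustrative assumptions, not the speaker's configuration.
import ollama

ACTIONS = {"clarify", "retrieve", "tool", "direct", "refuse"}

ROUTER_PROMPT = """You are a query analyzer. Given the normalized user query,
reply with exactly one word: clarify, retrieve, tool, direct, or refuse.

Query: {query}"""


def normalize_query(raw_query: str) -> str:
    """Query writer: clean up wording only; references and ambiguity
    are deliberately left unresolved (the analyzer handles those)."""
    resp = ollama.chat(
        model="gemma3:12b",  # assumed tag; video cites Gemma 3 12B for routing
        messages=[{
            "role": "user",
            "content": "Rewrite this query with clean grammar and spelling. "
                       f"Do NOT resolve pronouns or ambiguity:\n{raw_query}",
        }],
    )
    return resp["message"]["content"].strip()


def route(query: str) -> str:
    """Query analyzer: select one of the five pipeline actions."""
    resp = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(query=query)}],
    )
    action = resp["message"]["content"].strip().lower()
    # Fall back to a clarifying question if the model answers off-menu.
    return action if action in ACTIONS else "clarify"
```

Constraining the router to a single word from a closed set, with a safe fallback, is one simple way to keep a small local model's routing decisions machine-parseable.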
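The retrieve path can be sketched the same way. This is a hedged outline of the top-20 retrieve, top-6 re-rank, and sufficiency-check-or-refuse flow the video describes; `vector_search` and `rerank` are hypothetical stand-ins for whatever retriever and re-ranker the pipeline actually uses, and the `llama3:8b` tag is an assumption based on the video's mention of Llama 3 8B for evaluation.

```python
# Sketch of the retrieve -> re-rank -> evaluate path: top 20 candidates,
# re-ranked to the top 6, then a sufficiency check that refuses rather
# than answer from weak evidence. `vector_search` and `rerank` are
# hypothetical stand-ins, not names from the video.
import ollama

TOP_K_RETRIEVE = 20  # candidates from the first-stage retriever
TOP_K_RERANK = 6     # passages kept after re-ranking


def answer(query: str, vector_search, rerank) -> str:
    candidates = vector_search(query, k=TOP_K_RETRIEVE)
    passages = rerank(query, candidates, k=TOP_K_RERANK)
    context = "\n\n".join(passages)

    # Evaluator: is the extracted information sufficient? The video cites
    # Llama 3 8B here, with ~70% evaluation effectiveness; tag assumed.
    verdict = ollama.chat(
        model="llama3:8b",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}\n\n"
                       "Answer 'yes' or 'no': is the context sufficient "
                       "to answer the question?",
        }],
    )
    if "yes" not in verdict["message"]["content"].strip().lower():
        return "I can't answer that reliably from the available sources."

    # Response generation; the video mentions a 3.5-9B model (tag assumed).
    resp = ollama.chat(
        model="llama3:8b",
        messages=[{
            "role": "user",
            "content": f"Using only this context:\n{context}\n\n"
                       f"Answer the question: {query}",
        }],
    )
    return resp["message"]["content"]
```

The key design point the video stresses is the explicit refusal branch: when the evaluator judges the evidence insufficient, the pipeline stops instead of generating from thin context.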