{"type":"video","version":"1.0","html":"<iframe src=\"https://www.loom.com/embed/d86866bdfef24a91932369d438edf4de\" frameborder=\"0\" width=\"1922\" height=\"1441\" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>","height":1441,"width":1922,"provider_name":"Loom","provider_url":"https://www.loom.com","thumbnail_height":1441,"thumbnail_width":1922,"thumbnail_url":"https://cdn.loom.com/sessions/thumbnails/d86866bdfef24a91932369d438edf4de-1718030249994.gif","duration":345.283,"title":"Architecture of Multimodal Information Retrieval Tool","description":"In this video, I provide a quick walkthrough of the architecture of our multimodal information retrieval tool. I explain how we perform parsing of unstructured text documents and leverage a hierarchical document parsing utility. I also discuss our powerful parsing utility that uses LayoutParser and Detektron2 for object detection. Additionally, I explain how we parse images and generate text descriptions using the OpenAI API. Lastly, I touch on our framework for Dspy and how we interface with the backend and frontend of our application."}