<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/b3606d1e0c3944f4b9b1e14e6f363fdd&quot; frameborder=&quot;0&quot; width=&quot;1184&quot; height=&quot;888&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>888</height><width>1184</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>888</thumbnail_height><thumbnail_width>1184</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/b3606d1e0c3944f4b9b1e14e6f363fdd-160dd8d29577f5b0.gif</thumbnail_url><duration>447.0603</duration><title>Unleashing the Power of VLLM as a Service for AI Development 🚀</title><description>In today&apos;s demo, I showcased a new feature from Compute by HiveNet: the ability to run vLLM as a service. It lets you set up an LLM server in a couple of minutes without managing Kubernetes or Docker. We&apos;re using high-performance RTX cards, which are more cost-effective than traditional options like A100s, and billing is pay-per-second. I demonstrated launching a Falcon 3B model and connecting to it via SSH, and highlighted the OpenAI-compatible APIs for integration. Reach out with any questions or feedback; we&apos;re here to help!</description></oembed>