{"type":"video","version":"1.0","html":"<iframe src=\"https://www.loom.com/embed/b3606d1e0c3944f4b9b1e14e6f363fdd\" frameborder=\"0\" width=\"1184\" height=\"888\" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>","height":888,"width":1184,"provider_name":"Loom","provider_url":"https://www.loom.com","thumbnail_height":888,"thumbnail_width":1184,"thumbnail_url":"https://cdn.loom.com/sessions/thumbnails/b3606d1e0c3944f4b9b1e14e6f363fdd-160dd8d29577f5b0.gif","duration":447.0603,"title":"Unleashing the Power of VLLM as a Service for AI Development 🚀","description":"In today's demo, I showcased a new feature of Compute by HiveNet: the ability to run vLLM as a service. It lets you set up an LLM server in just a couple of minutes without managing Kubernetes or Docker. The service runs on high-performance RTX cards, which are more cost-effective than traditional options like A100s, and is billed per second. I demonstrated launching a Falcon 3B model and connecting to it via SSH, and highlighted the OpenAI-compatible APIs for seamless integration. Reach out with any questions or feedback; we're here to help!"}