<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/c92f825ac0af4ab18296a16546a75be3&quot; frameborder=&quot;0&quot; width=&quot;1920&quot; height=&quot;1440&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>1440</height><width>1920</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>1440</thumbnail_height><thumbnail_width>1920</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/c92f825ac0af4ab18296a16546a75be3-9178447daf4d09f4.gif</thumbnail_url><duration>127.6655</duration><title>Demo of nCompass API</title><description>Hello, Hacker News! In this demo, I showcase the performance of our API under high concurrency. By running the same high-concurrency workload against both our API and a locally hosted VLM engine, I demonstrate our ability to support a no-rate-limit policy. At a concurrency rate of 10 requests per second, sending 200 requests with input and output tokens, we achieve faster token processing and higher throughput. This video highlights our responsive AI inference engine and cost-effective operation, delivering a reliable API for production environments.</description></oembed>