<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/c92f825ac0af4ab18296a16546a75be3&quot; frameborder=&quot;0&quot; width=&quot;1920&quot; height=&quot;1440&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>1440</height><width>1920</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>1440</thumbnail_height><thumbnail_width>1920</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/c92f825ac0af4ab18296a16546a75be3-9178447daf4d09f4.gif</thumbnail_url><duration>127.6655</duration><title>Demo of nCompass API</title><description>Hello, Hacker News! In this demo, I showcase the performance of our API under high concurrency. By running the same high-concurrency workload against both our API and a locally hosted VLM engine, I demonstrate our ability to support a no-rate-limit policy. At a concurrency rate of 10 requests per second, sending 200 requests with input and output tokens, we achieve faster token processing and higher throughput. This video highlights our responsive AI inference engine and cost-effective operation, delivering a reliable API for production environments.</description></oembed>