<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/9268df5cafc74780aee0f5702a89d634&quot; frameborder=&quot;0&quot; width=&quot;1920&quot; height=&quot;1440&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>1440</height><width>1920</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>1440</thumbnail_height><thumbnail_width>1920</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/9268df5cafc74780aee0f5702a89d634-196d6c7e480c21b6.gif</thumbnail_url><duration>224.756</duration><title>Codex Community Hackathon Hyderabad - GRPO Pong LLM Demo</title><description>In this video, I demo a project I built with Codex at the Codex Community Hackathon. I took Kolex 3.5, a small language model with 0.8 billion parameters, and had it improve its own performance autonomously through reinforcement learning (GRPO). The model learned to avoid repetitive outputs and earned significant game rewards, scaling up to 400 chips. I open-sourced the weights on Hugging Face; check them out for more detail on the training iterations and collapses. I am excited about the future of automation and how these models can help with specialized tasks.</description></oembed>