Codex Community Hackathon Hyderabad

Video (Loom): https://www.loom.com/embed/9268df5cafc74780aee0f5702a89d634
Duration: about 3 min 45 s

Codex Community Hackathon Hyderabad - GRPO Pong LLM Demo

In this video, I demo a project I built using Codex at the Codex Community Hackathon. I used Kolex 3.5, a small language model with 0.8 billion parameters, and had it autonomously improve its performance through reinforcement learning. The model learned to avoid repetitive outputs and achieved significant game rewards, scaling up to 400 chips. I open-sourced the weights on Hugging Face and would love for you to check them out for more insight into the iterations and collapses. I am excited about the future of automation and how these models can assist in specialized tasks.
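
The demo title names GRPO (Group Relative Policy Optimization). As a rough illustration of the core idea only, and not the code used in this project, the sketch below computes group-relative advantages: several completions are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other rewards in its sampling group.

    `rewards` holds one scalar reward per completion sampled for the same
    prompt. `eps` (an assumed constant) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumed choice for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt
# (for example, 1.0 when the paddle returns the ball, 0.0 otherwise).
example_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(example_rewards)
```

Completions that score above their group's average get a positive advantage and are reinforced; those below get a negative one. A group whose completions all receive the same reward yields zero advantages, so it contributes no learning signal.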