<?xml version="1.0" encoding="UTF-8"?><oembed><type>video</type><version>1.0</version><html>&lt;iframe src=&quot;https://www.loom.com/embed/a19c698d8f4c42289a48cc069e126e48&quot; frameborder=&quot;0&quot; width=&quot;1920&quot; height=&quot;1440&quot; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;</html><height>1440</height><width>1920</width><provider_name>Loom</provider_name><provider_url>https://www.loom.com</provider_url><thumbnail_height>1440</thumbnail_height><thumbnail_width>1920</thumbnail_width><thumbnail_url>https://cdn.loom.com/sessions/thumbnails/a19c698d8f4c42289a48cc069e126e48-c6c10e2a6bda0e61.gif</thumbnail_url><duration>181.334</duration><title>Agent Testing Framework Project Walkthrough</title><description>Hello, I am Aniket, and in this video I walk through my Agent Testing Framework project. I boot up the code and explain how main.py calls runner.py to loop through 20 JSON test cases, send each case to the agent, evaluate performance using LLM scoring plus a small rule-based check for refusal, length, and prompt injection, and then generate a pass/fail report. In this run, 14 tests passed and 6 failed, a 70 percent pass rate, including adversarial cases such as requests to reveal hidden policies. No action is requested of viewers.</description></oembed>