Function Based Protein Hazard Screening Model

Video (Loom): https://www.loom.com/embed/e2efba2ea55e479cb055c586a0e300ca

Hey guys, I am Sissi from UC Berkeley, and I presented function-based protein hazard screening for DNA screening and synthesis controls at our hackathon. Our ESM embedding classifier achieves 0.996 AUROC under the hardest setting, where no test sequence shares more than 40 percent identity with any training sequence, and it catches 95.7 percent of toxins at a 1 percent false positive rate with a minimal generalization gap. Compared to the baselines, ESM2 barely drops on cluster splits. The pipeline feeds ESM2 mean-pooled embeddings into an MLP, which outputs a toxin vs. non-toxin confidence score in under a second. I am aiming to integrate this as a secondary screening layer alongside SecureDNA and IBBIS commec.
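As a rough illustration of two ideas mentioned in the talk, the sketch below shows (1) mean pooling of per-residue embeddings into one fixed-length vector per protein, and (2) the "detection rate at a fixed false positive rate" metric. It is not the project's code. The function names `mean_pool` and `tpr_at_fpr` are hypothetical, the embeddings are toy numbers standing in for real ESM2 output, and the scores stand in for the MLP's confidence values.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              mask: Sequence[int]) -> List[float]:
    """Average per-residue vectors, skipping positions where mask is 0.

    Assumption: mask marks real residues with 1 and padding or special
    tokens with 0, as is common for protein language model outputs.
    """
    kept = [vec for vec, m in zip(token_embeddings, mask) if m]
    if not kept:
        raise ValueError("mask selects no residues")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int],
               max_fpr: float = 0.01) -> float:
    """Best true positive rate among thresholds whose FPR is <= max_fpr.

    labels: 1 = toxin, 0 = non-toxin. A sequence is flagged when its
    score is greater than or equal to the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    best = 0.0
    for threshold in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        tp = sum(1 for y in flagged if y == 1)
        fp = len(flagged) - tp
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for a 2-residue protein plus one pad slot.
    pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
    print("pooled vector:", pooled)

    # Toy classifier scores; real scores would come from the trained MLP.
    demo_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    demo_labels = [1, 1, 0, 1, 0, 0]
    print("TPR with no false positives allowed:",
          tpr_at_fpr(demo_scores, demo_labels, max_fpr=0.0))
```

In a real evaluation the pooled vectors would be computed from ESM2 hidden states, and `tpr_at_fpr` would be run on held-out sequences from the cluster split, so that the reported rate reflects proteins with at most 40 percent identity to anything seen in training.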