AI Accelerates Drug Discovery: New Model Automates Detection of Developmental Abnormalities in Zebrafish Embryos
Anomaly detection has been a focus of Sivaprasad’s research for quite some time. “In machine learning, anomaly detection is the process of identifying data points, events, or patterns that deviate significantly from the expected behavior,” he explains. “During training, the system learns what ‘normal’ looks like, and at inference each sample is scored by how much it deviates from that notion of normal. Unlike traditional classification, which assigns inputs to specific categories (e.g., cat, dog, or car), anomaly detection focuses on distinguishing between ‘A’ and ‘not A.’” In this latest publication, a similar concept is applied to the biological sciences. “In this case, we applied a version of anomaly detection to observe the development of zebrafish embryos,” the researcher explains.
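The general idea of learning "normal" and then scoring deviations can be illustrated with a minimal sketch. This is not the model from the publication: it assumes a simple per-feature z-score as the anomaly score, and the feature values, function names, and threshold are all hypothetical.

```python
from statistics import mean, stdev

def fit_normal(samples):
    """Training: learn what 'normal' looks like as a per-feature (mean, std)."""
    columns = list(zip(*samples))
    return [(mean(col), stdev(col)) for col in columns]

def anomaly_score(model, sample):
    """Inference: largest per-feature deviation from normal, in standard deviations."""
    return max(abs(x - mu) / sigma for x, (mu, sigma) in zip(sample, model))

def is_anomaly(model, sample, threshold=3.0):
    """'Not A' if the sample deviates more than `threshold` (an assumed cutoff)."""
    return anomaly_score(model, sample) > threshold

# Hypothetical two-feature measurements taken from normal samples only.
normal_samples = [(1.0, 10.0), (1.2, 9.5), (0.9, 10.4), (1.1, 10.1), (1.0, 9.8)]
model = fit_normal(normal_samples)

print(is_anomaly(model, (1.05, 10.0)))  # close to the training data
print(is_anomaly(model, (2.5, 10.0)))   # first feature far from normal
```

Note that only normal examples are needed for training here, which is the practical difference from a classifier that needs labeled examples of every category.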
Zebrafish: A Tiny Powerhouse in Drug Discovery
“Zebrafish are an excellent model organism for biomedical research,” Sivaprasad says. “This is due to their transparent bodies and genetic similarities to humans.” Their rapid development and responsiveness to chemicals make them ideal for high-throughput toxicity screening—an important methodology used in drug discovery. “However, analyzing their development still relies heavily on expert manual inspection—a time-consuming and subjective process.” The challenge here lies in accurately detecting subtle developmental abnormalities that emerge over time in image sequences. “Existing datasets lack both the temporal span and the scale required to train large-scale models,” Sivaprasad adds.
A Breakthrough Dataset and Model
To overcome this bottleneck, Sivaprasad’s colleagues at HIPS first compiled one of the most comprehensive image datasets of zebrafish embryonic development to date, comprising more than 185,000 microscopic images. “They placed zebrafish embryos in wells, monitored them under the microscope, and captured their development continuously,” he explains. The dataset covers two critical experimental tasks: fertility detection and the detection of developmental anomalies.
Images for fertility detection were annotated with sequence-level labels, while developmental anomalies received fine-grained temporal annotations, creating a valuable benchmark for developing and testing automated tools.
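The difference between the two annotation styles can be sketched as data structures. The layout below is purely illustrative and is not the dataset's actual schema: the class names, field names, and the reading of "fine-grained temporal annotations" as one label per frame are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FertilitySequence:
    """Sequence-level label: one annotation for the whole image sequence."""
    frames: List[str]   # hypothetical image file names, in time order
    fertilized: bool

@dataclass
class AnomalySequence:
    """Temporal annotation, assumed here to be one label per frame."""
    frames: List[str]
    frame_is_abnormal: List[bool]

    def onset_index(self) -> Optional[int]:
        """Index of the first frame marked abnormal, or None if there is none."""
        for i, abnormal in enumerate(self.frame_is_abnormal):
            if abnormal:
                return i
        return None

# Hypothetical example: the abnormality becomes visible in the third frame.
seq = AnomalySequence(
    frames=["t0.png", "t1.png", "t2.png", "t3.png"],
    frame_is_abnormal=[False, False, True, True],
)
print(seq.onset_index())
```

Under this assumed layout, per-frame labels make it possible to ask when an abnormality first appears, which a single sequence-level label cannot answer.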
Transformer-Based AI Boosts Accuracy
The second step was to train a model on this dataset. Sivaprasad trained a new transformer-based neural network that can interpret both the structure of each image and how embryos change over time in the sequences. The AI achieved 98% accuracy in identifying whether an embryo was fertilized and 92% accuracy in detecting developmental abnormalities caused by exposure to a toxic compound (3,4-dichloroaniline). Importantly, the model mimics how human experts analyze developmental progression over time, enabling early predictions of toxicity.
A Platform for Future Innovation
This dataset and model lay the groundwork for future research into early-stage developmental toxicity, improving both sensitivity and prediction speed. “Right now, we evaluate only one chemical to understand how anomalies develop. Our goal is to scale this up to an entire library of chemicals,” Sivaprasad says. The complete dataset will be made freely available on GitHub, allowing other researchers to use and expand upon it. The aim is to empower both the biomedical and AI research communities to create more advanced, efficient, and ethical toxicity-screening methods. “It’s a valuable resource for the machine learning community to benchmark their methods, and for biomedical research to better understand the effects of different drugs,” Sivaprasad adds.
Funding
This research was supported by the Helmholtz Association (Image-Tox project), the European Lighthouse on Secure and Safe AI (ELSA, grant agreement No. 101070617), and the German Federal Ministry of Education and Research (funding code 16KIS2012).