The AI-Driven Path to Precision Therapeutics
By: Rotem Gura Sadovsky, PhD, Senior Director, Head of Data Strategy, Corundum Systems Biology; and Maayan Eilon-Ashkenazy, PhD, Corundum Corporation
For years, advances in biomolecular omics and robotic automation have steadily expanded our capacity to generate data. Now, breakthroughs in AI can unlock significant progress in drug discovery, provided we generate the right data and deploy AI to the right applications. In this article, Rotem Gura Sadovsky and Maayan Eilon-Ashkenazy explore AI's potential to transform drug discovery, charting a path toward more targeted precision therapeutics.

The field of omics, which encompasses high-dimensional biomolecular data such as genomics, transcriptomics, proteomics, metabolomics, and microbiome metagenomics, has grown rapidly, both in its accessibility to the scientific community and in its sensitivity. For example, single-cell RNA sequencing can now examine the gene activity of individual immune cells, showing how T cells, macrophages, and B cells react to a drug or stimulus.
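As a concrete illustration of working with such data, the sketch below clusters cells from a toy expression matrix using standard Python tooling (NumPy and scikit-learn). The data and parameters here are invented for illustration; real single-cell workflows rely on dedicated packages such as Scanpy.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Toy single-cell expression matrix: rows are cells, columns are genes.
# In practice this would come from an scRNA-seq pipeline's counts output.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(1000, 500)).astype(float)

# Standard preprocessing: library-size normalization and log transform.
counts = counts / counts.sum(axis=1, keepdims=True) * 1e4
log_expr = np.log1p(counts)

# Reduce dimensionality, then cluster cells into putative cell types
# (T cells, macrophages, B cells, ...), which can then be compared
# between drug-treated and control samples.
embedding = PCA(n_components=20).fit_transform(log_expr)
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(embedding)
print(np.bincount(clusters))  # cells per putative cell-type cluster
```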
At the same time, laboratory robots have become faster and more accessible, performing thousands of tests daily and compressing experiments that once took weeks.
Figure 1: Biomarker measurements (left) feed into a disease-risk prediction model (center), stratifying patients into low (blue), medium (teal), and high (red) risk groups. Each risk tier then maps to a tailored intervention (right), illustrating how molecular data guide personalized treatment pathways. Created in BioRender. Eilon Ashkenazy, M. (2025) https://BioRender.com/up5m5hj
But the real game-changer is AI. While data and automation alone have not yet revolutionized drug discovery, despite frequent promises to the contrary, AI has the power to integrate with these advances and trigger transformative leaps.
The New Era of Self-Supervised AI Models
Over the last decade, the application of AI in personalized drug discovery has focused primarily on supervised learning models. These models are typically trained on clinical cohorts labeled with outcomes, such as patient response to a specific drug. While this approach has driven significant progress in personalized cancer diagnostics and treatment, its success outside oncology has been limited despite intensive effort.
However, the era of self-supervised models is upon us. These models employ vast computational power to extract complex patterns from unlabeled data, which is far more plentiful than labeled data. By applying them to massive clinical datasets, such as omics and medical imaging, we can uncover subtle patterns, augment supervised models, and advance personalized drug discovery.
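To illustrate the core idea, here is a minimal sketch of self-supervised training on an unlabeled omics matrix: random features are masked, and a small network learns to reconstruct them from the remaining ones, so no outcome labels are needed. The data, architecture, and hyperparameters are placeholders, not a production recipe (PyTorch assumed).

```python
import torch
import torch.nn as nn

# Unlabeled omics matrix: rows are samples, columns are molecular features
# (e.g., gene expression levels). No outcome labels are required.
x = torch.randn(2048, 256)  # placeholder for a real dataset

model = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    mask = torch.rand_like(x) < 0.15        # hide 15% of features
    corrupted = x.masked_fill(mask, 0.0)    # the model sees the rest
    recon = model(corrupted)
    # Self-supervised objective: reconstruct only the masked entries.
    loss = ((recon - x)[mask] ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# The learned representations can now augment supervised models
# trained on much smaller labeled cohorts.
```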
Large Language Models (LLMs) are self-supervised models applied to written language, and they have already given rise to applications that boost drug discovery research. AI coding companions improve computational researchers' productivity, and natural-language interfaces broaden access to AI tools for non-coders. Using analogous models trained on protein sequences, researchers can, for example, generate novel protein sequences simply by describing desired functions. If deployed correctly, this accelerating force can reshape drug discovery, disease-prevention strategies, and clinical trial design.
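As a taste of what this looks like in practice, the sketch below samples novel sequences from ProtGPT2, a publicly available protein language model on Hugging Face. It generates unconditionally; text-conditioned models extend the same idea to function-guided design. The sampling parameters follow the model's published usage and are illustrative.

```python
from transformers import pipeline

# ProtGPT2 is a causal language model trained on protein sequences;
# sampling from it yields novel, protein-like sequences.
generator = pipeline("text-generation", model="nferruz/ProtGPT2")

sequences = generator(
    "<|endoftext|>",           # start-of-sequence token for this model
    max_length=100,
    do_sample=True,
    top_k=950,
    repetition_penalty=1.2,
    num_return_sequences=5,
    eos_token_id=0,
)
for s in sequences:
    print(s["generated_text"].replace("\n", ""))
```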
Companies are now integrating AI with high-throughput robotic screening platforms. By analyzing early assay readouts in real time and then triggering the next experiment autonomously, these systems can run screens 24/7, delivering a boost in output far beyond what robotics alone could achieve, as sketched below. All these lab-side innovations set the stage for the next frontier: analyzing and integrating pre-clinical lab data and patient-derived data at scale.
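Here is a minimal sketch of such a closed loop, using one possible active-learning strategy: a surrogate model is retrained after each batch of assay results, and the next batch is chosen where the model is least certain. The run_assay function is a hypothetical stand-in for the robotic platform, and the uncertainty heuristic is one of several reasonable choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def run_assay(compound_features):
    """Hypothetical stand-in for a robotic assay; returns measured activity.
    In a real platform this would dispatch a plate to the robot."""
    w = np.linspace(-1, 1, compound_features.shape[-1])
    return compound_features @ w + rng.normal(0, 0.1, len(compound_features))

library = rng.normal(size=(5000, 32))            # untested compound library
tested_idx = list(rng.choice(5000, 64, replace=False))
y = list(run_assay(library[tested_idx]))

for round_ in range(10):                         # closed loop: model -> pick -> assay
    model = RandomForestRegressor(n_estimators=100).fit(library[tested_idx], y)
    # Uncertainty = variance across trees; screen where the model knows least.
    per_tree = np.stack([t.predict(library) for t in model.estimators_])
    uncertainty = per_tree.std(axis=0)
    uncertainty[tested_idx] = -np.inf            # don't re-test compounds
    batch = np.argsort(uncertainty)[-16:]        # next plate of 16 compounds
    tested_idx.extend(batch)
    y.extend(run_assay(library[batch]))
```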
The Continuous Data Imperative
Today, most datasets still fall short. Disease-specific cohorts often lack the sample sizes needed for AI training: a 10,000-person cohort sounds large, yet only a few hundred of those individuals may have the specific disease of interest, so the relevant sample size is much smaller. In principle, multiple cohorts could be combined to increase size, but in practice this is very hard, because sample processing and data-generation protocols vary across labs, introducing substantial noise into the combined data.
Another challenge in clinical cohorts is that the biological samples often lack information about symptoms and medications taken by the patients. This lack of data labeling makes supervised algorithms less useful, as discussed above. Lastly, it is rare to find cohorts with multiple patient touchpoints, such as weekly blood samples that track macro-level health changes alongside molecular data, or trials that obtain paired pre- and post-treatment samples to pinpoint drivers of therapeutic response. These gaps limit the translational impact of even the most advanced algorithms.
Innovators are tackling these gaps. One company has built an at-home blood-collection platform, enabling frequent gene-expression profiling without clinical-site visits and removing a major barrier to data collection. Another initiative is collecting continuous data on blood glucose, diet, and sleep, alongside comprehensive clinical phenotyping and multi-omics. Continuous data is especially valuable: it is well suited to training self-supervised models and lets us observe short-term responses to interventions such as diet or medication.
Similarly, federated learning frameworks, which train models across multiple sites without sharing raw data, promise to harmonize distributed biobanks while preserving privacy. AI-driven calibration tools are in development to translate measurements across disparate assay protocols. Finally, synthetic data generation offers a route to augment real-world datasets, balancing regulatory constraints with the need for comprehensive training inputs.
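A minimal sketch of the federated idea, assuming a simple linear model and the classic federated-averaging (FedAvg) scheme: each site trains locally on its private cohort, and only model weights, never raw records, are sent to a central server for aggregation. The sites, data, and hyperparameters are invented for illustration.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's training step on its private data (simple linear model);
    only the updated weights leave the site, never the raw records."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
# Three hospitals with private cohorts of different sizes.
sites = [(rng.normal(size=(n, 10)), rng.normal(size=n)) for n in (120, 300, 80)]

w_global = np.zeros(10)
for round_ in range(20):                     # federated averaging (FedAvg)
    local_ws, sizes = [], []
    for X, y in sites:
        local_ws.append(local_update(w_global, X, y))
        sizes.append(len(y))
    # The server aggregates weight vectors, weighted by cohort size.
    w_global = np.average(local_ws, axis=0, weights=sizes)
```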
As frequently sampled longitudinal data accumulates and becomes widely accessible for research and AI deployment, we will see dramatic advances in our ability to predict health outcomes.
Unlocking Our Predictive Capabilities
The next generation of predictive models will integrate richly annotated, longitudinal patient data to detect disease before symptoms appear. Network-scale analyses will map interactions among proteins, RNAs, and metabolites, while simulation tools will forecast individual responses to pharmaceuticals, dietary changes or environmental exposures.
Figure 2: Unsupervised learning (left) discovers natural clusters in unlabeled data. Supervised learning (center) classifies new samples based on labeled examples (gray dot illustrates a query point). Self-supervised learning (right) learns contextual relationships to predict future data points (dashed arrows show trajectory forecasting). Created in BioRender. Eilon Ashkenazy, M. (2025) https://BioRender.com/up5m5hj
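A minimal sketch of this kind of trajectory forecasting, framed autoregressively: a model learns to predict a patient's next biomarker value from a window of previous visits. The synthetic series and ridge-regression model are placeholders for real longitudinal data and richer sequence models.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic longitudinal biomarker series: one value per monthly visit.
rng = np.random.default_rng(2)
t = np.arange(48)
series = 5 + 0.05 * t + np.sin(t / 6) + rng.normal(0, 0.2, len(t))

# Autoregressive framing: predict the next visit from the last 6 visits.
window = 6
X = np.stack([series[i : i + window] for i in range(len(series) - window)])
y = series[window:]

model = Ridge(alpha=1.0).fit(X[:-12], y[:-12])   # hold out the last year
forecast = model.predict(X[-12:])
print(np.round(forecast - y[-12:], 2))           # errors on held-out visits
```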
Yet, for even wider impact, we need to broaden our perspectives beyond the lab to consider how AI can streamline other critical areas of drug development.
Streamlining Clinical Operations
The high costs of drug development can be reduced by integrating AI into clinical development and operations. AI can optimize patient recruitment by fine-tuning inclusion and exclusion criteria to identify suitable participants, and it can personalize patient-engagement operations, mitigating two persistent problems: competition among trials for eligible patients and patient attrition. In addition, AI technologies can streamline data-reporting processes and timelines, enabling quicker and more accurate analysis of clinical trials.
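One hedged sketch of how recruitment optimization might look: a classifier trained on historical screening records scores new candidates by their likelihood of enrolling and completing a trial, so outreach can be prioritized. All features, labels, and data here are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical historical screening records: features might include labs,
# diagnoses, and distance to site; label = enrolled and completed the trial.
rng = np.random.default_rng(3)
X_hist = rng.normal(size=(2000, 12))
y_hist = (X_hist[:, 0] + 0.5 * X_hist[:, 3] + rng.normal(0, 1, 2000)) > 0

model = GradientBoostingClassifier().fit(X_hist, y_hist)

# Score a new pool of candidates and prioritize outreach toward those most
# likely to enroll and stay in the trial, reducing attrition.
candidates = rng.normal(size=(500, 12))
scores = model.predict_proba(candidates)[:, 1]
shortlist = np.argsort(scores)[::-1][:50]
print(scores[shortlist][:5])  # top candidates' completion likelihoods
```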
There is significant potential for AI to advance drug discovery and development. It may boost our predictive capabilities, enabling us to treat diseases more effectively and even prevent them. Yet to make that promise real, the field must break through persistent data bottlenecks. We should prioritize frequent sample collection, acquire longitudinal data from these samples, and generate scientifically sound synthetic data to fill the gaps. We should also continue introducing AI tools to standardize lab protocols, increase lab automation, and streamline clinical trials. Doing this right will enable a future in which we predict disease onset and mitigate it early, opening new paradigms of preventative medicine.
About the Authors
Rotem Gura Sadovsky, PhD, is Senior Director, Head of Data Strategy at Corundum Systems Biology (CSB). He is an expert in systems biology, AI and machine learning, and big data analytics. Prior to joining CSB, Dr. Gura Sadovsky served as Director of Product at Enveda Biosciences, where he led a nascent platform utilizing multi-omics data to develop diagnostics. He holds a PhD in Computational Biology from MIT.


Maayan Eilon-Ashkenazy, PhD, is Content Manager at Corundum Corporation. She is an expert in molecular biology, protein engineering, and structure-based drug development. She holds a PhD in Chemical and Structural Biology from the Weizmann Institute of Science.