ARTIFICIAL INTELLIGENCE - The AI-Driven Path to Precision Therapeutics


INTRODUCTION

For years, advances in biomolecular omics and robotic automation have steadily increased our data-generation capabilities. Now, breakthroughs in AI enable significant progress in drug discovery, provided we generate the right data and deploy AI to the right applications. In this article, Rotem Gura Sadovsky and Maayan Eilon-Ashkenazy explore the potential for AI to transform drug discovery, charting a path towards more targeted precision therapeutics.

The field of omics, which encompasses high-dimensional biomolecular data such as genomics, transcriptomics, proteomics, metabolomics, and microbiome metagenomics, has grown rapidly, both in accessibility to the scientific community and in sensitivity. For example, single-cell RNA sequencing can now examine the gene activity of individual immune cells, showing how T cells, macrophages, and B cells react to a drug or stimulus.

At the same time, automated robots have become more accessible and faster, able to perform thousands of tests daily, speeding up lab experiments that used to take weeks.

But the potential game-changer is AI. While data and automation alone have not yet revolutionized drug discovery, as has frequently been promised, AI has the power to integrate with these advances and trigger transformative leaps.

THE NEW ERA OF SELF-SUPERVISED AI MODELS

In the last decade, the application of AI in personalized drug discovery has primarily focused on supervised learning models. These models are typically trained on clinical cohorts, labeled with outcomes such as patient response to a specific drug. While this approach has driven significant progress in personalized cancer diagnostics and treatment, its success outside of oncology has been limited, despite intensive attempts for over a decade.

However, the era of self-supervised models is upon us. These models employ vast computational power to extract complex patterns from unlabeled data, which is far more plentiful than labeled data. By applying these models to massive clinical datasets, such as omics and medical imaging, we can uncover subtle patterns, augment supervised models, and advance personalized drug discovery.
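As a rough illustration of the self-supervised idea – learning to fill in masked values from unlabeled data, with no outcome labels involved – here is a minimal numpy sketch. The "expression" matrix, mask rate, and linear model are all invented for illustration, not any specific method from the article:

```python
import numpy as np

# Toy self-supervised objective on unlabeled omics-like data: hide a
# fraction of the values and train a model to reconstruct them from the
# visible, correlated features. Everything here is illustrative.

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 20

# Unlabeled data with latent structure (correlated "genes").
latent = rng.normal(size=(n_samples, 4))
mixing = rng.normal(size=(4, n_genes))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_genes))

# Mask 30% of entries; the training signal is the data itself.
mask = rng.random(X.shape) < 0.3
X_masked = np.where(mask, 0.0, X)

# A single linear layer W trained by gradient descent to predict the
# masked entries from the visible ones.
W = np.zeros((n_genes, n_genes))
lr = 0.01
for _ in range(500):
    residual = (X_masked @ W - X) * mask   # loss only on masked entries
    W -= lr * (X_masked.T @ residual) / n_samples

masked_error = np.abs((X_masked @ W - X)[mask]).mean()
baseline_error = np.abs(X[mask]).mean()    # always predicting zero
print(masked_error, baseline_error)
```

The point of the sketch is that the model recovers hidden values well below the trivial baseline using only unlabeled data – the same principle that lets self-supervised models pre-train on omics or imaging datasets before any labels are available.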

Large Language Models (LLMs) are examples of self-supervised models applied to written language, and they have already given rise to applications that boost drug discovery research. AI coding companions improve computational researchers' productivity, and natural-language interfaces broaden access to AI tools for non-coders. Using these models, researchers can, for example, generate novel protein sequences simply by describing desired functions. If deployed correctly, this accelerating force can reshape drug discovery, disease-prevention strategies, and clinical trial design.

Companies are now working on integrating AI with high-throughput robotic screening platforms. By analyzing early assay readouts in real time and then triggering the next experiment autonomously, these systems can run screens 24/7. This delivers a boost in output far beyond what robotics alone ever managed. All these lab-side innovations set the stage for the next frontier – analyzing and integrating pre-clinical lab data and patient-derived data at scale.
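The closed loop described above – measure a batch, update a model, let the model choose the next batch – can be sketched in a few lines. The compound library, the linear "assay", and the batch size below are all made up for illustration; real platforms use far richer models and readouts:

```python
import numpy as np

# Minimal closed-loop screening sketch: after each robotic batch, refit a
# model on all readouts so far and pick the most promising untested
# compounds for the next round. All quantities here are illustrative.

rng = np.random.default_rng(1)
n_compounds, n_features = 500, 8
library = rng.normal(size=(n_compounds, n_features))
true_w = rng.normal(size=n_features)         # hidden structure-activity rule

def run_assay(idx):
    # Stand-in for a robotic readout: hidden linear activity plus noise.
    return library[idx] @ true_w + 0.1 * rng.normal(size=len(idx))

tested = list(rng.choice(n_compounds, 20, replace=False))  # seed batch
readouts = list(run_assay(np.array(tested)))

for _ in range(5):                            # five autonomous rounds
    X, y = library[tested], np.array(readouts)
    w, *_ = np.linalg.lstsq(X, y, rcond=None) # refit after each batch
    remaining = np.setdiff1d(np.arange(n_compounds), tested)
    scores = library[remaining] @ w
    batch = remaining[np.argsort(scores)[-20:]]  # top predicted hits
    tested += list(batch)
    readouts += list(run_assay(batch))

print(len(tested), max(readouts))
```

After a few rounds the loop has concentrated its 120 tests on the most active corner of a 500-compound library – the kind of sample efficiency that makes 24/7 autonomous screening attractive.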

THE CONTINUOUS DATA IMPERATIVE

Today, most datasets still fall short. Disease-specific cohorts often lack the sample sizes needed for AI training – a 10,000-person cohort sounds large, yet only a few hundred of those individuals may have a specific disease of interest, so the relevant sample size is much smaller. In principle, multiple cohorts could be combined to increase cohort size, but in practice this is very hard: sample processing and data-generation protocols vary across labs, which adds considerable noise to the pooled data.
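A tiny numpy example makes the pooling problem concrete. The two "cohorts" below share the same biology but were measured under different (invented) protocols – one shifted, one rescaled – and a simple per-cohort standardization stands in for real batch-correction methods, which are far more sophisticated:

```python
import numpy as np

# Two cohorts measuring the same biological signal under different lab
# protocols. The shift and scaling are invented to mimic batch effects.

rng = np.random.default_rng(2)
biology = rng.normal(size=300)               # shared biological signal
cohort_a = biology[:150] + 5.0               # lab A: systematic offset
cohort_b = 2.0 * biology[150:]               # lab B: different scaling

def standardize(x):
    # Crude per-cohort correction: zero mean, unit variance within lab.
    return (x - x.mean()) / x.std()

gap_raw = abs(cohort_a.mean() - cohort_b.mean())
gap_corrected = abs(standardize(cohort_a).mean() - standardize(cohort_b).mean())
print(gap_raw, gap_corrected)
```

Naive pooling leaves a large artificial gap between the labs that a model would happily "learn"; even this crude correction removes it. Real harmonization has to do this across thousands of features without erasing genuine biological differences, which is why cross-cohort integration remains hard.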

Another challenge in clinical cohorts is that the biological samples often lack information about patients' symptoms and medications. This lack of data labeling makes supervised algorithms less useful, as discussed above. Lastly, it is rare to find cohorts with multiple patient touchpoints, such as weekly blood samples that track macro-level health changes alongside molecular data, or trials that obtain paired pre- and post-treatment samples to pinpoint drivers of therapeutic response. These gaps limit the translational impact of even the most advanced algorithms.

Innovators are tackling these gaps. One company has built an at-home blood-collection platform, enabling frequent gene expression profiling without requiring clinical-site visits, which is key to data collection. Another initiative is collecting continuous data on blood glucose, diet, and sleep, along with comprehensive clinical phenotyping and multi-omics. Continuous data is valuable: it is especially well suited to training self-supervised models, and it lets us see short-term responses to interventions such as diet or medications.

Similarly, federated learning frameworks, which train models across multiple sites without sharing raw data, promise to harmonize distributed biobanks while preserving privacy. AI-driven calibration tools are in development to translate measurements across disparate assay protocols. Finally, synthetic data generation offers a route to augment real-world datasets, balancing regulatory constraints with the need for comprehensive training inputs. As frequently sampled longitudinal data accumulates and becomes widely accessible for research and AI deployment, we will see dramatic advances in our ability to predict health outcomes.
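The federated idea – sites share model parameters, never raw samples – can be sketched with a single round of size-weighted parameter averaging, loosely in the spirit of federated averaging. The sites, their data, and the one-shot scheme below are all illustrative; production frameworks iterate this over many rounds with secure aggregation:

```python
import numpy as np

# Toy federated sketch: three "sites" each fit a local model on private
# data; a server averages only the parameter vectors, weighted by site
# size. Data and the single-round scheme are invented for illustration.

rng = np.random.default_rng(3)
true_w = np.array([1.5, -2.0, 0.5])          # shared underlying relationship

def make_site(n):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

sites = [make_site(n) for n in (50, 80, 120)]

def local_fit(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit stays at the site
    return w

local = [local_fit(X, y) for X, y in sites]     # only these leave the sites
sizes = np.array([len(y) for _, y in sites])
global_w = np.average(local, axis=0, weights=sizes)
print(global_w)
```

The aggregated model recovers the shared relationship even though no site ever exposed a patient-level record – the property that makes this approach attractive for distributed biobanks.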

UNLOCKING OUR PREDICTIVE CAPABILITIES

The next generation of predictive models will integrate richly annotated, longitudinal patient data to detect disease before symptoms appear. Network-scale analyses will map interactions among proteins, RNAs, and metabolites, while simulation tools will forecast individual responses to pharmaceuticals, dietary changes, or environmental exposures.

Yet, for even wider impact, we need to broaden our perspectives beyond the lab to consider how AI can streamline other critical areas of drug development.

STREAMLINING CLINICAL OPERATIONS

The high costs associated with drug development can be reduced by integrating AI into clinical development and operations. AI can optimize patient recruitment by fine-tuning inclusion and exclusion criteria to identify suitable participants, and it can personalize patient engagement, mitigating two persistent problems: competition for trial participants and patient attrition. In addition, AI technologies can streamline data reporting processes and timelines, enabling quicker and more accurate analysis of clinical trials.

There is significant potential for AI to advance drug discovery and development. It may boost our predictive capabilities, enabling us to treat diseases more effectively and even prevent them. Yet to make that promise real, the field must break through persistent data bottlenecks. We should prioritize frequent sample collection, acquire longitudinal data from these samples, and generate scientifically sound synthetic data to fill in the gaps. We should also continue introducing AI tools to standardize lab protocols, increase lab automation, and streamline clinical trials. Doing this right will enable a future where we predict disease onset and mitigate it early, opening new paradigms of preventative medicine.

Dr. Rotem Gura Sadovsky is Senior Director, Head of Data Strategy at Corundum Systems Biology (CSB). He is an expert in systems biology, AI and machine learning, and big data analytics. Prior to joining CSB, Dr. Gura Sadovsky served as Director of Product at Enveda Biosciences, where he led a nascent platform utilizing multi-omics data to develop diagnostics. He earned a PhD in Computational Biology from MIT.

Dr. Maayan Eilon-Ashkenazy is Content Manager at Corundum Corporation. She is an expert in molecular biology, protein engineering, and structure-based drug development. She earned a PhD in Chemical and Structural Biology from the Weizmann Institute of Science.