CLOUD COMPUTING - Revolutionizing Antibody Discovery: The Role of Cloud Computing


Antibody therapies are an important class of drugs that have shown outstanding efficacy and safety in the treatment of many major diseases, including cancers and autoimmune, hematological, and infectious diseases such as COVID-19. Considerable progress has been made in the global research and development of antibody therapies over the past decade.

Antibodies are also the fastest-growing class of therapeutics, accounting for 9 of the top 20 best-selling drugs, and the antibody market is projected to grow from $178.50 billion in 2021 to $451.89 billion in 2028, a compound annual growth rate (CAGR) of 14.1%.1,2 According to Umabs-DB, 162 antibody therapies have been approved by at least one regulatory agency worldwide, including 122 in the US, 114 in Europe, 82 in Japan, and 73 in China. While the US and Europe have led antibody drug discovery for decades, Japan and China have made significant strides in the past decade.3

Traditional methods of antibody discovery, which rely on random colony screening and low-throughput Sanger sequencing, are inefficient and incomplete. Because this approach samples only a small number of clones, characterizing the full binding diversity of a selection output is nearly impossible at reasonable cost, which makes it difficult for many small biotech companies to compete. Such approaches capture only a fraction of the diversity, lowering the chance of finding antibodies with the desired specificity or biophysical properties.
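
The sampling limitation can be made concrete with a standard calculation. If a selection output contains clones at frequencies p_i and n clones are picked at random (with replacement), the expected number of distinct clones observed is the sum over all clones of 1 - (1 - p_i)^n. The Python sketch below compares a hypothetical 96-colony screen with a hypothetical one-million-read NGS run on a made-up, skewed population of 5,000 clones. The population shape, its size, and both sample sizes are illustrative assumptions, not data from the study discussed in this article.

```python
def expected_unique_clones(freqs: list[float], n_samples: int) -> float:
    """Expected number of distinct clones seen in n_samples random draws
    (with replacement) from a population with the given clone frequencies."""
    return sum(1.0 - (1.0 - p) ** n_samples for p in freqs)


def zipf_population(n_clones: int, exponent: float = 1.0) -> list[float]:
    """Hypothetical skewed population: clone k has weight 1 / k**exponent,
    so a few clones dominate, loosely mimicking an enriched selection output."""
    weights = [1.0 / k ** exponent for k in range(1, n_clones + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    population = zipf_population(5_000)  # assumed population size and shape
    for label, n in [("96-colony screen", 96), ("1M NGS reads", 1_000_000)]:
        seen = expected_unique_clones(population, n)
        print(f"{label}: ~{seen:.0f} of {len(population)} clones expected")
```

A colony screen can never observe more distinct clones than colonies picked, and under a skewed distribution most of those picks land on the same abundant clones; deep sequencing is what pushes the expected count toward the full population.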

The advent of next-generation sequencing (NGS) and high-throughput computing has revolutionized antibody discovery, making it faster, more efficient, and more cost-effective. Cloud computing has further improved this process by reducing overhead costs and significantly simplifying the workflow. The following discusses the role of cloud computing in antibody discovery, its benefits, and its potential applications in biotechnology.


Antibodies are essential tools in diagnostics, therapeutics, and research. Antibody discovery, most often performed with in vitro (phage or yeast display) or in vivo (hybridoma) technologies, is a crucial step in identifying leads that bind specifically to target proteins.

NGS, also known as massively parallel sequencing, encompasses several high-throughput approaches to DNA sequencing and offers extremely high throughput, scalability, and speed. By running sequencing reactions in parallel, NGS generates hundreds of megabases to gigabases of nucleotide sequence reads in a single instrument run. This has greatly increased the amount of available sequence data and fundamentally changed genome sequencing approaches in the biomedical sciences.4

Applying NGS to antibody discovery pipelines allows more comprehensive coverage of selection output populations at substantially reduced cost, and advances in high-throughput computing and computational tools have simplified handling the large volumes of data generated. This not only expands the diversity recovered from any given selection campaign but also reduces biases, such as favoring only the most abundant clones. Statistics collected from NGS outputs, such as clone abundance, enrich the datasets and significantly improve the ranking criteria for potential leads. Most importantly, the greater number of unique antibodies identified makes it possible to use unsupervised clustering to obtain diverse clones. This improves screening efficiency by eliminating the redundancy of screening multiple antibodies from the same cluster, which tend to have similar binding properties.
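
To illustrate how NGS counts and clustering can work together, the sketch below greedily groups CDR-H3 sequences of equal length whose Hamming distance to a cluster seed falls within a threshold, then keeps the most abundant member of each cluster as its representative lead. This is a deliberately simple, hypothetical stand-in: the sequences, read counts, distance threshold, and the greedy single-pass scheme are assumptions for illustration and are not the clustering method used by AbXtract or any other specific product.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    cdr3: str   # CDR-H3 amino-acid sequence (hypothetical)
    count: int  # NGS read count, used here as the abundance statistic


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(clones: list[Clone], max_dist: int) -> list[list[Clone]]:
    """Visit clones from most to least abundant; each clone joins the first
    cluster whose seed has the same CDR3 length and lies within max_dist
    mismatches, otherwise it seeds a new cluster."""
    clusters: list[list[Clone]] = []
    for clone in sorted(clones, key=lambda c: c.count, reverse=True):
        for cluster in clusters:
            seed = cluster[0]
            if len(seed.cdr3) == len(clone.cdr3) and hamming(seed.cdr3, clone.cdr3) <= max_dist:
                cluster.append(clone)
                break
        else:
            clusters.append([clone])
    return clusters


def representatives(clusters: list[list[Clone]]) -> list[Clone]:
    """One lead per cluster: the seed, which is its most abundant member."""
    return [cluster[0] for cluster in clusters]


if __name__ == "__main__":
    reads = [
        Clone("ARDYYGSSYFDY", 1200),
        Clone("ARDYYGSSYFDV", 300),   # 1 mismatch from the first clone
        Clone("AKGGWELLPFDY", 800),
        Clone("AKGGWELLPFDV", 50),
        Clone("ARSPRWYYGMDV", 20),    # rare, but a distinct sequence family
    ]
    for rep in representatives(greedy_cluster(reads, max_dist=2)):
        print(rep.cdr3, rep.count)
```

Note that the rare third family still yields a representative: clustering preserves diversity that a purely abundance-ranked shortlist would drop, while the near-duplicate variants are not screened twice.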

The advent of NGS, high-throughput computing, and computational tools has increased the need for cloud computing. Before cloud computing, companies had to rely on highly technical staff supported by costly computational infrastructure, which made this work challenging for organizations with fewer financial resources. Cloud computing uses remote servers to store, manage, and process data, democratizing access to computational tools and computing resources for companies ranging from small to large biotechs.


The emergence of cloud computing has revolutionized the field of biotechnology. In antibody discovery, cloud computing enables the storage and analysis of large volumes of data, particularly DNA sequencing data, using sophisticated algorithms that run on remote servers rather than on premises. Because complex calculations on massive datasets can be performed rapidly, researchers can screen antibody populations derived from large library selections or animal immunizations more efficiently and cost-effectively.


Cloud computing also helps overcome data-sharing limitations and gives teams easy access to software and computational resources. Data sharing, in turn, promotes collaboration and enables researchers to identify, visualize, and prioritize leads across distinct discovery campaigns through a cloud interface in real time.

Finally, as more machine learning solutions are applied to antibodies, scaling computational power becomes a serious challenge. Before cloud computing, users were restricted to in-house hardware, which required frequent upgrades and significant support from informatics staff. Cloud computing reduces overhead and staffing needs by providing scalable access to computational power. Cloud-based antibody discovery platforms can scale up or down with the needs of a project, so companies can access the computing resources they need to run their discovery programs without investing in expensive hardware and infrastructure.


Best practices for cloud-based antibody discovery include benchmark datasets and validation studies. Benchmark datasets are compiled to compare a model's performance against an industry standard, while validation studies test a particular tool against empirical performance metrics. Together, these allow workflows to be automated with tested parameters, reducing the pipeline to uploading the sequencing data and entering the desired number of leads. The result is a “plug and play” calculation that runs on the cloud. The best cloud computing platforms also provide adequate technical support and training, secure and reliable data management, and compatibility with various computational tools.
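
As an illustration of the kind of metric a validation study might use, the sketch below scores a lead-ranking tool by its hit rate among the top k leads it nominates, that is, the fraction of those leads confirmed as binders in the lab. The function name, the choice of metric, and the example data are hypothetical; real benchmark suites typically report several complementary metrics.

```python
def hit_rate_at_k(ranked_ids: list[str], confirmed_binders: set[str], k: int) -> float:
    """Fraction of the top-k ranked candidates that were experimentally confirmed.

    ranked_ids: candidate IDs ordered from best to worst by the tool under test.
    confirmed_binders: IDs that bound the target in a wet-lab assay.
    k: the "desired number of leads" a user would request. If fewer than k
       candidates exist, the score is computed over those available.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for cid in top_k if cid in confirmed_binders)
    return hits / len(top_k)


if __name__ == "__main__":
    # Hypothetical benchmark: one tool's ranking vs. assay results for one target.
    ranking = ["ab07", "ab11", "ab02", "ab05", "ab09", "ab01"]
    binders = {"ab07", "ab11", "ab09"}
    for k in (2, 4, 6):
        print(f"hit rate @ {k}: {hit_rate_at_k(ranking, binders, k):.2f}")
```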


Selection campaigns using the Specifica Generation 3 antibody platform were conducted against three SARS-CoV-2 targets. Unique NGS sequences identified by the AbXtract module in the cloud-native Orion® Antibody Discovery Suite were compared with random colony screening of the same selection outputs. Antibody sequences were clustered by unsupervised machine learning, and genes corresponding to representative antibodies from each cluster were synthesized, expressed as IgG, and tested experimentally for epitope binning by surface plasmon resonance (SPR).

As expected, antibodies within a cluster recognized identical epitopes, while antibodies recognizing distinct epitopes belonged to distinct clusters. This study validated Specifica’s bioinformatics pipeline and clustering method for prioritizing leads for experimental testing.
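
The validation logic described above can be expressed as a simple consistency check: every cluster should map to a single epitope bin. If that holds, antibodies in different bins necessarily sit in different clusters, which is the second half of the finding. The sketch below assumes hypothetical cluster labels and SPR bin assignments keyed by antibody ID; it reports any cluster spanning more than one bin, plus the purity of the clustering with respect to the bins. The data layout and function names are illustrative, not part of any product.

```python
from collections import Counter, defaultdict


def mixed_clusters(cluster_of: dict[str, str], bin_of: dict[str, str]) -> dict[str, set[str]]:
    """Return clusters whose binned antibodies fall into more than one epitope bin."""
    bins_per_cluster: dict[str, set[str]] = defaultdict(set)
    for ab_id, cluster in cluster_of.items():
        if ab_id in bin_of:  # only antibodies that were actually binned by SPR
            bins_per_cluster[cluster].add(bin_of[ab_id])
    return {c: bins for c, bins in bins_per_cluster.items() if len(bins) > 1}


def purity(cluster_of: dict[str, str], bin_of: dict[str, str]) -> float:
    """Share of binned antibodies that belong to their cluster's majority
    epitope bin (1.0 means every cluster maps to a single bin)."""
    members: dict[str, list[str]] = defaultdict(list)
    for ab_id, cluster in cluster_of.items():
        if ab_id in bin_of:
            members[cluster].append(bin_of[ab_id])
    total = sum(len(v) for v in members.values())
    if total == 0:
        return 0.0
    majority = sum(Counter(v).most_common(1)[0][1] for v in members.values())
    return majority / total


if __name__ == "__main__":
    clusters = {"ab1": "C1", "ab2": "C1", "ab3": "C2", "ab4": "C3", "ab5": "C3"}
    bins = {"ab1": "epitopeA", "ab2": "epitopeA", "ab3": "epitopeB",
            "ab4": "epitopeC", "ab5": "epitopeC"}
    print("mixed clusters:", mixed_clusters(clusters, bins))
    print("purity:", purity(clusters, bins))
```

The check deliberately allows two clusters to share one epitope bin; only a cluster that splits across bins would indicate that the clustering merged functionally distinct antibodies.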


The future of antibody discovery with cloud computing is promising, particularly because it enables artificial intelligence (AI) to be deployed efficiently and at large scale in this field. AI comprises a wide variety of subfields, of which machine learning, particularly deep learning (DL) and natural language processing (NLP), shows great promise for antibody discovery.

Many recent advances in DL, particularly the increasing use of transformers, are enhancing NLP capabilities in antibody discovery. Active areas of AI in antibody discovery include: 1) de novo antibody design, 2) AI-enhanced library formats, 3) improving the probability of success of lead candidates from in vitro discovery campaigns, 4) paratope and epitope prediction, 5) structural homology modeling and docking, and 6) identifying and ranking developability issues in antibodies.5-10

While many of these applications continue to deliver real improvements in antibody discovery, it is unclear whether de novo design, which holds the greatest potential to disrupt the field, will live up to the hype. For instance, a recent de novo design study used generative AI to produce a wide array of leads to a known binder but still required constructing a library and then screening leads.11 While using AI for library construction is novel here, it raises the question of whether this improves on existing capabilities or delivers true cost savings over the latest discovery technologies.12

While we hold high hopes for de novo antibody lead generation, it will be important for these technologies to show clear improvements in time, cost, diversity, and developability over existing state-of-the-art display technologies.12 One limitation of de novo efforts may be the datasets available to train AI models, which may fall short of comprehensively capturing the latent patterns within protein-protein interaction networks, the biophysics driving these interactions, or the sequence and structural properties implicated in developability. This is not necessarily a limitation of AI itself, but rather of the high-throughput experimental technologies, cross-institutional data silos, and/or in silico modeling (homology models and docking) that may be required for these AI models to fulfill their promise, particularly in de novo design.

Overcoming this limitation will require not only large amounts of training data but also appropriate processing and storage capacity. Cloud computing offers immediate solutions to these hurdles by providing ready access to a vast computing network, which lets researchers scale up to process these large datasets at the push of a button. Furthermore, cloud infrastructure allows a large array of AI models to be tested and their hyperparameters fine-tuned in a massively parallel fashion, enabling more automated selection of the models and hyperparameters best suited to a given task. Streamlining these processes is a major advantage of cloud computing, ultimately facilitating better lead-prioritization decisions and reducing costs.
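
Massively parallel tuning amounts to evaluating many independent model configurations at once and keeping the best one. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` and `itertools.product` to run a small grid search concurrently on one machine; on a cloud platform each trial would more typically run on its own worker or node. The `train_and_score` objective is a hypothetical placeholder for training and validating a real model, and the grid values are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params: dict) -> float:
    """Hypothetical stand-in for training a model and returning its
    validation loss (lower is better). Replace with a real training job."""
    lr, depth = params["learning_rate"], params["depth"]
    return (lr - 0.01) ** 2 * 1e4 + (depth - 4) ** 2


def grid_search(grid: dict[str, list], max_workers: int = 8) -> tuple[dict, float]:
    """Evaluate every combination in the grid concurrently and return the
    best parameter set together with its loss."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(train_and_score, candidates))  # input order preserved
    best_loss, best_params = min(zip(losses, candidates), key=lambda t: t[0])
    return best_params, best_loss


if __name__ == "__main__":
    search_space = {"learning_rate": [0.001, 0.01, 0.1], "depth": [2, 4, 8]}
    params, loss = grid_search(search_space)
    print("best:", params, "loss:", loss)
```

Threads keep the sketch self-contained; for CPU-bound training in Python, `ProcessPoolExecutor` or separate cloud workers would be the more realistic choice, since each trial is independent and parallelizes trivially.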


Cloud computing has transformed antibody discovery, making the process faster, more efficient, and more cost-effective. Cloud-based tools and algorithms have enabled the rapid screening of large antibody libraries, leading to the identification of high-affinity, highly specific antibodies. Antibody discovery is a complex process, and cloud-based platforms provide a centralized location for researchers to store, share, and analyze data, facilitating collaboration and the development of new antibody discovery methods.


  1. Mullard A. FDA approves 100th monoclonal antibody product. Nat Rev Drug Discov. 2021;20(7):491-495. doi: 10.1038/d41573-021-00079-7.
  2. Monoclonal antibody therapy market size, share, & COVID-19 impact analysis, 2021-2028. Fortune Business Insights. August 2021. Available at:
  3. Antibody Therapy Database; 2023 UmabsDB. Available at:
  4. Tucker T, Marra M, Friedman JM. Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet. 2009;85(2):142-154. doi: 10.1016/j.ajhg.2009.06.022.
  5. Hie BL, et al. Efficient evolution of human antibodies from general protein language models. Nat Biotechnol. 2023.
  6. Shuai RW, Ruffolo JA, Gray JJ. Generative language modeling for antibody design. bioRxiv. 2022.
  7. Saksena SD, et al. Computational counterselection identifies nonspecific therapeutic biologic candidates.
  8. Leem J, et al. Deciphering the language of antibodies using self-supervised learning. Patterns. 2022.
  9. Abanades B, et al. ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. bioRxiv. 2022. doi: 10.1101/2022.11.04.514231.
  10. Harmalkar A, et al. Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features. mAbs. 2023. doi: 10.1080/19420862.2022.2163584.
  11. Shanehsazzadeh A, et al. Unlocking de novo antibody design with generative artificial intelligence. bioRxiv. 2023.
  12. Teixeira A, et al. Drug-like antibodies with high affinity, diversity and developability directly from next generation antibody libraries. mAbs. 2021. doi: 10.1080/19420862.2021.1980942.

Dr. Suhani Nagpal is an Application Scientist at OpenEye, Cadence Molecular Sciences in Boston, MA. Leveraging her technical expertise, she provides scientific support for OpenEye’s applications and cloud-based software solutions in both antibody and small molecule drug discovery areas and works on services projects with customers. Prior to joining OpenEye, she earned her PhD in Bioengineering with a focus in computational biophysics at the University of California-Merced, where she explored the conformational rheostat mechanism in protein folding and binding and engineered a protein-based conformational transducer for biosensing.

Dr. Laura Spector is a Bioinformatics Scientist at Specifica, a Q2 Solutions Company in Santa Fe, NM, where she interfaces between the Discovery and Bioinformatics teams to conduct next-generation sequencing and data analysis for antibody discovery campaigns. In developing new AbXtract workflows for OpenEye’s Orion Antibody Discovery Suite and validating them using in-house data, she contributes to tools that accelerate the identification of clonotypes of interest. Prior to joining Specifica, she earned her PhD in Genetics at Stanford University, where she studied genomic factors influencing a virus-mediated gene therapy platform.

Dr. M. Frank Erasmus is Head of Bioinformatics at Specifica, a Q2 Solutions Company in Santa Fe, NM. He has more than 15 years of experience in antibody therapeutics, with several patents and publications contributing to the advancement of the field.