Harnessing Big Data: Databases in Antibody Engineering

Technological advances in recent years, such as next-generation sequencing, have made it possible to build large antibody repertoire databases. These databases are an invaluable resource for researchers worldwide, offering insights into antibody structure, function, and design. This article gives an overview, with examples, of sequence, structure, therapeutic, and experimental databases, and of their potential to accelerate the development of therapeutic antibodies through antibody engineering and discovery.

Sequence Database

Antibody sequences underpin the antibody development and engineering process: they inform antigen binding affinity, target specificity, biological efficacy from epitope analysis, and developability properties. Database information on antibody sequences and their properties is therefore highly informative and can provide training data for artificial intelligence (AI) deep learning models.1
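To make the idea of sequences as training data concrete, here is a minimal sketch of one common way a sequence can be turned into numeric model input: one-hot encoding over the 20 standard amino acids. It is an illustration only, not the method of any model or database mentioned in this article; the function name `one_hot_encode` and the example fragment are hypothetical.

```python
from typing import List

# The 20 standard amino acids, ordered alphabetically by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return one row per residue, with a single 1 at that residue's index.

    Letters outside the 20 standard residues (for example X) become an
    all-zero row. This is an assumption of the sketch, not a standard.
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


# Hypothetical fragment for illustration; not taken from any database.
encoded = one_hot_encode("EVQLVX")
print(len(encoded), len(encoded[0]))  # 6 20
```

Real models typically use richer representations, such as learned embeddings, but the principle of mapping residues to numbers is the same.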

The Observed Antibody Space (OAS) is a sequence database that collects immune repertoires for large-scale analysis. It contains approximately 1.5 billion sequences, both paired variable fragments and unpaired sequences, from over 80 different studies, annotated with predicted sequence errors. These sequences cover diverse immune states, organisms, and individuals.2 Various AI models have used this database to develop humanized antibody sequences.3

The international ImMunoGeneTics information system (IMGT) provides databases of germline antibody sequences and is well known for integrating sequence, genome, and structural data, particularly gene assignments for recombined antibodies.4,5

Structure Database 

Antibody structures are important in antibody design, as they determine how an antibody interacts with antigens and what its binding properties are. Structure databases give researchers the resources to improve binding affinity and to predict epitopes and paratopes.

The Protein Data Bank (PDB) is a database for 3D structures of large biological molecules, including proteins and nucleic acids. It has been used to build up other datasets and integration systems, such as the Antibody Structure Database (AbDb), Structural Antibody Database (SAbDab), and abYsis.1,6
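As a rough illustration of how structure entries can be processed, the sketch below counts residues per chain in PDB-format text. It relies on the fixed-column layout of ATOM records in the legacy PDB format (chain identifier in column 22, residue number in columns 23-26, insertion code in column 27). The function name and the tiny example entry are hypothetical, and the chain labels H and L are an assumption of the example, since chain naming varies between entries.

```python
def residues_per_chain(pdb_text):
    """Count distinct residues per chain from ATOM records in PDB-format text."""
    seen = {}
    for line in pdb_text.splitlines():
        # Only standard ATOM records; HETATM (waters, ligands) is skipped.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]        # column 22: chain identifier
        residue_key = line[22:27]  # columns 23-27: residue number + insertion code
        seen.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(keys) for chain, keys in seen.items()}


# Made-up coordinates for illustration; not a real PDB entry.
EXAMPLE_PDB = """\
ATOM      1  N   GLU H   1      10.000  20.000  30.000  1.00 20.00           N
ATOM      2  CA  GLU H   1      11.000  20.000  30.000  1.00 20.00           C
ATOM      3  N   VAL H   2      12.000  20.000  30.000  1.00 20.00           N
ATOM      4  N   ASP L   1      40.000  20.000  30.000  1.00 20.00           N
"""

print(residues_per_chain(EXAMPLE_PDB))
```

Keeping the insertion code in the residue key matters for antibodies, because common numbering schemes label inserted loop residues with letters (for example 100 and 100A), and these would otherwise be counted as one residue.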

Therapeutic Database 

Databases curating therapeutic antibody information are useful for researchers who are developing therapeutics. TABS is a database offering antibody, antigen, and company data linked to a variety of associated information on clinical trials, patents, papers, news, and regulatory agencies.7

Similarly, the Therapeutic Structural Antibody Database (Thera-SAbDab) describes antibody- and nanobody-derived therapeutics that are recognized by the World Health Organization and have known sequences, including monoclonal antibodies and bispecifics. It also covers structural data from the PDB, along with metadata on clinical trials, target antigen specificity, and the companies involved in development.8

Experimental Database 

These sequence and structure databases can be further enriched with antibody-specific experimental data. The Immune Epitope Database (IEDB) contains manually curated antibody and T cell epitopes studied in humans and other species, with links to epitope-specific antibody sequences.9 Furthermore, characterizing antibody-epitope interactions requires binding affinity information, which can be found in the SAbDab and PDBbind databases.4,10

At Biointron, we are dedicated to supporting your antibody discovery, optimization, and production needs. Our team of experts can provide customized solutions for your specific research requirements. Contact us to learn more about our services and how we can help accelerate your research and drug development projects.

References:

  1. Kim, J., McFee, M., Fang, Q., Abdin, O., & Kim, P. M. (2023). Computational and artificial intelligence-based methods for antibody development. Trends in Pharmacological Sciences, 44(3), 175–189. https://doi.org/10.1016/J.TIPS.2022.12.005
  2. Olsen, T. H., Boyles, F., & Deane, C. M. (2022). Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Science, 31(1), 141–146. https://doi.org/10.1002/PRO.4205
  3. Marks, C., Hummer, A. M., Chin, M., & Deane, C. M. (2021). Humanization of antibodies using a machine learning approach on large-scale repertoire data. Bioinformatics, 37(22), 4041–4047. https://doi.org/10.1093/bioinformatics/btab434
  4. Norman, R. A., Ambrosetti, F., Bonvin, A. M., Colwell, L. J., Kelm, S., Kumar, S., & Krawczyk, K. (2020). Computational approaches to therapeutic antibody design: Established methods and emerging trends. Briefings in Bioinformatics, 21(5), 1549–1567. https://doi.org/10.1093/bib/bbz095
  5. IMGT®, the international ImMunoGeneTics information system®. (2023). IMGT. https://www.imgt.org/
  6. RCSB PDB: Homepage. (2023). RCSB. https://www.rcsb.org/
  7. TABS Therapeutic Antibody Database. (2023). Tabs. https://tabs.craic.com/users/sign_in
  8. SAbDab: The Structural Antibody Database. (2023). Oxford Protein Informatics Group. https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab
  9. IEDB.org: Free epitope database and prediction resource. (2023). IEDB. https://iedb.org/
  10. Welcome to PDBbind-CN database. (2020). PDBbind. http://www.pdbbind.org.cn/