Prediction of Biophysical Properties of Therapeutic Antibodies from Antibody Sequences

Classical machine learning was applied to predict antibody properties from sequences.
Project Duration: 2019-2020


Highlights

  • This was my undergraduate thesis.
  • We used several classical encodings, such as n-grams, n-gapped dipeptides, and PSF, to extract features from antibody variable-region sequences.
  • Feature selection techniques such as SVM-RFE were applied, and the selected features were used to train several machine learning models.
  • Our models predicted three biophysical assay readouts (AC-SINS, HIC retention time, and PSR) with high accuracy from sequence alone.
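
To make the sequence encodings concrete, here is a minimal sketch of n-gram and n-gapped dipeptide features in plain Python. It is an illustration, not the thesis code: the function names, the 20-letter alphabet constant, the normalization by window count, and the toy sequence are all assumptions made for this example, and PSF is left out.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are encoded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(seq, n=2):
    """Relative frequency of every contiguous n-mer built from AMINO_ACIDS."""
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short sequences
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    # Windows containing non-standard letters count toward `total`
    # but get no feature of their own.
    return {key: counts[key] / total for key in keys}


def gapped_dipeptide_features(seq, gap=1):
    """Relative frequency of residue pairs with `gap` positions between them."""
    pairs = [seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1)]
    counts = Counter(pairs)
    total = max(len(pairs), 1)
    # Feature names look like "A-C" for gap=1 and "A--C" for gap=2.
    return {
        f"{a}{'-' * gap}{b}": counts[a + b] / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


# Hypothetical toy fragment, used only to show the shape of the output.
toy = "EVQLVESGGGLVQ"
features = {
    **ngram_features(toy, n=1),
    **ngram_features(toy, n=2),
    **gapped_dipeptide_features(toy, gap=1),
}
```

Each sequence becomes a fixed-length vector (here 20 + 400 + 400 entries) regardless of its length, which is what lets sequences of different lengths share one feature matrix.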

GitHub Repository
Download Full Text

Abstract

Antibodies have become one of the most prominent classes of drugs. Worldwide, at least 570 therapeutic monoclonal antibodies have been studied in clinical trials by commercial companies, and 79 have been approved by the United States Food and Drug Administration (US FDA). However, many candidate drugs fail at various stages of development. Several biophysical assays have been proposed for screening candidates early in development, but these wet-lab experiments are costly and time-consuming. As an alternative, several computational and theoretical tools have been developed in recent years. In this study, we propose a machine-learning-based computational method that predicts three important biophysical assay readouts (AC-SINS, HIC retention time, and PSR) from antibody sequences alone. We use only sequence-order information as features and apply several machine learning techniques to reduce the feature space and predict each target assay. Our models predict these assays with a high degree of accuracy. The low computational cost and high accuracy make the method a practical way to reduce the cost of monoclonal antibody development.
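
The feature-reduction and prediction steps described in the abstract could be sketched roughly as follows with scikit-learn. This is a hedged illustration rather than the pipeline used in the thesis: the data are synthetic random numbers, the target is assumed to be continuous (for a classification setup, `SVC` would replace `SVR`), and the feature counts, step size, and fold count are arbitrary choices.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Synthetic stand-in for a (sequences x features) matrix and one assay readout.
# Only columns 2 and 7 carry signal; the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=60)

# SVM-RFE: repeatedly fit a linear SVM and drop the feature with the
# smallest squared weight until `n_features_to_select` remain.
pipeline = Pipeline([
    ("rfe", RFE(estimator=SVR(kernel="linear"), n_features_to_select=5, step=1)),
    ("svr", SVR(kernel="linear")),
])

# Keeping selection inside the pipeline means each cross-validation fold
# selects features from its own training split only.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")

# Refit on all data to inspect which columns survive elimination.
pipeline.fit(X, y)
mask = pipeline.named_steps["rfe"].support_
selected_columns = np.flatnonzero(mask)
```

Wrapping the selector and the model in one pipeline is a design choice worth noting: running feature selection on the full dataset before cross-validation would let information from the held-out folds leak into the selected feature set.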