Macromoltek is excited to be working with NVIDIA to exhibit at this year’s HLTH VRTL conference. In light of this collaboration, we are publishing a special blog post that expands on previous posts and discusses the interconnected roles of deep learning and GPU computing, both generally in bioscience and specifically in antibody development. At Macromoltek, we bring antibody drug development entirely into the realm of computation. In this post we want to discuss how computers, and especially GPUs, allow us to do that. So of course, the first question we should answer is: “What is an antibody, anyway?”
What is an Antibody?
Antibodies are an essential component of the adaptive immune system. As we discussed in a previous blog post, “What is an Antibody”, they find, target, and remember biological threats. Antibody therapeutics are rapidly overtaking small-molecule drugs as promising candidates for a variety of hard-to-treat diseases. The traditional method for antibody discovery is long and costly. In silico (computational) methods can improve this process, making it cheaper and faster to get medicines to market. We’ll dive into some of these methods here, but for a more in-depth look, specifically at deep learning applications, check out our manuscript.
Recent advances in computer vision, which have allowed computers to gain a high-level “understanding” of images and videos, have also led to novel ways of representing protein structures. Adopting these representations for drug discovery allows scientists to exploit the powerful methods used to solve computer vision problems such as object recognition and classification. Proteins are also commonly represented by their amino acid sequence. While less expressive than the structure itself, sequence data has a longer history and is far more abundant than the rich but relatively rare structural data. Combined with advances in natural language processing (NLP), such sequence data can yield gains similar to those seen at the intersection of computer vision and biologics.
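Before an NLP-style model can read a protein sequence, each residue has to be turned into numbers. The sketch below shows one-hot encoding, one common and simple choice; it is an illustration under our own assumptions, not Macromoltek's pipeline. The alphabet ordering in `AMINO_ACIDS` and the helper name `one_hot_encode` are hypothetical choices made for this example.

```python
# Minimal sketch: one-hot encoding of a protein sequence for sequence models.
# Assumption: only the 20 standard amino acids; the alphabet order is arbitrary.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, alphabetical
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Return a len(sequence) x 20 matrix with exactly one 1 per row."""
    matrix = []
    for position, residue in enumerate(sequence.upper()):
        if residue not in AA_INDEX:
            raise ValueError(f"unknown residue {residue!r} at position {position}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1  # mark this residue's column
        matrix.append(row)
    return matrix


# Usage: encode a short example fragment.
encoded = one_hot_encode("EVQL")
print([row.index(1) for row in encoded])  # → [3, 17, 13, 9]
```

Many models replace one-hot vectors with learned embeddings, but the input-preparation step looks much the same.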
Antibody drugs act by binding to specific protein targets in the patient. Sometimes the antibody simply signals to the patient’s immune system that a diseased cell needs to be destroyed; in other cases, it blocks or activates a biological pathway related to the disease. The target protein is referred to as the antigen, and the specific region of the antigen that the antibody therapeutic binds is called the epitope. Identifying the epitope for a candidate antibody drug is an important part of the computational drug design process.
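When training an epitope predictor, the epitope is often given an operational definition from a solved antibody-antigen complex: antigen residues with any atom close to the antibody count as epitope. The sketch below assumes that definition and a 4.0 Å cutoff, a common but not universal choice (the cutoff varies between studies). The function `epitope_residues` and the toy coordinates are hypothetical.

```python
# Minimal sketch: label antigen residues as epitope by distance to the antibody.
# Assumption: a residue is epitope if any of its atoms lies within `cutoff`
# angstroms of any antibody atom (4.0 Å here is an illustrative default).

import math

Coord = tuple[float, float, float]


def epitope_residues(
    antigen: dict[str, list[Coord]],
    antibody_atoms: list[Coord],
    cutoff: float = 4.0,
) -> set[str]:
    """Return the IDs of antigen residues within `cutoff` Å of the antibody."""
    epitope = set()
    for residue_id, atoms in antigen.items():
        # Brute-force all atom pairs; fine for a sketch, slow for large complexes.
        if any(math.dist(a, b) <= cutoff for a in atoms for b in antibody_atoms):
            epitope.add(residue_id)
    return epitope


# Usage with toy coordinates (made up for illustration, not real structure data).
antigen = {
    "A1": [(0.0, 0.0, 0.0)],
    "A2": [(10.0, 0.0, 0.0)],
    "A3": [(0.0, 5.0, 0.0), (3.0, 0.0, 0.0)],
}
antibody = [(2.0, 0.0, 0.0)]
print(sorted(epitope_residues(antigen, antibody)))  # → ['A1', 'A3']
```

Labels produced this way give a deep learning model per-residue targets to learn from.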
Traditional methods of epitope identification have shown limited accuracy and performance, and have seen only marginal improvements over time. They rely on expert-defined descriptors of the protein surface, but the exact forces and molecular characteristics that govern antibody-antigen interactions are beyond our current understanding. Fortunately, the growing number of protein structures deposited in databases such as the Protein Data Bank opens this problem up to data-intensive and potentially revolutionary methods such as deep learning, which can learn accurate descriptors directly from data to classify regions of the target protein surface. By adopting data representations from computer vision, such as voxel grids, graphs, and manifolds, and combining them with large protein structure datasets, new deep learning approaches have shown substantial improvement in identifying antibody epitopes.
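A voxel grid is the 3D counterpart of an image's pixel grid, which is what lets 3D convolutional networks borrowed from computer vision operate on protein structures. The sketch below counts atoms per cubic voxel in a box around a chosen center. The 16-voxel box, the 1 Å resolution, and the single "atom count" channel are simplifying assumptions (practical pipelines typically use separate channels per atom type or chemical property), and `voxelize` is a hypothetical helper.

```python
# Minimal sketch: voxelize atom coordinates into a 3D occupancy grid.
# Assumptions: one channel (atom count), cubic box centered on `center`,
# voxels `resolution` Å on a side; atoms outside the box are ignored.

import math

Coord = tuple[float, float, float]


def voxelize(
    atoms: list[Coord],
    center: Coord,
    grid_size: int = 16,
    resolution: float = 1.0,
) -> list[list[list[int]]]:
    """Count atoms per voxel in a grid_size**3 cube around `center`."""
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    half_width = grid_size * resolution / 2.0
    for atom in atoms:
        # Shift so the cube's corner sits at the origin, then bin by resolution.
        idx = [
            math.floor((atom[d] - center[d] + half_width) / resolution)
            for d in range(3)
        ]
        if all(0 <= i < grid_size for i in idx):
            x, y, z = idx
            grid[x][y][z] += 1
    return grid


# Usage: a toy set of atoms around the origin (illustrative coordinates only).
atoms = [(0.0, 0.0, 0.0), (0.5, 0.2, 0.9), (-8.0, -8.0, -8.0), (100.0, 0.0, 0.0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
print(grid[8][8][8], grid[0][0][0])  # → 2 1
```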
The same principles can be applied to determining the binding strength between two proteins, a critical step in the antibody design process. By representing the structures as described above (as a manifold, a graph, or a voxel grid), we can use the same methods to extract useful information from in silico antibody structures. These methods allow us to estimate the change in binding energy caused by a mutation, and therefore to identify mutations that strengthen binding. Traditionally, this optimization (known as affinity maturation) is performed in the lab, where proteins and their mutants are subjected to repeated rounds of binding assays. This in-lab procedure is both materially expensive and time-consuming. Computational methods drastically reduce both its duration and its material cost. Furthermore, they allow us to generate a much larger pool of likely antibody binders and then narrow it down.
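The quantity being estimated is usually written as ΔΔG, the difference between the binding free energy of a mutant complex and that of the wild type. The sketch below assumes the convention ΔΔG = ΔG_bind(mutant) − ΔG_bind(wild type), under which a negative value means tighter binding; some tools use the opposite sign, so it is worth checking before comparing numbers. The ΔG inputs would come from a trained model or another scoring method, neither of which is implemented here, and the function names and mutation labels are hypothetical.

```python
# Minimal sketch: rank candidate mutations by ΔΔG (kcal/mol).
# Assumed convention: ddG = dG_bind(mutant) - dG_bind(wild type), so ddG < 0
# means the mutant binds more tightly. The dG inputs are assumed to be
# predictions from some model; no predictor is implemented here.


def ddg(dg_mutant: float, dg_wild_type: float) -> float:
    """Change in binding free energy caused by a mutation."""
    return dg_mutant - dg_wild_type


def rank_mutations(
    dg_wild_type: float, dg_mutants: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (mutation, ddG) pairs, most affinity-improving first."""
    scored = [(name, ddg(dg, dg_wild_type)) for name, dg in dg_mutants.items()]
    return sorted(scored, key=lambda item: item[1])


def shortlist(
    dg_wild_type: float, dg_mutants: dict[str, float], max_ddg: float = 0.0
) -> list[str]:
    """Keep only mutations predicted to improve binding (ddG below max_ddg)."""
    ranked = rank_mutations(dg_wild_type, dg_mutants)
    return [name for name, value in ranked if value < max_ddg]


# Usage with hypothetical predicted values and placeholder mutation names.
predicted = {"mut_1": -11.0, "mut_2": -9.0, "mut_3": -10.5}
print(shortlist(-10.0, predicted))  # → ['mut_1', 'mut_3']
```

Tightening `max_ddg` is one simple way to narrow a large pool of candidates down to the most promising few.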
In silico antibody design can shorten the drug discovery process to days instead of weeks. Adding machine learning algorithms allows the subjective hand-selection process used by biochemists to be replaced with objective and reproducible conclusions. However, some machine learning algorithms, particularly deep learning algorithms, are computationally intensive: they require many calculations and processing cycles, and therefore must be run on computers outfitted with numerous processors. Since most of the operations performed to train and evaluate large neural networks can be broken down into a large number of matrix multiplications, graphics processing units (GPUs), which are optimized for this kind of highly parallel arithmetic, are used extensively throughout the deep learning community. At Macromoltek, we use NVIDIA’s CUDA technology to train the machine learning models that provide insight into the binding behavior of proteins. If you would like to know more, check out NVIDIA’s blog, and stop by our booth at HLTH VRTL to say hello!
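To make the matrix-multiplication point concrete, the sketch below writes one fully connected neural network layer, applied to a batch of inputs, as a single matrix product followed by a bias and a ReLU. It is plain Python on the CPU and only illustrates the structure of the computation; the layer shape, the ReLU activation, and the helper names are assumptions. Each output entry is an independent dot product, which is the kind of work a GPU can spread across many threads; in practice, deep learning frameworks typically hand this operation to optimized CUDA libraries such as cuBLAS rather than running loops like these.

```python
# Minimal sketch: a fully connected layer as one matrix multiplication.
# Pure Python, CPU-only; shapes: x is (batch, n_in), weights is (n_in, n_out).


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Naive matrix product a @ b."""
    n_rows, inner = len(a), len(a[0])
    if len(b) != inner:
        raise ValueError("inner dimensions must match")
    n_cols = len(b[0])
    # Every (i, j) entry is an independent dot product: the parallelism GPUs exploit.
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def dense_relu(
    x: list[list[float]], weights: list[list[float]], bias: list[float]
) -> list[list[float]]:
    """Forward pass relu(x @ weights + bias) for a batch of inputs."""
    z = matmul(x, weights)
    return [[max(0.0, v + bias[j]) for j, v in enumerate(row)] for row in z]


# Usage: a batch of two 2-feature inputs through a 2 -> 2 layer.
x = [[1.0, 2.0], [0.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 2.0]]
bias = [0.0, -10.0]
print(dense_relu(x, weights, bias))  # → [[2.0, 0.0], [0.5, 0.0]]
```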