How Supercomputers Are Changing Biology

Macromoltek, Inc.
5 min read · Aug 26, 2021

In a previous blog post, we talked about Molecular Dynamics Simulations (MDS) and their role in the biotechnology industry, such as in structure-based drug design. MDS are the closest thing we have to replicating the flexibility and conformational changes that proteins (as well as other molecules) undergo while performing their functions, such as interacting with other molecules. Capturing all of these dynamics, however, requires very powerful tools that can perform massive numbers of calculations in a feasible amount of time.

Some say that two heads are better than one. A supercomputer, simply put, is a cluster of many computers working together as a single unit, each handling its own piece of an overall project. Each computer is referred to as a node, and these nodes are made up of components similar to those in your desktop or laptop. Since a single processor core can only execute instructions one at a time, a supercomputer spreads work across its nodes so that its large number of processors can run many tasks simultaneously. Supercomputers are used everywhere: in physics, meteorology, biology, and practically every other field of science. One example is the National Weather Service’s supercomputer, which takes in observational data from many data-collecting tools (e.g. satellites and weather balloons) to help predict the weather and make accurate forecasts [1].
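To make the idea concrete, here’s a minimal sketch in Python (the four-worker pool and the toy workload are our own illustration, not how any real cluster scheduler works) of splitting one job into chunks that run simultaneously, much like a cluster spreads work across nodes:

```python
from multiprocessing import Pool

def simulate_chunk(chunk_id):
    # Stand-in for one node's share of a larger job,
    # e.g. one slice of a simulation or dataset.
    return sum(i * i for i in range(chunk_id * 1_000_000,
                                    (chunk_id + 1) * 1_000_000))

if __name__ == "__main__":
    # Each worker process plays the role of a "node":
    # the eight chunks run in parallel instead of one after another.
    with Pool(processes=4) as pool:
        results = pool.map(simulate_chunk, range(8))
    print(sum(results))
```

A real supercomputer does the same thing at vastly larger scale, coordinating thousands of nodes over a high-speed network rather than a handful of processes on one machine.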


A supercomputer’s “power” is measured in floating-point operations per second, or FLOPS. For context, one of the first supercomputers (the CDC 6600, in 1964) could perform one million floating-point operations per second, putting it in the megaFLOP range [2]. By the 1990s, we reached gigaFLOP performance (1,000x better). Nowadays, the fastest supercomputers achieve petaFLOP performance which, compared to megaFLOPS, is a billionfold improvement in about 60 years. Fugaku, located in Japan, is currently considered the fastest supercomputer in the world, with a performance of 442 petaFLOPS. The runner-up, Summit (located at Oak Ridge National Laboratory), achieves 148 petaFLOPS, about a third of Fugaku’s performance [3]. Improvements occur every year, in both hardware and software, and many scientists take advantage of these incredible tools.
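As a back-of-the-envelope illustration (a sketch, not a formal benchmark), you can estimate your own machine’s FLOPS by timing an operation whose floating-point count is known. A matrix multiply of two n×n matrices takes roughly 2n³ floating-point operations; a typical laptop running this lands in the gigaFLOP range, far behind machines like Fugaku:

```python
import time
import numpy as np

n = 1024
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b                      # roughly 2 * n**3 floating-point ops
elapsed = time.perf_counter() - start

flops = 2 * n**3 / elapsed
print(f"~{flops / 1e9:.1f} gigaFLOPS")  # Fugaku: ~442,000,000 gigaFLOPS
```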

As an example, consider the millions of base pairs found in DNA. For a specific and rare cancer, DNA sequences can be analyzed to identify the genetic mutation responsible for the ailment, but doing so requires lengthy sequence analyses. Cancer researchers at the Human Genome Center (HGC) at The University of Tokyo have worked on identifying genetic mutations in the rarest types of cancers, which typically requires analyzing around 1,700 tumor samples as well as healthy germline genomes, for a grand total of 4,000 samples. Fortunately, the HGC is equipped with SHIROKANE, a supercomputer that can store and access large amounts of data while efficiently performing many computations. Researchers working with SHIROKANE have already made strides in identifying mutations in poorly understood cancers, such as liver cancers [4]. Other cancer-related research has been done through the Pan-Cancer Project, in which many scientists and clinicians were tasked with creating a resource of primary cancer genomes. Much of the data analysis was done by the Barcelona Supercomputing Center (BSC), including the detection of mutations, the generation of related computing resources, and the identification of non-functional gene copies [5].
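At its heart, calling a somatic mutation means comparing tumor DNA against the patient’s healthy germline sequence. The sketch below is a deliberately tiny illustration of that comparison; real pipelines align millions of sequencing reads and model sequencing error statistically, which is exactly why machines like SHIROKANE are needed:

```python
def somatic_mutations(germline: str, tumor: str):
    """Return (position, germline_base, tumor_base) for each mismatch."""
    return [(i, g, t)
            for i, (g, t) in enumerate(zip(germline, tumor))
            if g != t]

# Toy sequences; real analyses span millions of base pairs per sample.
print(somatic_mutations("ACGTACGT", "ACGTTCGT"))  # [(4, 'A', 'T')]
```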

Another example can be found at The University of Basel in Switzerland, where researchers searched for indications of “memory molecules.” The goal of this research is to get closer to identifying the molecular basis of memory capacity, in order to create medicines that treat memory disorders such as dementia. None of this preliminary work would have been possible without the Piz Daint supercomputer, which allowed the researchers to run enormous computations, including one quadrillion statistical tests [6].

There’s an almost universal tradeoff between speed and generality that even supercomputers must face. While general-purpose supercomputers are highly flexible in what they can do, that flexibility comes with an efficiency cost. Supercomputers like D. E. Shaw’s Anton occupies the other side of this split. It is highly specialized for running molecular dynamics simulations, sacrificing general applicability for a significant gain in speed [7].


What Anton has that other supercomputers don’t is specialized hardware, known as application-specific integrated circuits (ASICs), built specifically for its scope of use, or application. Each ASIC is split into two subsystems. The first is designed to quickly calculate the van der Waals and electrostatic forces that dominate its simulations. The second, known as the “flexible” subsystem, is responsible for computing bond energies and the fast Fourier transforms used for long-range interactions [7].
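The nonbonded forces that Anton’s first subsystem races through have a compact mathematical core. Here’s a sketch of the van der Waals (Lennard-Jones) and electrostatic (Coulomb) pair energies in Python; the parameter values are arbitrary placeholders, and Anton evaluates these terms in dedicated silicon rather than in software like this:

```python
def lennard_jones(r, epsilon=0.238, sigma=3.4):
    """Van der Waals pair energy at distance r (toy parameters)."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

def coulomb(r, q1, q2, k=332.06):
    """Electrostatic pair energy between charges q1 and q2."""
    return k * q1 * q2 / r

# The total nonbonded energy sums terms like these over every particle
# pair, at every femtosecond-scale timestep -- the bulk of an MD run.
r = 4.0
print(lennard_jones(r) + coulomb(r, 0.4, -0.4))
```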

High-performance computing has recently been making strides in biology. In response to the Covid-19 pandemic, many computational biology research groups have turned their attention to using supercomputers for antiviral drug discovery, modeling the emergence of new variants, modeling the interaction between antibodies and the coronavirus’s spike protein, and much more. The Covid-19 HPC Consortium recently compiled many of the active projects in this area [8].


The BSC again worked on genome analysis, but this time on coronaviruses and their mutations. Their work, facilitated by the MareNostrum 4 supercomputer, helps in the search for drugs and immune therapies [9]. Scientists take the findings from this genomic research and run “docking” simulations between potential drugs and regions of interest on the SARS-CoV-2 proteins, such as the spike protein. These docking simulations take a potential drug and see how well it “fits” against the virus’s proteins [10]. Based on the results, scientists can rule out molecules that would make ineffective therapies. Similar drug-screening simulations have been run on The University of Texas’s Stampede 2, where quantum-mechanics-based rankings are assigned to molecules to estimate their binding affinity against the virus’s proteins [11]. As the 35th-fastest supercomputer in the world, Stampede 2 was capable of performing the daunting calculations needed to produce a shortlist of 47 molecules for further evaluation, saving researchers considerable time [3].
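Conceptually, a docking screen scores every candidate molecule against the target and keeps only the best “fits.” The sketch below shows that filter-and-rank step with made-up molecule names and scores; in a real screen, each score would come from simulating the ligand against the protein’s 3D structure:

```python
# Hypothetical pre-computed docking scores (lower = tighter binding);
# real pipelines derive these from simulated ligand-protein poses.
scores = {"mol_A": -9.2, "mol_B": -4.1, "mol_C": -7.8, "mol_D": -2.5}

def shortlist(scores, cutoff=-7.0):
    """Keep candidates at or below the cutoff, best binders first."""
    hits = [(m, s) for m, s in scores.items() if s <= cutoff]
    return sorted(hits, key=lambda pair: pair[1])

print(shortlist(scores))  # [('mol_A', -9.2), ('mol_C', -7.8)]
```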

More recently, in July of 2021, Cambridge-1 became the UK’s fastest supercomputer and #41 overall, with 9.6 petaFLOPS of performance. Built by NVIDIA specifically for life sciences and healthcare research, it will support work on understanding diseases such as dementia, designing new drugs, and, you guessed it, studying disease-causing mutations in the human genome [12]. The innovation and growth of supercomputers and their biological applications will likely continue, and we’re bound to encounter exciting new findings.

References

[1] https://www.weather.gov/about/supercomputers

[2] https://www.britannica.com/technology/CDC-6600

[3] https://www.top500.org/lists/top500/2021/06/

[4] https://media.nature.com/full/nature-cms/uploads/ckeditor/attachments/8423/01_UK_Uni_tokyo.pdf

[5] https://www.hpcwire.com/off-the-wire/bsc-contributes-to-pan-cancer-project/

[6] https://medicalxpress.com/news/2017-10-scientists-supercomputer-memory-molecules.html

[7] https://dl.acm.org/doi/abs/10.1145/1364782.1364802

[8] https://covid19-hpc-consortium.org/projects

[9] https://www.hpcwire.com/off-the-wire/bsc-uses-bioinformatics-ai-and-marenostrum-supercomputer-in-the-fight-against-covid-19/

[10] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7668744/

[11] https://www.tacc.utexas.edu/-/covid-gets-quantum-treatment-for-drug-discovery

[12] https://nvidianews.nvidia.com/news/nvidia-launches-uks-most-powerful-supercomputer-for-research-in-ai-and-healthcare


Macromoltek, Inc.

Welcome to the Macromoltek blog! We're an Austin-based biotech firm focused on using computers to further the discovery and design of antibodies.