What is AlphaFold anyway?

Macromoltek, Inc.
7 min read · Dec 24, 2020

A machine learning algorithm developed by the super-nerds at DeepMind, Google’s sister company, made headlines recently. New algorithms and machine learning methods are developed across the world daily; however, it is extremely rare for any of them to get mainstream media attention. So why is AlphaFold2 such a big deal? Just because it came out of the Google family? Well, no, but we could see why you would think that. The actual reason is that AlphaFold2 has made a leap of progress towards solving one of the greatest modern problems in structural biology.

So, what is AlphaFold2? Put briefly, AlphaFold2 is a machine learning algorithm that attempts to determine the 3D structure of a protein from its sequence. There is a little technical jargon here, so let’s unpack this a bit, starting with why proteins matter. Your body is composed of something like 20% protein (and the rest is mostly water). The elastin in your skin which allows it to stretch? That’s a protein. The receptors in your eyes which react to light? Also proteins. The receptors which bind to molecules like caffeine, dopamine, adrenaline, etc.? They’re all proteins.

The structure of a protein ultimately determines the role the protein will play in your body. This is because its structure determines what other proteins or small molecules it will interact with. Proteins generally interact in long, domino-like chain reactions that do something very important, such as the chain reaction that converts the light from your screen into synaptic firings in your brain, ultimately birthing ideas and concepts (and hopefully by the end of this, a little humor). Two proteins interact if parts of their surfaces complement one another in both structural and chemical composition. This is commonly referred to as a “lock and key” interaction. Certain proteins are made for each other and will only fit with a certain partner. When they meet and interact, their bond unlocks their true potential, like tiny molecular soulmates.

Considering their many functions, the composition of proteins is actually quite simple. Each protein is essentially a chain, and each link is one of 20 different amino acids. The chain starts out unwound and, through some biochemical gymnastics, is folded into a precise 3D structure. There is a virtually endless number of ways that a single protein can fold up. However, for each protein there is usually only one correct way. Your body actually has whole systems of special proteins that exist only to ensure other proteins fold properly. That’s because incorrect foldings can be really bad. When they happen to proteins vital for survival, the resulting malfunctions can even kill you. Folding determines protein structure, and protein structure determines protein function, so understanding a protein’s structure gives us insight into what that protein does. And that brings us to what AlphaFold2 was designed to do: take the amino-acid sequence of the protein and attempt to predict, out of endless possibilities, the 3D structure the protein chain will fold into.
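To put rough numbers on “virtually endless”, here is a back-of-the-envelope sketch in Python. The 20 amino acids come from the description above; the figure of three possible shapes per link and the sampling rate are illustrative assumptions in the spirit of Levinthal’s classic thought experiment, not measured values, and the function names are ours.

```python
AMINO_ACIDS = 20  # each link in the chain is one of 20 amino acids


def sequence_count(length: int) -> int:
    """Number of distinct chains of the given length."""
    return AMINO_ACIDS ** length


def conformation_count(length: int, states_per_link: int = 3) -> int:
    """Number of ways one chain could fold, ASSUMING each link can
    adopt `states_per_link` shapes independently (illustrative only)."""
    return states_per_link ** length


def years_to_enumerate(conformations: int, per_second: float = 1e13) -> float:
    """Years needed to try every fold one by one at an ASSUMED rate."""
    seconds_per_year = 60 * 60 * 24 * 365
    return conformations / per_second / seconds_per_year


if __name__ == "__main__":
    n = 100  # a fairly small protein
    folds = conformation_count(n)
    print(f"possible sequences of length {n}: {sequence_count(n):.2e}")
    print(f"possible folds of one sequence:  {folds:.2e}")
    print(f"years to try them all:           {years_to_enumerate(folds):.2e}")
```

Even with these generous assumptions, a 100-link chain has about 5 x 10^47 candidate folds, and trying them one at a time would take on the order of 10^27 years. Real proteins fold in fractions of a second, which is why brute-force search is hopeless and a learned predictor is so appealing.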

Now that you understand the problem AlphaFold2 is trying to solve, we can give you an idea of the algorithm’s sophistication, and why it’s an exciting advance. A solution to the protein folding problem has been sought for more than 50 years. In recent decades, the best of the best have submitted their algorithms to a protein folding competition called CASP. DeepMind first entered the competition in 2018 with the first version of AlphaFold. The algorithm showed substantial improvement over other methods. However, in the most recent competition, AlphaFold2’s performance put the algorithm in a league of its own.

It doesn’t take a scientist to see the huge margins in the charts showing AlphaFold2’s stand-out performance. The first chart shows what is essentially an accuracy metric for determining structure from sequence, for the best method in each year of the CASP competition since 2006. The second chart shows scores for each method submitted in the latest CASP competition. The bar on the far left is AlphaFold2, head and shoulders above the rest.

But AlphaFold2 doesn’t just look good compared to the competition; it looks good on its own merits. Typically, protein 3D structure is determined experimentally in a lab, in a time-consuming, imperfect process with a measurable resolution that tells us how certain we can be about the position of the atoms in the determined structure. (For more information, check out our previous blog post discussing X-ray crystallography.) Predictions from AlphaFold2, similarly, have some amount of error. However, unlike many competing structure-prediction methods, the error in these predictions is almost comparable to the relatively small error we expect to see in the lab!
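The post does not name the accuracy metric in those charts, but CASP’s headline score is GDT_TS: the percentage of residues whose predicted position lands within 1, 2, 4 and 8 ångströms of the experimental position, averaged over the four cutoffs. The following is a minimal sketch of that idea, not the official CASP implementation. It assumes one (x, y, z) point per residue (in practice, the alpha-carbon) and that the two structures have already been superimposed; the real score also searches over many superpositions to find the best one.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def gdt_ts(
    predicted: Sequence[Point],
    experimental: Sequence[Point],
    cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0),
) -> float:
    """Simplified GDT_TS on a 0-100 scale.

    ASSUMPTION: both structures are already superimposed, so distances
    can be compared residue by residue without any alignment step.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty structures of equal length")

    # Distance between each predicted residue and its experimental partner.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Made-up two-residue example: one residue is exact, one is 3 Å off.
    experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0)]
    print(gdt_ts(predicted, experimental))
```

A perfect prediction scores 100. In the made-up example, the misplaced residue fails the 1 Å and 2 Å cutoffs but passes the 4 Å and 8 Å ones, so the score drops to 75.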

Okay, so great! It appears that AlphaFold2 may have solved a long-standing problem. What exactly does that mean for us?

Well, proteins can have a variety of uses outside of the human body. However, here at Macromoltek, we are in drug design, so we can stick with that for an example. When we (or anyone else, for that matter) seek to design a drug or therapeutic, we do so with a particular target in mind. In essence, there is a particular protein-interaction pathway (the domino chain reaction of our previous analogy) that we want to affect. We have already covered how the interactions in these pathways are controlled by the structures of the proteins that take part in them. Now, here is the crazy part: in many cases, we don’t actually know the structures of the proteins in the pathway we are interested in. We may not even know the structure of the individual target protein we want to interact with! Even so, the pharmaceutical industry relentlessly pushes forward, employing all sorts of clever tactics to discover a protein or small molecule which will interact with the pathway in a desirable way. A lot of work goes into finding such a molecule, often with little knowledge of the 3D structure of either the molecule or the protein it interacts with. This can be like trying to dock a spaceship with the International Space Station, with only a book about the space station in a foreign language to guide you. Somewhere, in all that helpless confusion, you might find some useful information, but no matter what, you’re still flying blind. Despite this, hardworking researchers still (in rare cases and after billions of dollars) manage to succeed.

So how, despite this seemingly impossible task, do we arrive at new drugs and therapeutics? Most commonly, this is achieved through lots of trial and error. The traditional approach involves isolating the target protein in the lab, creating lots of copies of it, then mixing it with other molecules to see if anything sticks. Even when we do find something that sticks, there’s no guarantee that it sticks where we need it to in order to have the effect we are after. Also, especially in the case of small molecules like Aspirin and Tylenol, there’s a good chance the molecule also sticks to proteins we don’t want it to and causes undesirable side effects.

So, how exactly does knowing the structure help with this task? If we know the structure of the target protein we are after, we can much more easily design molecules to bind to that protein where we want, or at least use a computer to simulate the trial-and-error approach. Furthermore, the drug molecule (the one that binds to our target) could also be another protein. In order to design another protein, we would essentially be designing its sequence. Any change to the sequence could impact the structure, potentially rendering it completely unstable. With a tool like AlphaFold2, we can be much more confident that the protein we design folds in the way we expect before we send it off to the lab for expensive and time-consuming tests.
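To make “designing its sequence” a little more concrete, here is a small Python sketch of the candidate-generation half of that loop: listing every sequence that differs from a starting design by exactly one amino acid. The seed sequence is made up and the function name is ours. The structure-prediction step itself is deliberately left out, since this is only an illustration of how quickly the candidates pile up, not a description of how we or AlphaFold2 actually work.

```python
from typing import Iterator

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_variants(sequence: str) -> Iterator[str]:
    """Yield every sequence that differs from `sequence` at exactly one
    position (19 alternatives per position)."""
    for position, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                yield sequence[:position] + replacement + sequence[position + 1:]


if __name__ == "__main__":
    seed = "MKTAYIAKQR"  # made-up 10-residue example, not a real design
    variants = list(single_point_variants(seed))
    print(len(variants), "single-change candidates, e.g.", variants[:3])
```

A 10-residue seed already yields 190 single-change candidates, and allowing two or three changes at once multiplies that quickly. Each candidate would need its fold checked, which is where a fast, accurate structure predictor saves lab time.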

A tool like AlphaFold2 is invaluable in the world of drug design, which is why a solution to the protein folding problem has been sought for several decades. Aside from having a direct practical application, the ability to solve a protein’s structure from its sequence can be used as a tool to vastly expand our knowledge of the universe of proteins. Thanks to the genomic revolution and much hard work and dedication, scientists have managed to determine the amino acid sequences of hundreds of millions of proteins (yeah, there are a lot), but we actually have the 3D structures for only a small fraction of them. As mentioned above, we do have methods to experimentally determine these structures. The problem is that these procedures are laborious, error-prone and expensive. These unsolved protein sequences are like the pages of a powerful and informative book, of which we have so far been able to translate only a fraction. If algorithms like AlphaFold2 do indeed work as well as they seem to, we may be on the verge of translating much more of this metaphorical book of knowledge and improving our blueprint of the human biomolecular machine.

Now, with something like AlphaFold2 that is both trendy and considered “revolutionary”, I always like to keep in mind this interesting little idea called the Gartner Hype Cycle. If you’ve never heard of it, this little graphic should be informative. Essentially, when a “revolutionary” technology becomes visible, our expectations for its uses are inflated until we get hit with the real world. However, after working through our disillusionment, we reach enlightenment, form new visions for the technology and make great things happen.

As you may have guessed, we are probably still somewhere near the peak of inflated expectations. However, the slope of enlightenment is going to be an exciting ride. So, keep your eyes and ears open when you hear something new about this space, but still stay safe: the therapies to repair the insults to your body from a lifetime of bad decisions are still a long way away.

Written by Macromoltek, Inc.

Welcome to the Macromoltek blog! We're an Austin-based biotech firm focused on using computers to further the discovery and design of antibodies.
