Protein Structure Prediction
The computational challenge of determining the three-dimensional shape a protein will fold into from its amino acid sequence alone - a problem on which AI systems now reach near-experimental accuracy for many proteins.
Added May 21, 2026 · 2 min read
Proteins are chains of amino acids that fold into precise three-dimensional shapes. That shape determines their function: how they bind to other molecules, catalyse reactions, or serve as structural components of cells. For decades, determining a protein's structure required painstaking experimental work using X-ray crystallography or cryo-electron microscopy - techniques that can take months and significant resources per protein.
Protein structure prediction - determining the 3D shape from the amino acid sequence alone - was long considered one of the hardest problems in computational biology. In 2020, DeepMind's AlphaFold2 achieved near-experimental accuracy on many targets in the CASP14 assessment, a result widely considered a breakthrough. The system uses a transformer-based architecture that attends both to the full sequence and to a multiple sequence alignment (related sequences from other organisms) to predict the positions of the atoms in the protein.
A key source of signal in AlphaFold2 is evolutionary information: amino acids that are far apart in the sequence but close in space tend to co-evolve, so a mutation in one is often compensated by a correlated mutation in the other. These co-evolutionary signals show up as correlated columns in the multiple sequence alignment and inform the spatial predictions.
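To make the co-evolution idea concrete, here is a minimal sketch that scores pairs of alignment columns by mutual information, a classical statistic for spotting correlated mutations. It is not how AlphaFold2 works internally (the network learns such patterns from the alignment itself); the toy alignment and the function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Toy alignment, invented for illustration: each row is one aligned sequence,
# each column is one position in the protein.
TOY_MSA = [
    "AKLDE",
    "AKLDE",
    "ARLEE",
    "GRLEE",
    "GKIDE",
    "ARIEE",
]


def column(msa, i):
    """Return the residues found at position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa):
    """Return all column pairs sorted from most to least co-varying."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (i, j), score in rank_pairs(TOY_MSA):
        print(f"columns {i} and {j}: {score:.3f} bits")
```

In this toy alignment, columns 1 and 3 always change together (K pairs with D, R pairs with E), so they receive the highest score, while the fully conserved column 4 scores zero against every other column. Real alignments are far noisier, and practical co-evolution methods add corrections (for example for shared ancestry among sequences) that this sketch omits.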
The practical consequences have been significant. DeepMind, together with EMBL-EBI, released the AlphaFold Protein Structure Database, which holds predicted structures for nearly all proteins in UniProt - over 200 million structures. Scientists can now look up a structural prediction for proteins that would have taken years to characterise experimentally.
AlphaFold3 extended the approach to predict interactions between proteins, DNA, RNA, and small molecules - moving from the structure of single proteins toward how molecules bind and work together.
Analogy
Protein structure prediction is like predicting the shape of a paper crane from a folding instruction sheet, without actually folding it. The amino acid sequence is the instruction sheet; the 3D structure is the resulting shape. The challenge is that evolution has compressed millions of years of folding experiments into the sequence, and reading that information requires understanding patterns across all related sequences.
Real-world example
AlphaFold2 was applied to the malaria parasite proteome, predicting structures for proteins that had never been characterised experimentally. This enabled researchers to identify potential drug targets for malaria treatment - proteins with structural features that could be inhibited by small molecules - without waiting years for experimental structure determination.
Why it matters
Protein structure prediction is foundational to drug discovery, vaccine development, and understanding disease mechanisms. AI making this fast and cheap has accelerated biology research across many areas - from basic science to industrial enzyme design - by removing a major bottleneck.
Related concepts
Graph Neural Network
A class of deep learning models designed to operate directly on graph-structured data - learning representations that capture both the features of individual nodes and the structural relationships between them.
Medical AI
The application of artificial intelligence to healthcare - from diagnosing disease in medical images to predicting patient deterioration to accelerating drug discovery - transforming medicine with data-driven decision support.
Molecular Graph Learning
The application of graph neural networks to molecular chemistry - representing molecules as atom-bond graphs and learning to predict properties like toxicity, solubility, and binding affinity directly from molecular structure.