Concept

Molecular Graph Learning

The application of graph neural networks to molecular chemistry - representing molecules as atom-bond graphs and learning to predict properties like toxicity, solubility, and binding affinity directly from molecular structure.

Added May 18, 2026

A molecule is naturally a graph: atoms are nodes and chemical bonds are edges, so the graph structure encodes the topology of atomic connectivity. Each atom carries features: its element type, formal charge, hybridisation, aromaticity, and whether it is part of a ring. Each bond carries features: bond type (single, double, triple, aromatic), stereochemistry, and whether it is in a ring. This attributed graph captures the 2D structural information that many molecular properties depend on, although it does not by itself describe the molecule's 3D geometry.
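
To make the representation concrete, here is a minimal sketch of an attributed molecular graph in plain Python. The class names (Atom, Bond, MolGraph) and the neighbours helper are hypothetical, chosen only for illustration; stereochemistry and hydrogens are left out to keep it short, and a real pipeline would normally derive these features with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str                 # element type, e.g. "C", "O"
    formal_charge: int = 0
    hybridisation: str = "sp3"
    aromatic: bool = False
    in_ring: bool = False


@dataclass(frozen=True)
class Bond:
    begin: int                   # index of the first atom
    end: int                     # index of the second atom
    bond_type: str = "single"    # single, double, triple, or aromatic
    in_ring: bool = False


@dataclass
class MolGraph:
    atoms: list
    bonds: list

    def neighbours(self, i):
        """Return the indices of atoms bonded to atom i."""
        out = []
        for b in self.bonds:
            if b.begin == i:
                out.append(b.end)
            elif b.end == i:
                out.append(b.begin)
        return out


# Ethanol (CH3-CH2-OH), heavy atoms only: C0 - C1 - O2
ethanol = MolGraph(
    atoms=[Atom("C"), Atom("C"), Atom("O")],
    bonds=[Bond(0, 1), Bond(1, 2)],
)

print(ethanol.neighbours(1))  # [0, 2]: the central carbon touches both ends
```

Node features live on Atom, edge features on Bond, and connectivity is recovered from the bond list, which is the same split a GNN library would store as a node feature matrix, an edge feature matrix, and an edge index.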

Before graph neural networks, molecular machine learning relied on hand-crafted molecular fingerprints. ECFP (Extended-Connectivity Fingerprints), for example, use a variant of the Morgan algorithm that is essentially a hand-coded message passing procedure: each atom's identifier is repeatedly updated from the identifiers of its neighbours, and the resulting identifiers are hashed into a fixed-length bit vector that encodes the local structural environment of each atom. These fingerprints, developed by chemists over decades, remain competitive with GNNs on many datasets. GNNs can be seen as learnable generalisations of fingerprints: they learn the aggregation function rather than using a fixed hand-coded one.
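
The "hand-coded message passing" idea can be sketched in a few lines. The function below is a simplified Morgan-style fingerprint written for illustration only: it is not the real ECFP algorithm, which uses richer atom invariants, includes bond orders, and removes duplicate substructures. The names stable_hash and morgan_like_fingerprint, the choice of SHA-256, and the 64-bit length are all assumptions of this sketch.

```python
import hashlib


def stable_hash(value):
    """Deterministic integer hash of a value with a stable repr."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def morgan_like_fingerprint(elements, adjacency, radius=2, n_bits=64):
    """Simplified Morgan-style fingerprint (illustrative, not real ECFP).

    elements:  list of element symbols, one per atom
    adjacency: dict mapping atom index -> list of neighbour indices
    """
    # Round 0: each identifier depends only on the atom's own features.
    ids = [stable_hash((el, len(adjacency[i]))) for i, el in enumerate(elements)]
    seen = set(ids)

    for _ in range(radius):
        # Each atom combines its identifier with its neighbours' identifiers.
        # Sorting makes the result independent of neighbour order.
        ids = [
            stable_hash((ids[i], tuple(sorted(ids[j] for j in adjacency[i]))))
            for i in range(len(elements))
        ]
        seen.update(ids)

    # Fold every identifier seen so far into a fixed-length bit vector.
    bits = [0] * n_bits
    for ident in seen:
        bits[ident % n_bits] = 1
    return bits


# Ethanol heavy atoms: C0 - C1 - O2
elements = ["C", "C", "O"]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
fp = morgan_like_fingerprint(elements, adjacency)
```

Each round widens the neighbourhood an identifier describes by one bond, so radius=2 covers environments up to two bonds from each atom. The fixed hash plays the role that a learned aggregation function plays in a GNN.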

MPNN (Message Passing Neural Network), introduced by Gilmer et al. at Google in 2017, unified several earlier graph models into a single framework for molecular property prediction and achieved state-of-the-art results on QM9 (a dataset of quantum mechanical properties for about 134,000 small molecules). It demonstrated that end-to-end learning of molecular representations could match or exceed hand-crafted fingerprints.
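
In the MPNN formulation, each round computes a message from every neighbour, sums the messages arriving at a node, and updates that node's state; a readout function then pools all node states into one vector for the whole molecule. The sketch below shows that structure in plain Python. The message_fn and update_fn here are toy stand-ins chosen for this example, where a real MPNN would use learned neural networks, and all function names are hypothetical.

```python
def message_passing_step(h, edges, message_fn, update_fn):
    """One round of message passing over an undirected graph.

    h:     list of node state vectors (lists of floats)
    edges: list of (v, w, edge_feature) tuples
    """
    dim = len(h[0])
    messages = [[0.0] * dim for _ in h]
    for v, w, e in edges:
        # Undirected bond: send a message in both directions and sum them.
        for dst, src in ((v, w), (w, v)):
            m = message_fn(h[dst], h[src], e)
            messages[dst] = [a + b for a, b in zip(messages[dst], m)]
    return [update_fn(h_v, m_v) for h_v, m_v in zip(h, messages)]


def readout(h):
    """Sum node states into one graph-level vector (order-invariant)."""
    return [sum(col) for col in zip(*h)]


# Toy stand-ins for the learned message and update functions.
def message_fn(h_dst, h_src, bond_order):
    return [bond_order * x for x in h_src]


def update_fn(h_v, m_v):
    return [0.5 * a + 0.5 * b for a, b in zip(h_v, m_v)]


# Ethanol heavy atoms with one-hot states over (C, O): C0 - C1 - O2
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1, 1.0), (1, 2, 1.0)]

h1 = message_passing_step(h0, edges, message_fn, update_fn)
graph_vector = readout(h1)
print(h1)            # [[1.0, 0.0], [1.0, 0.5], [0.5, 0.5]]
print(graph_vector)  # [2.5, 1.0]
```

After one round the central carbon's state already reflects its oxygen neighbour. Because both the aggregation and the readout are sums, renumbering the atoms leaves graph_vector unchanged, which is the property that lets the model treat a molecule as a graph rather than as an ordered list.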

Later and contemporaneous architectures have pushed performance further. Graphormer adapts the transformer architecture to graphs, while SchNet (continuous-filter convolutions over 3D atomic positions), DimeNet (which adds directional information from the angles between bonds), and EGNN (an E(3)-equivariant GNN) incorporate 3D molecular geometry alongside 2D graph topology. This is important because many molecular properties depend on the 3D arrangement of atoms, not just their connectivity.
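
A common way to feed 3D geometry into a model is through interatomic distances, which do not change when the molecule is rotated or translated. The sketch below computes pairwise distances and expands one of them in Gaussian radial basis functions, in the spirit of SchNet-style distance features. The coordinates are approximate illustrative values for water, and gamma, the grid of centres, and the function names are assumptions of this example rather than settings from any published model.

```python
import math


def pairwise_distances(coords):
    """Euclidean distance between every pair of atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def gaussian_rbf(d, centers, gamma=10.0):
    """Expand a scalar distance into smooth radial basis features."""
    return [math.exp(-gamma * (d - c) ** 2) for c in centers]


def rotate_z(coords, angle):
    """Rotate all points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Approximate water geometry in angstroms (illustrative values): O, H, H
water = [(0.000, 0.000, 0.000), (0.957, 0.000, 0.000), (-0.240, 0.927, 0.000)]

d_original = pairwise_distances(water)
d_rotated = pairwise_distances(rotate_z(water, math.pi / 3))

# Features for the first O-H distance on a grid of centres from 0 to 2 angstroms.
centers = [0.5 * k for k in range(5)]
oh_features = gaussian_rbf(d_original[0][1], centers)
```

Here d_original and d_rotated agree up to floating-point error, so any model built only on these distances gives the same prediction however the molecule is oriented. Distances alone cannot distinguish some geometries, which is one motivation for adding angles or using equivariant updates.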

Key applications: virtual screening (scoring millions of candidate drug compounds by predicted binding affinity to a target protein), ADMET prediction (predicting Absorption, Distribution, Metabolism, Excretion, and Toxicity properties essential for drug safety), reaction outcome prediction (predicting products of chemical reactions), retrosynthesis planning (finding synthetic routes to target molecules), and protein-ligand binding affinity prediction.

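Large pre-trained molecular models (ChemBERTa, Grover, Uni-Mol) apply the pre-train/fine-tune paradigm to molecular ML: pre-training on millions to tens of millions of unlabelled molecules using self-supervised objectives, then fine-tuning on smaller labelled datasets for specific tasks. Not all of them are GNNs in the strict sense: ChemBERTa, for instance, is a transformer that operates on SMILES strings rather than on the molecular graph. These models can achieve strong results even with very limited labelled training data.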
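
One widely used family of self-supervised objectives hides part of the input and trains the model to recover it. The sketch below builds a single masked-atom training example; it illustrates the general idea only and is not the exact objective of any model named above. The 15% mask fraction, the "[MASK]" token, and the mask_atoms function are assumptions made for this example.

```python
import random

MASK = "[MASK]"


def mask_atoms(elements, mask_fraction=0.15, rng=None):
    """Hide a random subset of atom labels for self-supervised training.

    Returns (masked_elements, targets), where targets maps each masked
    atom index to the original element the model should recover.
    """
    rng = rng or random.Random()
    n_mask = max(1, round(mask_fraction * len(elements)))
    masked_idx = rng.sample(range(len(elements)), n_mask)
    masked = list(elements)
    targets = {}
    for i in masked_idx:
        targets[i] = elements[i]
        masked[i] = MASK
    return masked, targets


# Heavy atoms of paracetamol (8 C, 1 N, 2 O) in an arbitrary ordering.
elements = ["C", "C", "O", "N", "C", "C", "C", "C", "C", "C", "O"]
masked, targets = mask_atoms(elements, rng=random.Random(0))
```

No human labels are needed: the targets come from the molecule itself, which is what allows pre-training on very large unlabelled collections before fine-tuning on a small labelled dataset.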

Analogy

A structure-activity relationship (SAR) model in pharmacology. Medicinal chemists know intuitively that molecules with similar structures tend to have similar biological activities. GNNs formalise and scale this intuition: by learning to extract structural features from the molecular graph, they build a mapping from chemical structure to biological activity that generalises across the vast chemical space of possible molecules. The GNN is, in effect, an automated SAR model learned from data rather than hand-crafted by chemists.

Real-world example

Recursion Pharmaceuticals uses GNNs alongside high-content microscopy to predict drug mechanism and toxicity. For a library of 50 million candidate compounds, a molecular GNN rapidly scores each compound's predicted toxicity (Ames test result, hERG inhibition risk, hepatotoxicity) and ADMET properties. Only the top 0.1% of compounds that pass all predicted filters are synthesised and experimentally tested - reducing wet-lab experiments by orders of magnitude and concentrating experimental resources on the most promising candidates.
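
A screening funnel of this kind can be sketched as "filter, then rank". The example below is a generic illustration, not a description of any company's pipeline: the compound records, scores, risk thresholds, and the screen function are all made up, and it assumes the top fraction is taken after the filters are applied. At the scale described above, 0.1% of 50 million compounds would be at most 50,000.

```python
def screen(compounds, thresholds, top_fraction):
    """Keep compounds that pass every predicted-risk filter, then rank them.

    compounds:  list of dicts with an "id", a "score" (higher is better),
                and one predicted risk probability per filter name
    thresholds: dict mapping filter name -> maximum allowed predicted risk
    """
    passed = [
        c for c in compounds
        if all(c[name] <= limit for name, limit in thresholds.items())
    ]
    passed.sort(key=lambda c: c["score"], reverse=True)
    n_keep = max(1, int(len(passed) * top_fraction)) if passed else 0
    return passed[:n_keep]


# Hypothetical model outputs for four compounds (all values are made up).
compounds = [
    {"id": "cpd-1", "score": 0.91, "ames": 0.05, "herg": 0.10, "hepatotox": 0.08},
    {"id": "cpd-2", "score": 0.95, "ames": 0.70, "herg": 0.05, "hepatotox": 0.04},
    {"id": "cpd-3", "score": 0.60, "ames": 0.02, "herg": 0.03, "hepatotox": 0.01},
    {"id": "cpd-4", "score": 0.85, "ames": 0.10, "herg": 0.45, "hepatotox": 0.12},
]
thresholds = {"ames": 0.2, "herg": 0.2, "hepatotox": 0.2}

selected = screen(compounds, thresholds, top_fraction=0.5)
print([c["id"] for c in selected])  # ['cpd-1']
```

The highest-scoring compound, cpd-2, is rejected because its predicted Ames risk exceeds the threshold, which shows why the safety filters run before ranking rather than after.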

Why it matters

Drug discovery and materials science face a combinatorial explosion: the space of drug-like molecules is often estimated at around 10^60 compounds. Experimental screening can evaluate at most millions of compounds; GNN-based virtual screening can evaluate billions. Molecular graph learning helps make computational drug discovery tractable, enabling AI to navigate chemical space far more efficiently than traditional high-throughput screening. Understanding it helps explain why AI drug discovery has attracted billions in investment and produced several compounds now in clinical trials.

Related concepts

Graph Neural Network, Link Prediction, Message Passing
