Graph Generative Model

A deep generative model that learns to generate new graphs with realistic structural properties - used to design novel molecules, synthesise diverse network structures, and augment graph datasets for training.

Added May 18, 2026

Generative models have transformed computer vision (generating realistic images with GANs and diffusion models) and NLP (generating coherent text with language models). Graph generative models bring these capabilities to graph-structured data: learning to generate new graphs that look like they were drawn from the same distribution as a training set of graphs.

Molecular generation is the most prominent application. Given a dataset of drug-like molecules, a graph generative model can sample new molecular graphs (new atom-bond configurations) whose structural properties resemble those of real drug molecules. Combined with property predictors (molecular GNNs that predict solubility, toxicity, or binding affinity), generative models enable inverse drug design: generate candidate molecules, predict their properties, retain the promising ones, and iterate, steering the search through chemical space toward desired properties.
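The generate, predict, retain, iterate loop can be sketched in a few lines. This is a toy illustration only: the "molecules" are plain integers, and `toy_sampler` and `toy_score` are hypothetical stand-ins for a trained generative model and a trained property predictor.

```python
import random

def design_loop(sample_candidates, predict_score, rounds=5, batch_size=50, keep=5, seed=0):
    """Repeatedly generate candidates, score them, and retain the best few."""
    rng = random.Random(seed)
    retained = []
    for _ in range(rounds):
        # Generate: draw new candidates, conditioned on what was retained so far.
        candidates = sample_candidates(retained, batch_size, rng)
        # Predict and retain: keep the top scorers among old and new candidates.
        pool = set(retained) | set(candidates)
        retained = sorted(pool, key=predict_score, reverse=True)[:keep]
    return retained

def toy_sampler(retained, batch_size, rng):
    # Hypothetical generator: random integers at first, then small variations
    # of the retained candidates (a stand-in for sampling related molecules).
    if not retained:
        return [rng.randint(0, 1000) for _ in range(batch_size)]
    return [rng.choice(retained) + rng.randint(-20, 20) for _ in range(batch_size)]

def toy_score(candidate):
    # Hypothetical property predictor: higher is better, best at 420.
    return -abs(candidate - 420)

best = design_loop(toy_sampler, toy_score)
print(best)
```

Because each round keeps the previous best candidates in the pool, the top predicted score never decreases from one round to the next. In a real pipeline the score is itself a model prediction, so retained candidates still need experimental confirmation.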

The autoregressive approach generates graphs node-by-node and edge-by-edge. GraphRNN (You et al., 2018) uses an RNN to model the sequence of node additions and edge decisions, generating graphs by sequentially adding nodes and deciding their connections to previously placed nodes. This approach can capture complex graph distributions, but generation is inherently sequential: if every new node is checked against all earlier ones, the number of edge decisions grows quadratically with the number of nodes, which makes it slow for large graphs.
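A minimal sketch of node-by-node generation follows. It is not GraphRNN itself: the function `edge_prob_fn` is a hypothetical placeholder for the trained network that would output an edge probability given the partial graph, and `toy_edge_prob` is an arbitrary rule chosen for illustration.

```python
import random

def generate_graph(num_nodes, edge_prob_fn, seed=0):
    """Grow a graph one node at a time, deciding each edge to earlier nodes."""
    rng = random.Random(seed)
    edges = []
    for new_node in range(num_nodes):
        # For each previously placed node, sample a yes/no edge decision.
        for earlier in range(new_node):
            # A trained model would condition on the partial graph here;
            # edge_prob_fn is a hypothetical stand-in for that network.
            if rng.random() < edge_prob_fn(new_node, earlier, edges):
                edges.append((earlier, new_node))
    return edges

def toy_edge_prob(new_node, earlier, edges):
    # Assumption for illustration: favour links to the immediately preceding node.
    return 0.9 if new_node - earlier == 1 else 0.1

edges = generate_graph(6, toy_edge_prob)
print(edges)
```

The nested loop makes n(n-1)/2 sequential decisions for n nodes, which is the cost that makes this style of generation slow on large graphs.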

Variational Autoencoders (VAEs) for graphs (JT-VAE, CVAE-based models) encode a graph into a continuous latent space and decode latent vectors back into graphs. The continuous latent space enables interpolation between molecules and gradient-based optimisation: start from a real molecule's latent code, optimise the code to maximise a predicted property, and decode the optimised code into a new molecule. This "latent space optimisation" approach has produced novel molecules with improved predicted properties.
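The optimisation step in latent space can be sketched as plain gradient ascent. Everything here is an assumption for illustration: the two-dimensional latent code, the quadratic `predicted_property` function, and its hand-written gradient stand in for a learned property predictor, and the encoder and decoder of the VAE are omitted.

```python
def predicted_property(z, target=(1.0, -2.0)):
    # Hypothetical differentiable property predictor over latent codes;
    # it peaks (at value 0) when z equals the target point.
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))

def property_gradient(z, target=(1.0, -2.0)):
    # Analytic gradient of predicted_property with respect to z.
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]

def optimise_latent(z0, steps=50, lr=0.1):
    """Gradient ascent on the latent code to raise the predicted property."""
    z = list(z0)
    for _ in range(steps):
        grad = property_gradient(z)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z

z_start = [0.0, 0.0]   # in practice: the encoder's code for a real molecule
z_opt = optimise_latent(z_start)
print(predicted_property(z_start), predicted_property(z_opt))
```

In a real system the optimised code would then be passed through the decoder, and the decoded molecule re-scored, since a code that drifts far from the training data may decode to an invalid or unrealistic graph.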

Flow-based molecular models (GraphNF, GRF) use normalising flows, which are invertible transformations with tractable exact likelihoods, to model graph distributions, enabling principled generation and density estimation. Diffusion models for graphs (DDPM-style models adapted to graphs, DiGress) apply the diffusion framework to graph generation: training graphs are progressively corrupted with noise, and a network learns to reverse the corruption step by step. Diffusion models have reported strong results on molecular generation benchmarks, in several cases producing more diverse molecules and a higher share of valid ones than earlier methods.
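The corruption half of a graph diffusion model can be illustrated with a simplified noise process. The sketch below assumes an undirected graph stored as a binary adjacency matrix and flips each possible edge independently with a fixed probability per step. This is an illustrative choice, not the noise schedule of any published model, and the learned denoising network is omitted.

```python
import random

def corrupt_step(adj, flip_prob, rng):
    """Flip each undirected edge slot independently with probability flip_prob."""
    n = len(adj)
    noisy = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < flip_prob:
                # Toggle edge presence, keeping the matrix symmetric.
                noisy[i][j] = noisy[j][i] = 1 - adj[i][j]
    return noisy

def forward_process(adj, num_steps, flip_prob=0.1, seed=0):
    """Return the list of progressively noisier graphs, starting with adj."""
    rng = random.Random(seed)
    trajectory = [adj]
    for _ in range(num_steps):
        trajectory.append(corrupt_step(trajectory[-1], flip_prob, rng))
    return trajectory

# A 4-node path graph 0-1-2-3 as the clean starting point.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
trajectory = forward_process(path, num_steps=10)
print(trajectory[-1])
```

A denoising model would be trained on pairs drawn from such trajectories to predict the cleaner graph from the noisier one, and generation would run that model in reverse starting from pure noise.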

For molecules, generation quality is usually evaluated on four criteria: validity (what fraction of generated graphs are chemically valid?), uniqueness (what fraction of the generated molecules are distinct from one another?), novelty (what fraction are not in the training set?), and distribution similarity (do generated molecules have property distributions similar to those of real drug molecules?). The GuacaMol and MOSES benchmarks provide standardised evaluations for molecular generation models.
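The first three metrics reduce to simple counting. The sketch below uses one common convention for the denominators (uniqueness among valid molecules, novelty among unique valid molecules); benchmarks may define them slightly differently. Molecules are represented as strings that are assumed to be already canonicalised, and `toy_is_valid` is a hypothetical placeholder for a real chemical validity check from a cheminformatics toolkit.

```python
def generation_metrics(generated, training_set, is_valid):
    """Compute validity, uniqueness, and novelty for a batch of generated molecules."""
    valid = [g for g in generated if is_valid(g)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        # Share of all generated samples that pass the validity check.
        "validity": len(valid) / len(generated) if generated else 0.0,
        # Share of valid samples that are distinct from one another.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Share of distinct valid samples not seen in training.
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }

def toy_is_valid(molecule):
    # Hypothetical validity check; a real one would parse the structure.
    return molecule != "not-a-molecule"

generated = ["CCO", "CCO", "C1CC1", "not-a-molecule", "CCN"]
training_set = {"CCO"}
metrics = generation_metrics(generated, training_set, toy_is_valid)
print(metrics)
```

Here four of five samples are valid, three of those four are distinct, and two of those three are absent from the training set. Distribution similarity is not covered, since it requires comparing property distributions rather than counting.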

Analogy

A drug designer's imagination, but systematic and data-driven. A skilled medicinal chemist can propose novel molecular structures by combining structural motifs they have seen work before with creative variations. A graph generative model does the same thing at scale: learning the "vocabulary" of molecular building blocks and their patterns of combination from a database of known molecules, then generating novel combinations that explore the space of viable drug-like structures. The model cannot reason about biochemistry explicitly, but it learns the implicit patterns that distinguish viable from non-viable molecular structures.

Real-world example

Insilico Medicine used a graph generative model as part of its AI drug discovery pipeline for fibrosis. The model generated novel inhibitor candidates for a target protein, guided by predicted binding affinity from a GNN. From millions of generated candidates, filtering by ADMET predictions and binding affinity retained ~100 synthesisable candidates. Wet-lab testing validated several as active inhibitors. A novel compound entered Phase 2 clinical trials in under 4 years from discovery - roughly three times faster than the typical timeline.

Why it matters

Graph generative models extend the capabilities of generative AI to molecular design - one of the highest-value applications of machine learning. Drug discovery has traditionally been limited by the combinatorial explosion of chemical space and the high cost of experimental screening. Graph generative models, combined with property predictors, enable systematic exploration of regions of chemical space that traditional approaches are unlikely to reach. Understanding them clarifies the scientific basis of AI drug discovery pipelines.

Related concepts

Graph Neural Network
Molecular Graph Learning
Variational Autoencoder (VAE)