latentbrief
← Back to concepts

Concept

Link Prediction

The graph learning task of predicting whether a connection should exist between two nodes - used to discover unknown relationships, recommend new connections, and complete incomplete knowledge graphs.

Added May 18, 2026

Link prediction asks: given a graph with some edges observed, which additional edges are likely to exist? This covers a wide range of practical problems: recommending new friends in a social network (which users are likely to become connected?), completing a knowledge graph (which entity pairs are likely to have an unstated relationship?), predicting protein interactions (which proteins are likely to bind?), and identifying potential drug-disease associations (which compounds might treat which conditions?).

Heuristic methods for link prediction operate on graph topology without learning. Common Neighbours (two nodes that share many mutual connections are likely to connect), Jaccard Coefficient (normalises common neighbours by total neighbours), Adamic-Adar index (weights common neighbours by the inverse log of their degree), and Preferential Attachment (nodes with many connections are more likely to gain new ones) are all simple structural predictors that perform surprisingly well on social networks.

Embedding-based methods learn vector representations for nodes and score candidate links by the geometric relationship between node pairs in embedding space. The simplest scoring function: dot product between node embeddings - nodes with similar embeddings are predicted to connect. More sophisticated scoring functions (Hadamard product, concatenation + MLP, learned bilinear forms) capture asymmetric or typed relationships better.

For knowledge graph link prediction (knowledge graph completion), relation-specific scoring functions encode different relation types differently. TransE scores (head + relation, tail) similarity by L2 distance. RotatE represents each relation as a rotation in complex space. ComplEx operates in the complex number domain to handle asymmetric relations ("A employs B" does not imply "B employs A"). These methods train on observed triples and score unobserved triples at inference time.

GNN-based link prediction runs message passing on the observed graph to produce node embeddings, then scores candidate edges by their node embedding pairs. Subgraph-based methods (SEAL) extract a local subgraph around each candidate edge and use GNNs or heuristics on this subgraph for prediction - capturing richer local structural patterns than node embeddings alone.

Evaluation of link prediction methods uses a held-out set of observed edges (treated as positive examples) paired with randomly sampled non-edges (negative examples). Metrics include AUC, average precision, and the Hits@K (what fraction of true edges rank in the top K among all candidates).

Link prediction is not merely a benchmarking task - it has direct industrial value. LinkedIn's "People You May Know", Facebook's friend suggestions, and Netflix's content-to-user recommendations are all forms of link prediction at massive scale.

Analogy

A mutual acquaintance recommendation. At a party, a friend introduces two people who do not know each other, reasoning that because they share many mutual friends and have similar interests, they are likely to get along. This heuristic - shared social neighbourhood predicts likely connection - is the Common Neighbours heuristic for link prediction made into human social practice. Link prediction algorithms formalise and extend this intuition to arbitrary graph structures with learned embeddings and structural features.

Real-world example

A pharmaceutical company's drug-target interaction graph contains thousands of known drug-protein binding interactions. Many true interactions are undiscovered. A link prediction model (GNN-based) is trained on the known interactions, learning node embeddings for drugs and proteins that capture their binding compatibility. At inference time, the model scores all (drug, protein) pairs not already in the graph and returns the top-1000 most likely undiscovered interactions. Biologists then experimentally test the top predictions. In practice, this approach discovers real novel interactions at rates far above random selection.

Why it matters

Link prediction is one of the most practically impactful tasks in graph machine learning, underpinning recommendation systems, knowledge graph completion, network analysis, and scientific discovery in biology and chemistry. Understanding link prediction provides insight into how platforms like LinkedIn and Facebook generate social recommendations, how knowledge graphs are kept complete, and how AI accelerates drug discovery by predicting molecular interactions.

In the news

No recent coverage - check back later.

Related concepts