
Node Classification

The task of predicting a label or category for each node in a graph, based on the node's own features and the structure of its connections to other nodes.

Added May 21, 2026 · 2 min read

Node classification is one of the three fundamental tasks in graph machine learning, alongside link prediction and graph classification. The goal is to assign each node a label (a category, a score, or a continuous value) by learning from both the node's own attributes and its position in the graph.

The key insight is that a node's label often depends not just on what it is, but on what it is connected to. In a social network, a person's political views correlate with those of their connections. In a citation network, a paper's research area correlates with the papers it cites and is cited by. In a fraud detection graph, a suspicious account is more likely to be connected to other suspicious accounts. These patterns are captured by the homophily assumption: connected nodes tend to be similar.
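The homophily assumption can be checked on a labelled graph before relying on it. One common summary is the edge homophily ratio: the fraction of edges whose two endpoints share a label. The sketch below is a minimal illustration in plain Python; the function name `edge_homophily` and the toy citation graph are assumptions made for this example, not part of any particular library.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges whose two endpoints share the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical toy citation graph: papers 0-2 are "ml", papers 3-4 are "bio".
labels = {0: "ml", 1: "ml", 2: "ml", 3: "bio", 4: "bio"}
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)]

ratio = edge_homophily(edges, labels)
print(ratio)  # 4 of the 5 edges join same-label papers
```

A ratio near 1 suggests that neighbours are a strong signal for a node's label; a ratio near the level expected by chance suggests that graph structure alone will help much less.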

Graph Neural Networks exploit this by iteratively aggregating information from each node's neighbourhood. After several rounds of aggregation, a node's representation encodes not just its own features but the collective characteristics of its local neighbourhood. A classifier can then use this enriched representation to predict the node's label.
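The aggregation step can be sketched without any deep learning library. The example below is a deliberately simplified illustration, not a full GNN layer: it averages each node's vector with its neighbours' vectors and omits the learned weight matrices and non-linearities that real architectures add. The names `aggregate` and `propagate` and the three-node path graph are assumptions for this sketch.

```python
def aggregate(features, neighbours):
    """One round: replace each node's vector with the mean of its own
    vector and the vectors of its direct neighbours."""
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbours[node]]
        updated[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(len(own))
        ]
    return updated


def propagate(features, neighbours, rounds):
    """Apply `rounds` successive aggregation rounds; the input is not mutated."""
    for _ in range(rounds):
        features = aggregate(features, neighbours)
    return features


# Hypothetical path graph 0 - 1 - 2; only node 0 starts with a non-zero feature.
features = {0: [1.0], 1: [0.0], 2: [0.0]}
neighbours = {0: [1], 1: [0, 2], 2: [1]}

after_one = propagate(features, neighbours, rounds=1)
after_two = propagate(features, neighbours, rounds=2)
print(after_one[2], after_two[2])
```

After one round, node 2 has seen only node 1, so its value is still zero. After two rounds, information from node 0 has reached it through node 1. This is why the number of rounds determines how many hops of the neighbourhood a node's representation can reflect. In a trained model, the resulting vectors would be passed to a classifier to produce the label.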

Node classification has important practical applications. In biology, it predicts protein function based on protein-protein interaction networks. In e-commerce, it identifies fake product listings based on seller networks. In knowledge graphs, it assigns types to entities based on their relationships. In academic networks, it classifies researchers by field.

In practice, node classification on real graphs is complicated by class imbalance (fraud is rare), heterophily (connected nodes are sometimes dissimilar, which weakens the homophily assumption), and missing labels (most graphs are only partially labelled).
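Two of these challenges, missing labels and class imbalance, are often handled in the training loss itself: unlabelled nodes are skipped, and rare classes are given a larger weight. The sketch below shows one way this might look in plain Python. The function name, the use of `None` to mark unlabelled nodes, and the weight values are all assumptions for illustration; heterophily is not addressed here and usually calls for changes to the model rather than the loss.

```python
import math


def masked_weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean negative log-likelihood over labelled nodes only.

    probs[i]  : predicted class probabilities for node i
    labels[i] : class index for node i, or None if the node is unlabelled
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        if y is None:  # unlabelled node: contributes nothing to the loss
            continue
        w = class_weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("no labelled nodes")
    return total / weight_sum


# Hypothetical example: class 0 = legitimate, class 1 = fraud (rare, so weighted 9x).
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
labels = [0, None, 1, None]
class_weights = [1.0, 9.0]

loss = masked_weighted_cross_entropy(probs, labels, class_weights)
print(loss)
```

The unlabelled nodes still matter during aggregation, because their features flow to their labelled neighbours; they are only excluded from the loss.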

Analogy

Imagine predicting someone's profession without asking them directly, using only who they know. If most of a person's connections are doctors, they are likely a doctor themselves. Node classification is this intuition made formal and applied at scale across any graph structure.

Real-world example

Bot detection on Twitter/X is commonly framed as a node classification problem. Each account (node) has features like posting frequency and follower ratios. The graph structure (who follows whom, who interacts with whom) provides additional signal. Accounts that cluster with known bots in the interaction graph are more likely to be bots themselves.

Why it matters

Node classification turns graphs from descriptive structures into predictive tools. Any domain where data naturally forms a network - social systems, biological systems, financial systems, knowledge systems - can use node classification to label the entities in that network based on collective patterns, not just individual features.
