Knowledge Graph Completion
The task of predicting missing facts in a knowledge graph - inferring relationships between entities that are not yet recorded, based on existing patterns in the graph.
Added May 21, 2026 · 2 min read
Knowledge graphs are large structured databases of facts: entity-relation-entity triples like (Paris, capital-of, France) or (Marie Curie, won, Nobel Prize). They are never complete - there are always facts that exist in the world but have not been entered into the graph. Knowledge graph completion (KGC) is the task of predicting which facts are missing.
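As a rough illustration of the triple format, a tiny graph can be stored as a set of (head, relation, tail) tuples, and the "missing" facts for a query are the candidate triples not yet in that set. This is only a sketch: the (Berlin, located-in, Germany) triple and the candidate_tails helper are assumptions added for the example, and a real KGC system would rank these candidates with a learned model rather than just list them.

```python
# Toy knowledge graph stored as a set of (head, relation, tail) triples.
# The Berlin triple is a hypothetical addition for illustration.
triples = {
    ("Paris", "capital-of", "France"),
    ("Marie Curie", "won", "Nobel Prize"),
    ("Berlin", "located-in", "Germany"),
}

# Every entity that appears as a head or a tail.
entities = sorted({e for h, _, t in triples for e in (h, t)})

def candidate_tails(head, relation):
    """Return all (head, relation, ?) triples that are not already recorded."""
    return [
        (head, relation, t)
        for t in entities
        if t != head and (head, relation, t) not in triples
    ]

# The unrecorded fact (Berlin, capital-of, Germany) is among the candidates
# a KGC model would have to score.
for triple in candidate_tails("Berlin", "capital-of"):
    print(triple)
```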
The most common approach is embedding-based. Entities and relations are each mapped to vectors in a continuous space, and a scoring function determines how likely a given triple is to be true. TransE, one of the earliest and most influential KGC methods, represents a relation as a translation in embedding space: if (head, relation, tail) is true, then head + relation should approximately equal tail. Later methods such as DistMult, ComplEx, and RotatE use other operations (element-wise products, complex-valued embeddings, and rotations, respectively) to capture relation patterns that a simple translation handles poorly, such as symmetric relations.
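The translation idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the entity vectors are random, the capital-of vector is constructed by hand so that Paris + capital-of lands exactly on France, and the names transe_score and rank_tails are assumptions made for the example. In practice the vectors would be learned from the known triples.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical toy embeddings; random here, learned in a real system.
entities = {
    name: rng.normal(size=dim)
    for name in ["Paris", "Berlin", "France", "Germany"]
}

# Hand-built relation vector so that Paris + capital-of == France.
relations = {"capital-of": entities["France"] - entities["Paris"]}

def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail.

    Higher (closer to zero) means the triple is more plausible.
    """
    diff = entities[head] + relations[relation] - entities[tail]
    return -np.linalg.norm(diff)

def rank_tails(head, relation):
    """Rank every entity as a candidate tail, most plausible first."""
    return sorted(
        entities,
        key=lambda t: transe_score(head, relation, t),
        reverse=True,
    )

print(rank_tails("Paris", "capital-of"))
```

Completing a query such as (Paris, capital-of, ?) then amounts to scoring every candidate tail and taking the top of the ranking.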
Graph neural network approaches take a different angle: instead of learning each entity's embedding independently, they use the graph structure itself. A GNN aggregates information from a node's neighbours to build its representation, then uses those representations to score candidate triples. This can make the approach inductive: when representations are computed from neighbours rather than looked up per entity, the model can make predictions for entities not seen during training, as long as they have connections to known entities.
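The neighbour-aggregation step can be sketched as a single round of mean pooling. This is a simplified illustration under stated assumptions: the aggregate function is a hypothetical helper, edges are treated as undirected, and the learned weight matrices and relation-specific transformations that real GNN-based KGC models use are left out.

```python
import numpy as np

def aggregate(features, edges):
    """One round of mean aggregation.

    Each node's new representation is the average of its own feature
    vector and the feature vectors of its neighbours.
    """
    summed = features.astype(float)       # astype returns a copy; starts with each node's own features
    counts = np.ones(len(features))       # each node counts itself once
    for u, v in edges:                    # edges treated as undirected
        summed[u] += features[v]
        summed[v] += features[u]
        counts[u] += 1
        counts[v] += 1
    return summed / counts[:, None]

# Nodes 0 and 1 have known features; node 2 is a new entity with none.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0]])
edges = [(0, 1), (2, 0)]

updated = aggregate(features, edges)
print(updated[2])  # node 2 now carries information from its neighbour, node 0
```

Node 2 starts with an all-zero vector, yet after aggregation it has a non-trivial representation derived from node 0, which is the mechanism that lets such models handle entities that only appear at prediction time.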
KGC is important for enriching knowledge bases used in question answering, recommendation, and drug discovery. In pharmaceutical research, knowledge graphs connecting genes, proteins, diseases, and drugs can be mined for potential drug candidates - and KGC can suggest which connections the database might be missing.
Analogy
Completing a crossword puzzle using the letters you already have. Each entry constrains the possibilities for intersecting entries. Similarly, existing facts in a knowledge graph constrain what missing facts are plausible - if X works-at Y and Y is-a hospital, then X is probably a healthcare worker.
Real-world example
Wikidata, one of the largest public knowledge graphs, contains hundreds of millions of facts about people, places, and things. KGC models trained on Wikidata can predict facts that are true but missing - like correctly inferring a person's nationality from their place of birth and employer, even when no explicit nationality entry exists.
Why it matters
No knowledge graph is complete, and manual curation does not scale. KGC is how large-scale knowledge bases become more comprehensive automatically - powering better search, more accurate question answering, and richer recommendations without requiring humans to enter every fact.
Related concepts
Graph Embedding
The process of mapping nodes, edges, or entire graphs into continuous vector spaces that capture their structural roles and relationships - enabling graph-structured data to be used by machine learning methods that require vector inputs.
Knowledge Graph
A structured representation of real-world entities and their relationships as a directed graph - enabling machines to reason over factual knowledge, answer questions, and make inferences by traversing a web of interconnected facts.
Link Prediction
The graph learning task of predicting whether a connection should exist between two nodes - used to discover unknown relationships, recommend new connections, and complete incomplete knowledge graphs.