Contrastive Learning

A self-supervised training approach that teaches models to recognise which examples are similar and which are different - without needing human-labelled categories.

Added May 18, 2026 · 3 min read

Supervised learning requires labelled data: for each input, a human has specified the correct output. Labelling is expensive, time-consuming, and limited by human capacity. Contrastive learning sidesteps this requirement by deriving a training signal from the structure of the data itself: similar things should produce similar representations; dissimilar things should produce different ones.

The core contrastive learning loop works as follows. For each example, create two "views" of it - two different augmented or transformed versions that both represent the same underlying content. For images, these might be random crops of the same image. For text, they might be two sentences from the same paragraph, or two paraphrases of the same statement. These two views form a positive pair - things that should be close together in representation space. Negative examples are typically the views of other items in the same batch, which should end up far away.
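The pair construction described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: images are tiny lists of pixel rows, random cropping is the sole augmentation, and the helper names `random_crop` and `make_views` are hypothetical rather than taken from any library.

```python
import random

def random_crop(image, size, rng):
    """Return a size x size crop taken at a random position (hypothetical helper)."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]

def make_views(image, size, rng):
    """Two independently cropped views of the same image: one positive pair."""
    return random_crop(image, size, rng), random_crop(image, size, rng)

rng = random.Random(0)
# Toy batch of three 6x6 "images"; the hundreds digit of each pixel encodes the image index.
batch = [[[i * 100 + r * 6 + c for c in range(6)] for r in range(6)] for i in range(3)]
views = [make_views(image, 4, rng) for image in batch]

# views[i][0] and views[i][1] form a positive pair.
# views[i][0] and views[j][1] with j != i act as in-batch negatives.
```

Real pipelines usually combine several augmentations (colour changes, flips, blur, and so on), but the bookkeeping of positives and in-batch negatives stays the same.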

The training objective then pushes positive pairs close together in the model's representation space and pushes negative pairs apart. The model cannot do this by copying raw inputs (since the two views are different) - it must learn to extract the underlying invariant content that both views share. That extraction process produces rich, meaningful representations that generalise well to downstream tasks.
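One common way to express this objective is an InfoNCE-style loss, which treats each anchor's positive as the correct class among all candidates in the batch. The sketch below assumes that choice (the text above does not name a specific loss), uses cosine similarity, and picks an illustrative `temperature` of 0.1; it scores only one direction (first view against second).

```python
import math

def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE-style loss: row i of view_a should match row i of view_b."""
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of the anchor to every candidate, sharpened by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in b]
        # Cross-entropy with the positive (index i) as the target class.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])   # positives are close
shuffled = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])  # positives are far apart
```

The loss is small when each anchor is closest to its own positive (`aligned`) and large when it is closer to another item's view (`shuffled`), which is exactly the pull-together, push-apart pressure described above.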

In language models, contrastive learning has been applied to train powerful embedding models. SimCSE, a particularly influential technique, creates positive pairs in its unsupervised form by running the same sentence through the model twice with different dropout masks - two views that differ only by which neurons are randomly dropped during each pass. The model learns representations that are consistent across dropout, which turns out to produce excellent sentence embeddings for semantic search and similarity tasks.
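The dropout trick can be illustrated with a toy encoder. This is not SimCSE itself: the single linear layer, the random weights, and the `encode` function are stand-ins assumed for illustration, and dropout is applied to the input features rather than inside a transformer.

```python
import random

def encode(features, weights, rng, drop_prob=0.1):
    """Toy encoder (hypothetical): dropout on the input, then one linear layer."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob, rescale the rest.
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    return [sum(w * x for w, x in zip(row, dropped)) for row in weights]

rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]  # 8 inputs -> 4 outputs
sentence = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for a sentence's features

# Same input, two forward passes, two independently sampled dropout masks.
view_1 = encode(sentence, weights, rng)
view_2 = encode(sentence, weights, rng)
```

`view_1` and `view_2` would then be treated as a positive pair, with the embeddings of other sentences in the batch serving as negatives.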

Contrastive learning has also influenced preference learning in alignment. The idea of training a model to distinguish preferred from non-preferred outputs - making good responses more likely and bad responses less likely - shares mathematical structure with contrastive objectives. RLHF's reward model is in some ways a contrastive learner: it learns to score outputs such that preferred ones score higher than rejected ones.
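To make the connection concrete, here is a sketch of a pairwise loss of the kind commonly used for reward models: the negative log-sigmoid of the score gap between the preferred and the rejected response. The function name is hypothetical and the scores are made-up numbers; actual RLHF pipelines may differ in detail.

```python
import math

def pairwise_preference_loss(score_chosen, score_rejected):
    """-log(sigmoid(chosen - rejected)), written in a numerically stable form."""
    margin = score_chosen - score_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    # For non-positive margins, rewrite to avoid overflow in exp.
    return -margin + math.log1p(math.exp(margin))

correct_order = pairwise_preference_loss(2.0, 0.0)  # preferred response scored higher
wrong_order = pairwise_preference_loss(0.0, 2.0)    # rejected response scored higher
tie = pairwise_preference_loss(1.0, 1.0)            # equals log(2)
```

As with the contrastive objective, the loss only depends on relative scores: it shrinks as the preferred response is ranked further above the rejected one.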

Analogy

Learning to distinguish good wine from bad by tasting many pairs side by side - always with one excellent and one mediocre example. You never have a written definition of what makes wine good; you just keep comparing. Over many comparisons, your palate develops a reliable sense of what quality tastes like and can evaluate a new wine on its own. Contrastive learning teaches models the same way: through repeated comparison, without explicit category labels.

Real-world example

OpenAI's CLIP model was trained contrastively on 400 million image-text pairs from the internet. The model learned to pull together the representation of an image and its matching caption, while pushing apart mismatched image-caption pairs. The resulting representations generalised so well that CLIP could classify images in completely new categories just by comparing image representations to text descriptions of those categories - without any task-specific training.
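The classification-by-comparison step can be sketched as follows. The three-dimensional vectors are invented stand-ins for encoder outputs (real CLIP embeddings are much larger and come from trained image and text encoders), and `zero_shot_classify` is a hypothetical helper, not part of any CLIP library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def zero_shot_classify(image_embedding, label_embeddings):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(label_embeddings,
               key=lambda label: cosine(image_embedding, label_embeddings[label]))

# Made-up embeddings of text prompts describing each candidate category.
label_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]  # made-up embedding of an unlabelled image

prediction = zero_shot_classify(image_embedding, label_embeddings)
```

Adding a new category only requires embedding one more text description; no retraining is involved, which is what makes the approach zero-shot.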

Why it matters

Contrastive learning is significant because it unlocks self-supervised learning at scale: you can train on unlabelled data by creating your own training signal from the data structure. This allows the field to leverage the enormous amounts of unlabelled text, images, and video on the internet rather than being limited by the much smaller supply of labelled examples.

Related concepts

Embeddings

A way of turning words and sentences into lists of numbers, so that content with similar meanings ends up mathematically close together and can be found by meaning rather than exact wording.

Supervised Fine-Tuning (SFT)

The first step in turning a raw language model into a useful assistant - training it on curated examples of exactly the kind of responses you want it to give.

Transfer Learning

The foundational practice of starting model training from pre-existing knowledge rather than from scratch - the reason fine-tuning a model costs far less than training one.
