Loss Function

The measure of how wrong a model's predictions are - the signal that training uses to decide how to improve.

Added May 21, 2026 · 3 min read

A neural network begins with random parameters and produces useless outputs. Training improves it. But "improve" needs to be operationalised: how do you measure, precisely, how wrong the model's current predictions are? The loss function is the answer.

The loss function takes the model's output and the correct answer and returns a single number - the loss, sometimes called the cost or error. A high loss means the model is very wrong. A low loss means it is close to correct. The goal of training is to minimise the loss, typically averaged across the examples in the training dataset.

For language models, the most common loss function is cross-entropy loss. The model predicts a probability distribution over all possible next tokens. Cross-entropy is the negative logarithm of the probability the model assigned to the actual next token, so the loss falls as that probability rises. If the model confidently predicts "the" and the answer is "the," the loss is very low. If the model thought "cat" was most likely and the answer was "the," the loss is high. Every training step nudges parameters to assign more probability to the correct next token.
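
As a minimal sketch of the idea above, the snippet below computes cross-entropy for a single next-token prediction. The three-token vocabulary and the raw scores are made up for illustration; a real language model works over tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log of the probability assigned to the correct token."""
    return -math.log(probs[target_index])

vocab = ["the", "cat", "sat"]        # hypothetical toy vocabulary
target = vocab.index("the")          # the actual next token

confident_right = softmax([4.0, 0.5, 0.1])   # most probability on "the"
confident_wrong = softmax([0.5, 4.0, 0.1])   # most probability on "cat"

low_loss = cross_entropy(confident_right, target)
high_loss = cross_entropy(confident_wrong, target)
print(f"predicted 'the': {low_loss:.3f}, predicted 'cat': {high_loss:.3f}")
```

The same target token produces a small loss when the model puts most of its probability on it and a much larger loss when the probability goes elsewhere.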

The choice of loss function is a design decision that shapes what the model learns to do. Change the loss function and you change what behaviour gets rewarded. Training a language model to be helpful takes more than cross-entropy on next-token prediction: it needs additional training objectives that penalise harmful outputs, reward the responses people prefer, or encourage factual accuracy. The training techniques known as RLHF, DPO, and Constitutional AI can all be read as ways of changing the effective loss function so that model behaviour aligns with human values.
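
To make "change the loss and you change what gets rewarded" concrete, here is a rough sketch of the per-pair loss used by DPO, written from its published form. The function name, the `beta` value, and the log-probabilities are illustrative assumptions, not taken from any particular library or training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Loss for one (preferred, rejected) response pair.

    Inputs are log-probabilities of each full response under the model
    being trained (policy) and under a frozen reference model.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: no preference learned yet.
untrained = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy has shifted probability towards the preferred response.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(f"untrained: {untrained:.3f}, improved: {improved:.3f}")
```

Nothing here scores next-token accuracy directly. The loss drops only when the model raises the preferred response relative to the rejected one, which is how the objective encodes a human preference.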

An important subtlety: a model that achieves very low loss on its training data is not necessarily good. It might be memorising training examples rather than learning general patterns - a problem called overfitting. The loss on a separate validation set is the more meaningful measure of whether the model will work in the real world.
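
The gap between training loss and validation loss can be shown with a toy sketch. The data points, the nearest-neighbour "memoriser", and the fixed linear rule are all invented for illustration, and mean squared error stands in for whatever loss a real model would use.

```python
def mse(predict, examples):
    """Mean squared error of `predict` over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in examples) / len(examples)

# Toy data: y is roughly 2 * x plus hand-picked noise.
train = [(1.0, 2.3), (2.0, 3.8), (3.0, 6.4), (4.0, 7.7)]
validation = [(1.4, 2.9), (2.6, 5.1), (3.4, 6.9)]

lookup = dict(train)

def memoriser(x):
    """Returns the stored label of the nearest training input, noise included."""
    nearest = min(lookup, key=lambda seen: abs(seen - x))
    return lookup[nearest]

def linear(x):
    """A simple general rule that ignores the noise."""
    return 2.0 * x

for name, model in [("memoriser", memoriser), ("linear", linear)]:
    print(f"{name}: train={mse(model, train):.3f} "
          f"validation={mse(model, validation):.3f}")
```

The memoriser reaches zero training loss by reproducing every training label exactly, yet does worse than the simple rule on the held-out points. That is the pattern validation loss is meant to expose.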

Analogy

A score card. Every time the model makes a prediction, the score card tells it how far off it was - not just right or wrong, but how wrong. Training is the process of making adjustments to improve the score over millions of rounds. The loss function defines the scoring system, and that definition matters enormously - you get what you measure.

Real-world example

When training a text classifier to distinguish positive from negative product reviews, cross-entropy loss penalises the model when it assigns a low probability to the correct class. If the model says "60% positive" for a review that was clearly negative, it has given the correct class only 40% probability, so the loss is higher than a coin-flip guess would incur. Repeated adjustment to reduce this loss, across millions of reviews, produces a model that classifies sentiment reliably.
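
The arithmetic behind this example can be sketched in a few lines. The function name and the label encoding (1 for positive, 0 for negative) are assumptions made for illustration.

```python
import math

def binary_cross_entropy(p_positive, label):
    """Loss for one review; label is 1 for positive, 0 for negative."""
    p_correct = p_positive if label == 1 else 1.0 - p_positive
    return -math.log(p_correct)

# The review is negative (label 0) in all three cases.
wrong_leaning = binary_cross_entropy(0.60, label=0)   # -ln(0.40)
coin_flip = binary_cross_entropy(0.50, label=0)       # -ln(0.50)
right_leaning = binary_cross_entropy(0.10, label=0)   # -ln(0.90)

print(f"60% positive: {wrong_leaning:.3f}")
print(f"50% positive: {coin_flip:.3f}")
print(f"10% positive: {right_leaning:.3f}")
```

The loss rises smoothly as the model's probability drifts away from the correct class, which is what lets training tell a slightly wrong prediction apart from a badly wrong one.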

Why it matters

The loss function defines what "better" means during training, making it arguably the most important design choice in building a model. Misspecifying the loss - measuring the wrong thing, or measuring the right thing imprecisely - produces a model that optimises for the wrong outcome. This is why alignment researchers spend so much effort on specifying what good behaviour looks like in terms the training process can optimise.

Related concepts

Gradient Descent

The algorithm that trains neural networks - iteratively adjusting parameters in the direction that reduces the model's error.

Machine Learning

A way of teaching computers by showing them examples, rather than writing explicit rules - the engine behind almost everything labelled AI today.

Overfitting

When a model learns its training data too well and fails to generalise - the central challenge of machine learning.

Training

The process of teaching an AI model by adjusting its internal parameters until it gets better at its task - the computational work that creates intelligence.
