Training
The process of teaching an AI model by adjusting its internal parameters until it gets better at its task - the computational work that creates intelligence.
Added May 21, 2026 · 2 min read
Training is the core activity in machine learning: the process by which a model becomes useful. It is iterative, compute-intensive, and the results depend critically on the data and choices made during it.
A model begins with randomly initialised parameters - numbers that define its behaviour but initially produce nothing useful. Training changes these numbers through a feedback loop: the model makes a prediction, a loss function measures how far that prediction is from the correct answer, and the parameters are adjusted slightly in the direction that reduces that error. The adjustment is made by gradient descent. Repeated across millions or billions of examples, this loop steadily improves the model's predictions.
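The feedback loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how production systems are built: it assumes a model with a single parameter `w`, a squared-error loss, and a made-up dataset in which the correct answer is always three times the input. The function name `train` and all values are hypothetical.

```python
import random

def train(examples, steps=200, learning_rate=0.05, seed=0):
    """Fit prediction = w * x by gradient descent on squared error."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # randomly initialised parameter
    for _ in range(steps):
        for x, target in examples:
            prediction = w * x             # 1. the model makes a prediction
            error = prediction - target    # 2. compare with the correct answer
            gradient = 2 * error * x       # 3. slope of error**2 with respect to w
            w -= learning_rate * gradient  # 4. small step that reduces the error
    return w

# Hypothetical data: the correct answer is always 3 * x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned_w = train(examples)
print(round(learned_w, 3))  # close to 3.0
```

The parameter starts at a random value and ends near 3, even though nothing in the code states the rule directly. The rule is recovered purely from examples and corrections.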
Training happens across epochs - complete passes through the training dataset - and in batches - small subsets of the data processed together to estimate the gradient efficiently. The learning rate controls how large each adjustment step is: too large and the updates overshoot, so the error can grow instead of shrinking; too small and training becomes extremely slow or stalls before reaching a good solution.
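To make epochs, batches and the learning rate concrete, here is a hedged sketch of mini-batch gradient descent on the same kind of one-parameter model. The dataset, the function name `train_minibatch` and the three learning rates are assumptions chosen for illustration. Real training uses far larger models and usually more elaborate optimisers.

```python
import random

def train_minibatch(data, epochs, batch_size, learning_rate, seed=0):
    """Fit prediction = w * x using shuffled mini-batches."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    w = 0.0
    for _ in range(epochs):          # one epoch = one full pass over the data
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average gradient of squared error over the batch.
            gradient = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * gradient
    return w

# Hypothetical data: 20 points where the correct answer is 2 * x.
data = [(1 + i / 20, 2 * (1 + i / 20)) for i in range(20)]

good = train_minibatch(data, epochs=20, batch_size=4, learning_rate=0.1)
too_large = train_minibatch(data, epochs=20, batch_size=4, learning_rate=1.0)
too_small = train_minibatch(data, epochs=20, batch_size=4, learning_rate=1e-5)
print(good, too_large, too_small)
```

With these settings the moderate learning rate settles near the true value of 2, the large one overshoots further on every step and ends far away, and the tiny one has barely moved from its starting value of 0 after the same number of epochs.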
Large language models are trained in stages. Pre-training is the longest and most expensive phase: the model is trained on enormous quantities of text (often trillions of tokens) to predict the next token in a sequence. This can require months of compute on thousands of specialised chips, at costs that can reach hundreds of millions of dollars. Fine-tuning follows, adjusting the pre-trained model on smaller, curated datasets to produce specific behaviours - like following instructions or declining harmful requests.
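The pre-training objective, predicting what comes next, turns ordinary text into training examples without any manual labelling. The sketch below shows one way that conversion might look. It assumes whitespace-separated words as tokens and a fixed context window, both simplifications: real systems use subword tokenisers and much longer contexts. The function name `next_token_examples` is hypothetical.

```python
def next_token_examples(text, context_size=3):
    """Turn raw text into (context, next token) training pairs."""
    tokens = text.split()  # simplification: real models use subword tokens
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]  # the preceding tokens
        examples.append((context, tokens[i]))         # the token to predict
    return examples

pairs = next_token_examples("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
# ['the'] -> cat
# ['the', 'cat'] -> sat
# ['the', 'cat', 'sat'] -> on
# ['cat', 'sat', 'on'] -> the
# ['sat', 'on', 'the'] -> mat
```

Every position in the text supplies its own correct answer, which is why pre-training can scale to trillions of tokens.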
Once training is complete, the parameters are frozen. The model is deployed for inference - answering questions, generating text, classifying inputs - without further learning. This distinction between training time and inference time is fundamental to understanding how AI systems work and where their knowledge comes from.
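The split between training time and inference time can be illustrated with a toy model that exposes the two operations separately. This is a sketch under simple assumptions (one parameter, squared-error loss, invented data). The class `TinyModel` and its method names are hypothetical, not any library's API.

```python
class TinyModel:
    def __init__(self):
        self.w = 0.0  # the model's only parameter

    def predict(self, x):
        """Inference: reads the parameter, never changes it."""
        return self.w * x

    def train_step(self, x, target, learning_rate=0.1):
        """Training: nudges the parameter to reduce squared error."""
        gradient = 2 * (self.predict(x) - target) * x
        self.w -= learning_rate * gradient

model = TinyModel()

# Training time: parameters change with every example.
for _ in range(100):
    for x, target in [(1.0, 2.0), (2.0, 4.0)]:
        model.train_step(x, target)

# Inference time: parameters are frozen, only predict() is called.
frozen_w = model.w
outputs = [model.predict(x) for x in (3.0, 10.0)]
print(outputs)              # close to [6.0, 20.0]
print(model.w == frozen_w)  # True: answering questions taught it nothing new
```

However many predictions are requested, the parameter keeps the value it had when training stopped. This is the same reason a deployed model's knowledge stops at its training cutoff.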
Analogy
Think of how a student learns to write well. At first, their essays are rough. They write, receive feedback, and improve. Over thousands of writing exercises, each with corrective feedback, their style and structure get better. AI training is this loop, run billions of times on a computer, compressing years of learning into weeks or months of computation.
Real-world example
OpenAI has not published the full details of GPT-4's training, but outside estimates describe trillions of tokens of text processed on thousands of GPUs over several months, and the company's chief executive has said the cost was more than $100 million. The result was a model with remarkably general language capabilities - all because the training process adjusted its parameters to predict text accurately, and in doing so, implicitly learned much of what appears in that text.
Why it matters
Training is where AI capability comes from. It determines what the model knows, what it can do, and crucially, what it cannot. Understanding training explains why AI systems have knowledge cutoffs (they stop learning when training stops), why they can have biases (from biases in training data), and why producing a capable AI requires so much compute and data.
Related concepts
Gradient Descent
The algorithm that trains neural networks - iteratively adjusting parameters in the direction that reduces the model's error.
Inference
Using a trained AI model to make predictions or generate outputs - the fast, cheap counterpart to training's slow, expensive computation.
Loss Function
The measure of how wrong a model's predictions are - the signal that training uses to decide how to improve.
Machine Learning
A way of teaching computers by showing them examples, rather than writing explicit rules - the engine behind almost everything labelled AI today.
Neural Network
A computing architecture loosely inspired by the brain - layers of interconnected nodes that transform inputs into outputs through learned mathematical operations.
Overfitting
When a model fits its training data so closely that it fails to generalise to new data - a central challenge of machine learning.
Parameters
The numbers inside a neural network that get adjusted during training and define everything the model knows and can do.