Training & Alignment
How models are built, fine-tuned, and taught to behave the way people actually want them to.
Bayesian Optimization
A sample-efficient method for tuning a model's hyperparameters - building a probabilistic model of how settings affect performance and using it to choose which settings to try next.
Catastrophic Forgetting
The tendency of neural networks to lose previously learned capabilities when trained on new data - a fundamental challenge in continually updating AI systems.
Constitutional AI
Anthropic's approach to alignment where a model is given a set of principles and trained to critique and revise its own outputs to comply with them - reducing reliance on human labelling of harmful content.
All concepts
C
Contrastive Learning
A self-supervised training approach that teaches models to recognise which examples are similar and which are different - without needing human-labelled categories.
Curriculum Learning
A training strategy that exposes models to simpler examples first and gradually increases difficulty - mimicking how humans learn most effectively.
D
Denoising Objective
A pre-training approach where the model is trained to reconstruct original text from a corrupted version - teaching it to understand both what text means and how it is structured.
Direct Preference Optimization (DPO)
A simpler alternative to RLHF that achieves alignment without needing a separate reward model - training the language model directly on human preference pairs.
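As a rough illustration of training directly on preference pairs, the sketch below computes the commonly cited DPO loss for a single pair. The function name, argument names, and the default `beta` are assumptions made for this example; a frozen reference model is assumed to supply the `ref_*` log-probabilities.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch, not a library API).

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or a frozen reference model.
    """
    # How much more the policy favours the chosen response over the
    # rejected one, measured relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

The loss falls as the policy shifts probability towards the preferred response relative to the reference, which is why no separate reward model appears anywhere in the computation.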
Domain Adaptation
Fine-tuning a general model on data from a specific industry or subject area - the step that turns a broadly capable AI into one that speaks the language of your field.
G
Gradient Accumulation
A training trick that simulates large batch sizes on hardware with limited memory - by accumulating gradients over multiple small batches before applying a single update.
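A minimal sketch of the idea, assuming a toy one-parameter model y = w * x with mean squared error; the function names are hypothetical. It shows that summing suitably scaled micro-batch gradients gives the same update as one large batch.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_step(w, data, micro_batch_size, lr):
    """One parameter update whose gradient is accumulated over micro-batches."""
    accumulated = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro_batch = data[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch so the
        # accumulated value equals the full-batch mean gradient.
        accumulated += grad_mse(w, micro_batch) * len(micro_batch) / len(data)
    # Apply the update once, after all micro-batches have been seen.
    return w - lr * accumulated

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full_batch_w = 0.5 - 0.01 * grad_mse(0.5, data)
accumulated_w = accumulated_step(0.5, data, micro_batch_size=2, lr=0.01)
```

Here `full_batch_w` and `accumulated_w` agree up to floating-point rounding, while only two examples need to be held in memory at a time.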
Gradient Clipping
A simple training stabiliser that caps the size of gradient updates so that an occasional extreme gradient cannot destabilise a model - a small technique that large-model training often depends on.
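One common variant is clipping by global norm, sketched below under the assumption that all gradients have been flattened into a single list of floats. The function name and return shape are chosen for this example only.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their overall L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the original norm.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Small gradients pass through unchanged.
        return list(grads), total_norm
    # Shrink every component by the same factor, preserving direction.
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

clipped, original_norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Because every component is scaled by the same factor, the direction of the update is preserved and only its length is limited.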
L
LoRA (Low-Rank Adaptation)
One of the most widely used techniques for efficiently fine-tuning large language models - freezing the original weights and training small low-rank matrices alongside them, so that only a tiny fraction of the total parameter count is updated.
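The sketch below illustrates the low-rank idea on a single weight matrix, using plain Python lists. The names `W`, `A`, `B`, `alpha`, and `r` follow common LoRA notation but are assumptions here, as is the choice to start `B` at zero so the adapted layer initially matches the frozen one.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Output of a frozen layer W plus a scaled low-rank update B @ A."""
    base = matvec(W, x)                # frozen pre-trained weights
    update = matvec(B, matvec(A, x))   # trainable path through rank r
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def trainable_fraction(d_in, d_out, r):
    """Share of parameters trained when only A (r x d_in) and B (d_out x r) learn."""
    return r * (d_in + d_out) / (d_in * d_out)

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen 2 x 2 weight
A = [[0.5, -0.5]]              # 1 x 2, trainable
B = [[0.0], [0.0]]             # 2 x 1, trainable, zero at the start
output = lora_forward(W, A, B, [1.0, 1.0], alpha=1.0, r=1)
fraction = trainable_fraction(4096, 4096, 8)
```

For a 4096 x 4096 matrix with rank 8, `fraction` works out to 1/256 of that matrix's parameters, which gives a feel for where the efficiency comes from.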
Lottery Ticket Hypothesis
The hypothesis that large neural networks contain small sub-networks that, when trained in isolation from their original initialisation, can match the full network's performance - suggesting that much of a model's capacity may be redundant.
M
Manifold Learning
The idea that high-dimensional data like text and images actually lies on a much lower-dimensional curved surface - and that AI models succeed partly by learning to navigate this surface.
Mixup Training
A data augmentation technique that creates new training examples by blending pairs of existing examples - teaching models to interpolate smoothly between classes rather than making sharp overconfident decisions.
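A minimal sketch of the blending step, assuming inputs are flat feature lists and labels are one-hot lists. The mixing weight is drawn from a Beta(alpha, alpha) distribution, as is usual for mixup; the function name and the default `alpha` are illustrative.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two labelled examples into one synthetic training example."""
    # lam near 0 or 1 keeps the result close to one of the originals.
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    # Labels are blended with the same weight, giving a soft target.
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

rng = random.Random(0)  # seeded only to make the example repeatable
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0],
                              [0.0, 1.0], [0.0, 1.0], rng=rng)
```

The soft target `mixed_y` still sums to one, so it can be used with a standard cross-entropy loss that accepts probability targets.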
P
Parameter-Efficient Fine-Tuning (PEFT)
A family of techniques for adapting large language models to specific tasks by updating only a small fraction of their parameters - making fine-tuning accessible without massive compute budgets.
Prompt Engineering
The practice of carefully crafting the instructions you give an AI to get better, more reliable results - how a request is phrased can change the output substantially.
R
RLAIF (Reinforcement Learning from AI Feedback)
A variant of RLHF where another AI model provides the preference judgements instead of human raters - reducing labelling cost while aiming to retain much of the alignment quality.
RLHF (Reinforcement Learning from Human Feedback)
A training technique that teaches AI to produce responses humans actually prefer, by having real people rate different outputs and using those ratings to improve the model.
S
Sample Packing
A training efficiency technique that concatenates multiple short sequences into a single long training example - eliminating wasted padding and significantly improving GPU utilisation.
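The sketch below shows one simple way to pack sequences, a greedy first-fit pass over sequences sorted longest first. This is an assumed strategy for illustration; real training pipelines may pack differently, and they typically also need attention masking so that packed sequences do not attend to each other.

```python
def pack_sequences(sequences, max_len):
    """Group token sequences into packs whose total length fits max_len."""
    packs = []
    for seq in sorted(sequences, key=len, reverse=True):
        if len(seq) > max_len:
            raise ValueError("sequence longer than max_len")
        for pack in packs:
            # Place the sequence in the first pack with enough room left.
            if sum(len(s) for s in pack) + len(seq) <= max_len:
                pack.append(seq)
                break
        else:
            # No existing pack had room, so start a new one.
            packs.append([seq])
    return packs

# Five dummy sequences of lengths 5, 3, 2, 4 and 1 tokens.
sequences = [[0] * n for n in (5, 3, 2, 4, 1)]
packs = pack_sequences(sequences, max_len=8)
padded_slots = len(sequences) * 8   # one padded row per sequence
packed_slots = len(packs) * 8       # one row per pack
```

In this toy case the 15 real tokens occupy 16 slots when packed, compared with 40 slots when each sequence is padded to length 8 on its own.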
Self-Consistency
A prompting technique that samples multiple independent reasoning paths for the same question and selects the answer that appears most often - which can substantially improve accuracy on complex reasoning tasks.
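The voting step can be sketched in a few lines. Here `sample_answer` is a hypothetical callable standing in for a model call that returns the final answer extracted from one sampled reasoning path; the stub below replaces it with fixed strings so the example runs on its own.

```python
from collections import Counter

def self_consistent_answer(sample_answer, question, n_samples=5):
    """Sample several answers and return the most common one with its vote share."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples

# Stub sampler: pretends five reasoning paths ended in these answers.
fake_outputs = iter(["42", "41", "42", "42", "40"])
answer, agreement = self_consistent_answer(
    lambda question: next(fake_outputs), "What is 6 * 7?", n_samples=5
)
```

The vote share returned alongside the answer is a rough agreement signal; a low share suggests the sampled reasoning paths disagree.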
Supervised Fine-Tuning (SFT)
The first step in turning a raw language model into a useful assistant - training it on curated examples of exactly the kind of responses you want it to give.
T
Test-Time Compute
Spending more computation during inference - at the moment of answering - to improve quality, rather than only investing compute during training.
Transfer Learning
The foundational practice of starting model training from pre-existing knowledge rather than from scratch - the reason fine-tuning a model costs far less than training one.