Supervised Fine-Tuning (SFT)

The first step in turning a raw language model into a useful assistant - training it on curated examples of exactly the kind of responses you want it to give.

Added May 18, 2026

A freshly pre-trained language model is a strange thing. It has read enormous amounts of text and developed a deep grasp of language, facts, and reasoning. But it has no concept of what it means to be helpful. Ask it a question and it might continue the question rather than answer it, or complete the text in a way that reflects the statistical patterns in its training data rather than what a thoughtful person would say. Supervised fine-tuning is the step that changes this.

SFT works by training the model on a carefully curated dataset of instruction-response pairs. Each example in the dataset shows the model a prompt - a question, a task, an instruction - followed by an ideal response. The model is trained to reproduce those responses, learning to associate the structure of instructions with the structure of appropriate answers. Thousands or tens of thousands of such examples collectively teach the model what being helpful looks like.
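The training signal described above can be sketched as a next-token cross-entropy loss. This is a minimal pure-Python illustration, not any lab's actual implementation: the function names, the toy vocabulary of four tokens, and the choice to compute the loss only on response tokens (masking out the prompt) are all assumptions made for the example. Prompt masking is a common convention, but the text above does not specify it.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_loss(logits_per_position, target_ids, loss_mask):
    """Mean negative log-likelihood of the target tokens.

    Only positions where loss_mask is 1 contribute. In this sketch the
    mask is 0 for prompt tokens and 1 for response tokens (an assumed,
    commonly used convention).
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(logits_per_position, target_ids, loss_mask):
        if not keep:
            continue
        probs = softmax(logits)
        total += -math.log(probs[target])
        count += 1
    if count == 0:
        raise ValueError("loss_mask selects no positions")
    return total / count


# Toy sequence: 2 prompt tokens followed by 2 response tokens, vocabulary size 4.
# Each row holds the model's scores for the token at that position.
logits = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
targets = [0, 1, 2, 3]
mask = [0, 0, 1, 1]  # train on the response only

loss = sft_loss(logits, targets, mask)
uniform_loss = sft_loss([[0.0] * 4] * 4, targets, mask)
```

In this toy setup the model that already favours the reference tokens gets a lower loss than one that spreads probability uniformly. Gradient descent on this quantity is what pushes a model toward reproducing the curated responses.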

The quality and design of the SFT dataset matter enormously. A model trained on low-quality examples will produce low-quality outputs, no matter how capable the underlying base model is. The most carefully engineered SFT datasets cover a wide range of task types, are written by skilled human annotators, and deliberately include examples of how to handle sensitive or edge-case situations. OpenAI, Anthropic, and other labs invest heavily in constructing and curating these datasets.
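As a rough illustration of what basic curation can look like, the sketch below drops empty, very short, and duplicate instruction-response pairs. The field names (`instruction`, `response`), the `curate` helper, and the 20-character threshold are hypothetical choices for this example. Real pipelines typically add much more, such as human review and coverage checks.

```python
def curate(examples, min_response_chars=20):
    """Keep only usable, non-duplicate instruction-response pairs.

    The rules here are deliberately simple and are assumptions for
    illustration: non-empty instruction, a response of at least
    min_response_chars characters, and no repeated instruction
    (compared case-insensitively with whitespace collapsed).
    """
    seen = set()
    kept = []
    for ex in examples:
        instruction = ex.get("instruction", "").strip()
        response = ex.get("response", "").strip()
        if not instruction or len(response) < min_response_chars:
            continue
        key = " ".join(instruction.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append({"instruction": instruction, "response": response})
    return kept


raw = [
    {"instruction": "Explain what a base model is.",
     "response": "A base model is a language model trained only to predict the next token."},
    # Same instruction after normalisation, so it is treated as a duplicate.
    {"instruction": "explain  what a base model is.",
     "response": "A base model is a pre-trained network before any instruction tuning."},
    # Response too short to be a useful demonstration.
    {"instruction": "Summarise this paragraph.", "response": "OK."},
    # No instruction at all.
    {"instruction": "", "response": "An answer with no question attached to it."},
]

curated = curate(raw)
```

Only the first record survives in this toy run. Filters like these remove obvious noise, but they cannot judge whether a response is actually good, which is why human annotation remains central.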

SFT is typically the first post-training step, and it transforms the model from a text-completion engine into something that behaves like an assistant. But SFT alone is not sufficient to produce the full range of desirable behaviours. It teaches the model what kind of responses to give, but it does not directly optimise for which of many possible good responses is actually best. That is where subsequent techniques like RLHF and DPO come in, building on the foundation SFT establishes.
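One way to see this limitation is to write SFT as a standard maximum-likelihood objective. The notation is introduced here for illustration and is not from the original text: $x$ is a prompt, $y$ its reference response, $\pi_\theta$ the model, and $\mathcal{D}$ the SFT dataset.

```latex
\mathcal{L}_{\mathrm{SFT}}(\theta)
  = -\,\mathbb{E}_{(x,\,y)\sim\mathcal{D}}
    \left[ \sum_{t=1}^{|y|} \log \pi_\theta\!\left(y_t \mid x,\, y_{<t}\right) \right]
```

Each term rewards reproducing the single reference response. Nothing in the objective compares two candidate responses to the same prompt, and that comparison is the signal preference-based methods add.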

Smaller organisations and open-source communities frequently perform SFT on top of open base models using publicly available instruction datasets. Projects like Alpaca, Dolly, and OpenHermes demonstrated that relatively modest SFT runs on open base models such as LLaMA could produce surprisingly capable instruction-following models - showing that building the base model's capabilities is the hard part, and that teaching it to follow instructions is comparatively accessible.

Analogy

Teaching a highly knowledgeable person how to communicate in a new professional role. The person already has deep expertise - years of reading, studying, and learning. SFT is the onboarding training that shows them specifically what good work looks like in this role: here are examples of excellent customer service calls, excellent legal memos, excellent code reviews. After enough examples, they understand the register, format, and expectations without being told explicitly for every situation.

Real-world example

When Meta released LLaMA as a base model, it was capable of completing text but not of following instructions in the way people expected. Within weeks, Stanford researchers applied SFT using the Alpaca dataset - 52,000 instruction-response pairs generated with the help of OpenAI's text-davinci-003 model - and produced a model that followed instructions much like a conversational assistant. The same base weights, transformed by a targeted SFT run into something qualitatively more useful.
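To make the idea of an instruction-response pair concrete, here is a sketch of how one pair might be rendered into a single training string. The template wording is written in the style of public instruction-tuning projects but should be treated as an assumption, as should the `build_training_text` helper and the `</s>` end-of-sequence marker. Real projects use whatever template and special tokens their tokenizer defines.

```python
# Hypothetical prompt template in the style of public instruction-tuning projects.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_training_text(instruction, response, eos="</s>"):
    """Return (prompt, full_text) for one instruction-response pair.

    The prompt is returned separately so a training loop can tell where
    the response starts, for example to compute loss on the response only.
    The eos marker is an assumed placeholder for the tokenizer's
    end-of-sequence token.
    """
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    return prompt, prompt + response + eos


prompt, full_text = build_training_text(
    "Give one reason to fine-tune an open base model.",
    "The expensive pre-training is already done, so a targeted SFT run is far cheaper.",
)
```

The base model sees `full_text` as ordinary text to continue. Training on many strings of this shape is what teaches it that a response should follow the instruction rather than more instruction-like text.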

Why it matters

SFT is the foundational step that makes language models useful in practice. Without it, even the most capable base models are not reliably helpful. Understanding SFT explains why fine-tuning open models is more accessible than training base models from scratch - the hard, expensive work of learning language is already done, and SFT is a comparatively targeted operation that can be run with far more modest resources.

Related concepts

Direct Preference Optimization (DPO)

A simpler alternative to RLHF that achieves alignment without needing a separate reward model - training the language model directly on human preference pairs.

Fine-tuning

Taking a general-purpose AI model and giving it additional training on a specific subject, so it becomes noticeably better at that particular domain.

Instruction Datasets

Curated collections of instruction-response pairs used to fine-tune language models into helpful assistants - the training data that teaches models what being useful looks like.

RLHF (Reinforcement Learning from Human Feedback)

A training technique that teaches AI to produce responses humans actually prefer, by having real people rate different outputs and using those ratings to improve the model.
