Today's concept

Foundation Model

A large AI model trained on vast amounts of general data, designed to be the starting point for many different applications rather than built for a single task.

For most of AI's history, you built a separate model for each task. One model to translate languages. Another to summarise documents. Another to answer customer service questions. Each required its own data, its own training, its own maintenance. This was expensive and slow, and each model was brittle - good at one thing, useless at everything else.

Foundation models changed this. The idea is to train one very large model on an enormous, diverse dataset - text from books, websites, code, scientific papers, conversations - so that it develops general capabilities that can then be adapted to almost any task. Instead of starting from scratch for each application, you start from this shared foundation.

The scale of training involved is hard to overstate. Building a frontier foundation model can take months of computation across thousands of specialised chips, at a cost that can run to hundreds of millions of dollars. This is why only a handful of organisations - Anthropic, OpenAI, Google, Meta, a few others - are actually building frontier foundation models. Everyone else is building on top of them.

Once a foundation model exists, it can be adapted in several ways: fine-tuned on specialised data, given access to specific tools, or simply prompted differently for each purpose. A single foundation model might power a coding assistant, a customer service bot, a document analysis tool, and a creative writing app - all at the same time, across different companies.
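The simplest of these adaptations, prompting, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `base_model` is a hypothetical stand-in that echoes its prompt rather than calling a real model, and the `Application` class and its instruction strings are invented for the example. What it shows is that two different products can wrap the very same model and differ only in the instructions they send to it.

```python
from dataclasses import dataclass
from typing import Callable


def base_model(prompt: str) -> str:
    """Hypothetical stand-in for a shared foundation model.

    A real product would send the prompt to a model provider here;
    this placeholder just echoes the prompt so the sketch runs offline.
    """
    return f"[model output for: {prompt}]"


@dataclass
class Application:
    """One product built on top of the shared model (illustrative only)."""
    name: str
    instructions: str                # task-specific prompt text
    model: Callable[[str], str]      # the shared foundation model

    def run(self, user_input: str) -> str:
        # Adaptation by prompting: prepend the product's instructions
        # to whatever the user typed, then call the shared model.
        prompt = f"{self.instructions}\n\n{user_input}"
        return self.model(prompt)


# Two different products, one underlying model.
coding_assistant = Application(
    "coding assistant", "You help write and debug code.", base_model
)
support_bot = Application(
    "support bot", "You answer customer questions politely.", base_model
)

print(coding_assistant.run("Why does my loop never end?"))
print(support_bot.run("Where is my order?"))
```

Fine-tuning and tool access would change more than the prompt, but the shape is the same: the expensive part (the model) is shared, and each application adds a thin layer of specialisation on top.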

This arrangement has concentrated enormous power in a small number of model providers, which is one of the defining tensions in the AI industry today. Access to foundation models - how much they cost, who can use them, what restrictions apply - shapes what AI products are even possible to build.

Analogy

A Swiss Army knife versus a drawer full of single-purpose tools. A foundation model is the Swiss Army knife - broad enough to be adapted for many different jobs. Fine-tuning is like sharpening one of the blades for a specific purpose. The underlying tool is the same; the specialisation makes it better at particular tasks.

Real-world example

When a hospital builds an AI system to assist with reading medical scans, it rarely builds a model from scratch. It starts with a foundation model that already understands language and images in general, then adapts it using medical data. The foundation provides the base capability; the specialised training provides the domain expertise.

Why it matters

The foundation model paradigm largely determines who holds structural power in AI. The organisations that can afford to train frontier foundation models set the terms for everything built on top of them - pricing, access, acceptable use, safety constraints. It is a small group, and the gap between them and everyone else appears to be getting wider, not narrower.

Related concepts

Fine-tuning
RLHF (Reinforcement Learning from Human Feedback)
Transformer