Emergent Capabilities
Abilities that appear in AI models at large scale, that were absent in smaller versions, and that were not explicitly trained for - sometimes appearing sharply and without warning.
Added May 21, 2026 · 2 min read
Emergent capabilities are one of the most striking and puzzling phenomena in modern AI. As language models are scaled up - given more parameters, more compute, and more training data - they do not simply improve gradually on all tasks. Instead, certain capabilities appear to turn on at specific scales, going from near-zero performance to reasonable competence in a narrow range of model sizes.
The paper that first documented this phenomenon systematically (Wei et al., 2022) catalogued over a hundred such abilities: multi-step arithmetic, chain-of-thought reasoning, language translation for rare language pairs, code execution, and many more. Small models scored close to random on these tasks; beyond a threshold scale, performance jumped substantially.
Why does this happen? Several explanations have been proposed. One is that some tasks require multiple intermediate steps, and the model only reaches competence when it has enough capacity to handle all steps reliably - failures compound multiplicatively, so capability looks near-zero until the last bottleneck is cleared. Another view is that the appearance of sudden emergence is partly an artefact of metrics: what looks like a sharp threshold in accuracy might be a smooth improvement in an underlying capability when measured differently.
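Both explanations can be illustrated with the same toy calculation. The sketch below is illustrative only: the logistic curve, its midpoint and slope, and the function names are assumptions made for the example, not values fitted to any real model. It assumes a per-step success rate that rises smoothly with model size, and a task that counts as solved only if every one of its steps succeeds independently.

```python
import math


def per_step_accuracy(log10_params: float) -> float:
    """Hypothetical smooth curve: per-step success rate vs. model size.

    A logistic in log10(parameter count). The midpoint (10^9 parameters)
    and slope (1.5) are illustrative assumptions, not measured values.
    """
    return 1.0 / (1.0 + math.exp(-1.5 * (log10_params - 9.0)))


def task_accuracy(log10_params: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed.

    Failures compound multiplicatively: p ** steps.
    """
    return per_step_accuracy(log10_params) ** steps


def emergence_table(sizes, steps):
    """Return (size, per-step accuracy, whole-task accuracy) rows."""
    return [(s, per_step_accuracy(s), task_accuracy(s, steps)) for s in sizes]


if __name__ == "__main__":
    for size, step_acc, full_acc in emergence_table([7, 8, 9, 10, 11, 12], steps=10):
        print(f"10^{size} params  per-step={step_acc:.3f}  all-10-steps={full_acc:.4f}")
```

In this toy setup the per-step metric improves gradually across the whole range, while the all-or-nothing metric stays close to zero until the per-step rate is already high: at the assumed midpoint the per-step rate is 0.5, yet the ten-step success rate is still below 0.1%. Which curve an observer sees depends on which metric is reported.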
The safety implications are significant. If new capabilities emerge unpredictably at scale, then evaluations of smaller models may not reveal what a larger model can do. A capability that seems absent - including dangerous capabilities - could appear unexpectedly in the next version. This makes it harder to anticipate and prepare for the properties of future systems.
Analogy
Water freezing. You can cool liquid water over a wide temperature range and observe only gradual changes in its behaviour. Then, at 0°C (at standard atmospheric pressure), a phase transition occurs and it becomes ice - with fundamentally different properties - even though the temperature changed smoothly. Some AI capabilities seem to behave similarly: smooth underlying improvements that produce sharp observable transitions.
Real-world example
GPT-3 could not reliably perform multi-digit arithmetic. GPT-4, trained with significantly more compute, can handle many such problems correctly - a capability that appears to have emerged from scale rather than from explicit optimisation. Similarly, chain-of-thought reasoning - where models work through problems step by step - became useful only in large models; prompting smaller models with the same format can produce worse results than prompting without chain-of-thought.
Why it matters
Emergent capabilities complicate safety planning for AI development. If we cannot predict what a more powerful model will be able to do based on smaller versions, it is harder to evaluate risks in advance. This is one argument for investing heavily in interpretability research - to understand what capabilities a model has, not just what it demonstrates on benchmarks.
Related concepts
Instrumental Convergence
The theoretical observation that almost any AI goal will lead to the same set of sub-goals - like self-preservation and acquiring resources - because these are useful for achieving almost anything.
Mechanistic Interpretability
The field of research that tries to understand what is literally happening inside AI models - tracing computations to find where and how specific knowledge, beliefs, and capabilities are stored and used.
Scalable Oversight
The research challenge of developing methods to reliably supervise AI systems that may be more capable than their human supervisors - ensuring alignment holds even as AI capability grows.