Concept

Neural Tangent Kernel (NTK)

A mathematical framework that reveals how infinitely wide neural networks behave during training - and provides theoretical tools for understanding why neural networks generalise as well as they do.

Added May 18, 2026

Neural networks are notoriously difficult to analyse mathematically. They are non-linear, high-dimensional, and their training dynamics are complex. For decades, theoretical understanding lagged far behind practical results: networks worked much better than theory suggested they should, and practitioners had more intuition than rigorous explanation. The Neural Tangent Kernel, introduced in 2018, provided a new theoretical lens.

The key insight concerns what happens to a neural network in the limit of infinite width - as the number of neurons in each hidden layer grows without bound. In this limit, something surprising occurs: under gradient descent the network behaves like its first-order (linear) approximation in the parameters around initialisation, and the learning dynamics can be completely described by a fixed kernel function called the Neural Tangent Kernel. The NTK measures how a small gradient update made to fit one input changes the network's output at another input.
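For readers who want the formula behind this description, the standard definition can be written as follows. The notation is an assumption introduced here, not taken from the text above: f(x; θ) is the network output, θ_t the parameters at training time t, (x_i, y_i) the n training examples, and the loss is half the summed squared error, trained by gradient flow (gradient descent with an infinitesimal step size).

```latex
% Neural Tangent Kernel at training time t
\Theta_t(x, x') = \nabla_\theta f(x;\theta_t)^{\top} \, \nabla_\theta f(x';\theta_t)

% Evolution of the network output at any input x under gradient flow
% on the loss  L(\theta) = \tfrac{1}{2} \sum_{i=1}^{n} \bigl(f(x_i;\theta) - y_i\bigr)^2
\frac{d f(x;\theta_t)}{dt} = -\sum_{i=1}^{n} \Theta_t(x, x_i) \, \bigl(f(x_i;\theta_t) - y_i\bigr)
```

The second line follows from the chain rule: the change in the output at x is the gradient of f at x dotted with the parameter update, and the update is itself a sum of gradients at the training points weighted by their errors.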

In the infinite-width limit, the NTK remains constant throughout training - it does not change as the weights update. This means the network learns in a kernel regression regime: it effectively fits a function in a reproducing kernel Hilbert space defined by the NTK. This is a much more tractable mathematical object than a general neural network, and many properties of training can be derived analytically.
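To make the kernel regression picture concrete, here is a minimal NumPy sketch under stated assumptions. It uses a hypothetical two-layer ReLU network with the output scaled by 1/sqrt(width), computes the empirical NTK at initialisation from the parameter Jacobian, and then evaluates the predictor that gradient flow on squared loss would converge to if the kernel stayed exactly constant. The function names, the architecture, and the data are illustrative choices, and a finite-width empirical kernel only approximates the infinite-width NTK.

```python
import numpy as np

def init_params(d, m, rng):
    """Random init for a hypothetical net f(x) = a . relu(W x) / sqrt(m)."""
    W = rng.standard_normal((m, d))
    a = rng.standard_normal(m)
    return W, a

def forward(X, W, a):
    """Network outputs for each row of X."""
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def jacobian(X, W, a):
    """Row i holds d f(x_i) / d theta, with theta = (W flattened, a)."""
    m = W.shape[0]
    pre = X @ W.T                                   # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                      # ReLU outputs
    gate = (pre > 0).astype(float)                  # ReLU derivative
    dW = (gate * a)[:, :, None] * X[:, None, :] / np.sqrt(m)   # (n, m, d)
    da = act / np.sqrt(m)                                       # (n, m)
    return np.concatenate([dW.reshape(len(X), -1), da], axis=1)

def empirical_ntk(X1, X2, W, a):
    """Theta(x, x') = grad f(x) . grad f(x') for all pairs of rows."""
    return jacobian(X1, W, a) @ jacobian(X2, W, a).T

def ntk_regression(X_train, y_train, X_test, W, a):
    """Constant-kernel limit of training: f0(x) + Theta(x, X) Theta(X, X)^-1 (y - f0(X))."""
    K = empirical_ntk(X_train, X_train, W, a)
    k_star = empirical_ntk(X_test, X_train, W, a)
    residual = y_train - forward(X_train, W, a)
    return forward(X_test, W, a) + k_star @ np.linalg.solve(K, residual)

# Illustrative usage on made-up data.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((5, 3))
y_train = rng.standard_normal(5)
X_test = rng.standard_normal((2, 3))
W, a = init_params(d=3, m=2048, rng=rng)
print(ntk_regression(X_train, y_train, X_test, W, a))
```

Evaluating `ntk_regression` on the training inputs themselves returns the training targets up to numerical error, which is the interpolation behaviour the kernel view predicts for an overparameterised model.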

The NTK framework explains several puzzling empirical observations. Why do overparameterised networks (networks with far more parameters than training examples) generalise well rather than overfitting? In the NTK regime, overparameterisation is actually beneficial - it enables interpolation of training data while maintaining a bias toward smooth, structured functions. Why does random initialisation produce networks that train reliably? The NTK at initialisation describes what the network can learn, and random initialisation in the right regime produces NTKs with good coverage of function space.

The practical significance of NTK theory is that it provides intuitions that transfer to finite-width real networks. Understanding why infinite networks behave in certain ways illuminates why finite networks behave similarly, and NTK-inspired analysis has guided practical decisions about initialisation, learning rate scaling, and architecture design.

The framework has limitations: real networks operate far from the infinite-width limit, and the NTK of a finite network changes during training. But as a first-principles theory of learning, the NTK gave the field tools to build on, and it motivated research into when and why the lazy training regime (where the NTK approximation holds) is or is not a good model of real network training.
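The gap between finite and infinite width can be probed numerically. The sketch below is a toy experiment, not a result from the literature: it trains a hypothetical two-layer ReLU network (output scaled by 1/sqrt(width)) with full-batch gradient descent on made-up data, and reports how much the empirical NTK on the training set moved relative to its value at initialisation. The function name `ntk_drift`, the data, and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np

def ntk_drift(m, steps=300, lr=0.05, seed=0):
    """Relative Frobenius change of the empirical NTK after training, for width m."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((8, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = np.sin(3.0 * X[:, 0])                        # arbitrary smooth target
    W = rng.standard_normal((m, 3))
    a = rng.standard_normal(m)

    def kernel(W, a):
        # Closed form of grad f(x_i) . grad f(x_k) for f(x) = a . relu(W x) / sqrt(m)
        pre = X @ W.T
        act = np.maximum(pre, 0.0)
        ga = (pre > 0).astype(float) * a
        return (act @ act.T + (ga @ ga.T) * (X @ X.T)) / m

    K0 = kernel(W, a)
    for _ in range(steps):
        pre = X @ W.T
        act = np.maximum(pre, 0.0)
        gate = (pre > 0).astype(float)
        err = act @ a / np.sqrt(m) - y               # residuals f(x_i) - y_i
        # Gradients of 0.5 * sum(err**2) with respect to a and W
        grad_a = act.T @ err / np.sqrt(m)
        grad_W = ((gate * a) * err[:, None]).T @ X / np.sqrt(m)
        a = a - lr * grad_a
        W = W - lr * grad_W
    K1 = kernel(W, a)
    return np.linalg.norm(K1 - K0) / np.linalg.norm(K0)

for width in (16, 256, 4096):
    print(width, ntk_drift(width))
```

In this setup the wide network's kernel should move much less than the narrow one's, which is the sense in which wide networks train "lazily" while narrow ones leave the regime where the NTK approximation holds.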

Analogy

Think of studying traffic flow in a city with infinitely wide roads in order to develop intuitions about what happens in real cities with finite roads. In the infinite-road limit, traffic flow is analytically tractable - you can derive exact equations for how cars move. Real cities have finite roads where the analysis is messier, but the infinite-road intuitions still provide useful guidance.

Real-world example

NTK theory predicted that the learning dynamics of very wide networks should be describable by kernel methods, and this was borne out empirically. Researchers built exact kernel functions derived from NTK theory and found that, for sufficiently wide networks, the kernel's predictions of which functions a network can learn matched observed training behaviour. This provided some of the first rigorous theoretical grounding for why neural networks generalise.

Why it matters

NTK theory matters because theoretical understanding of deep learning has practical consequences: it informs architecture design, initialisation schemes, and learning rate choices. More broadly, having a mathematical framework for understanding neural network training helps the field move from empirical trial-and-error toward principled design. As AI systems are deployed in higher-stakes environments, theoretical guarantees about their behaviour become increasingly important.

Related concepts

Contrastive Learning, Fine-tuning, Manifold Learning