Deep Learning

Machine learning using neural networks with many layers - the approach behind almost every significant AI breakthrough of the past decade.

Added May 21, 2026 · 2 min read

Deep learning is the technology that made the current AI era possible. Before it, AI systems were brittle, narrow, and required enormous human effort to design. Deep learning enabled AI to learn from raw data at scale - which is why progress has accelerated so dramatically since 2012. Nearly every AI capability that seems remarkable today is built on top of deep learning.

The word "deep" in deep learning refers to depth: neural networks with many layers stacked on top of each other. A shallow network might have one or two hidden layers between the input and output. A deep network might have dozens or hundreds, and some research architectures have been trained with more than a thousand.

Why does depth matter? Depth enables hierarchical abstraction. Each layer in a deep network takes the features produced by the layer before it and combines them into more complex ones, building increasingly abstract representations from simpler ones. In vision, early layers respond to edges and textures, while later layers respond to object parts and whole objects. In language, early layers capture word-level patterns, while later layers capture meaning, context, and intent.
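To make the idea of layers building on layers concrete, here is a minimal sketch of a two-layer network in plain Python. The weights are hand-picked for illustration rather than learned, and the helper names (dense, relu, forward) are hypothetical. The first layer computes two simple features of the inputs; the second layer combines them into exclusive-or, a function that no single weighted sum of the raw inputs can represent.

```python
def relu(values):
    """ReLU activation: negative values become zero, positive values pass through."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each layer in turn; every layer sees only the previous layer's output."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # no activation on the final (output) layer
            activations = relu(activations)
    return activations


# Hand-picked (not trained) weights, assumed for illustration only.
# Layer 1 features: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
# Layer 2 output:   y  = h1 - 2 * h2
xor_layers = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -2.0]], [0.0]),
]

for pair in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(pair, forward(pair, xor_layers))
```

In a real deep network the same pattern repeats across many more layers, and the weights are found by training instead of being written by hand.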

In principle, researchers had known how to train deep networks for decades, but in practice it rarely worked. Networks with many layers suffered from a problem called vanishing gradients: during training, the error signal that flows backward through the network to update weights grows weaker with each layer it passes through, leaving the earliest layers nearly unable to learn.
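The shrinking error signal can be illustrated with a toy calculation. The sketch below assumes a deliberately simplified network: a chain of one-unit sigmoid layers that all share the same weight. The function name and default values are hypothetical. By the chain rule, the gradient reaching the first layer is the product of each layer's local derivative, and the sigmoid's derivative never exceeds 0.25, so the product collapses quickly as depth grows.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_reaching_first_layer(depth, weight=1.0, x=0.5):
    """Size of d(output)/d(input) for a chain of `depth` one-unit sigmoid layers.

    Simplifying assumption: every layer has a single unit and the same weight.
    """
    activation = x
    grad = 1.0
    for _ in range(depth):
        z = weight * activation
        grad *= weight * sigmoid_grad(z)  # chain rule: multiply by this layer's local slope
        activation = sigmoid(z)
    return abs(grad)


for depth in (1, 5, 10, 20):
    print(depth, gradient_reaching_first_layer(depth))
```

Real networks have many units per layer and varied weights, so the numbers differ, but the multiplicative shrinkage is the same mechanism.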

Several breakthroughs unlocked deep learning in the early 2010s. Better activation functions (ReLU replaced sigmoid), better weight initialisation, and dramatic increases in compute and data were all important. So was the GPU: graphics processing units turned out to be extraordinarily efficient at the parallel matrix operations that neural networks require, accelerating training by orders of magnitude.
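A rough way to see why the change of activation function helped is to compare how much gradient each one lets through. The sketch below is an illustration under simplifying assumptions: it looks at a single path through 20 layers, ignores the weights, and assumes every layer sees the same positive pre-activation value. The function names are hypothetical.

```python
import math


def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # never larger than 0.25


def relu_derivative(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 for any positive input


def surviving_gradient(derivative, pre_activations):
    """Product of the activation derivatives along one path through the network."""
    scale = 1.0
    for z in pre_activations:
        scale *= derivative(z)
    return scale


# Assumed for illustration: 20 layers, each with a pre-activation of 0.5.
pre_activations = [0.5] * 20

sigmoid_scale = surviving_gradient(sigmoid_derivative, pre_activations)
relu_scale = surviving_gradient(relu_derivative, pre_activations)

print("sigmoid:", sigmoid_scale)
print("relu:   ", relu_scale)
```

ReLU is not free of problems: a unit whose input is negative passes no gradient at all. Along active paths, though, the signal is not scaled down at every layer.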

The defining moment came in 2012, when AlexNet - a deep convolutional neural network - won the ImageNet competition with a top-5 error rate of about 15%, against roughly 26% for the runner-up, a margin that shocked researchers. Computer vision transformed almost immediately. Text followed with the introduction of transformers in 2017. By the early 2020s, deep learning dominated essentially every area of machine learning research and application.

Analogy

Think of the difference between a one-level sorting warehouse and a multi-floor operation where each floor refines what the previous floor delivered. The first floor sorts packages by country. The next sorts by state. The next sorts by city. The final floor sorts to the exact address. Each level builds on the precision achieved by the one before it. Deep learning does the same with data: each layer refines the representation built by the previous one.

Real-world example

GPT-4, Claude, and Gemini are all deep learning systems. They are built on transformers, an architecture that typically stacks dozens to over a hundred layers. That depth helps them track context, maintain coherence across long conversations, and reason about complex topics. The same basic approach, deep neural networks at much smaller scale, powers the photo-tagging feature in your phone's gallery.

Related concepts

Gradient Descent

The algorithm that trains neural networks - iteratively adjusting parameters in the direction that reduces the model's error.

Loss Function

The measure of how wrong a model's predictions are - the signal that training uses to decide how to improve.

Machine Learning

A way of teaching computers by showing them examples, rather than writing explicit rules - the engine behind almost everything labelled AI today.

Neural Network

A computing architecture loosely inspired by the brain - layers of interconnected nodes that transform inputs into outputs through learned mathematical operations.

Parameters

The numbers inside a neural network that get adjusted during training and define everything the model knows and can do.