Encoder-Decoder

A two-part neural network design where one half reads and compresses your input, and the other half uses that compressed understanding to generate a new output.

Added May 18, 2026 · 2 min read

Most tasks in natural language processing require transforming one piece of text into another - translating a sentence from French to English, summarising a long document into a short one, or turning a question into an answer. The encoder-decoder architecture is the classic blueprint for handling this kind of transformation.

The encoder's job is to read the entire input and produce a compact, information-rich summary of its meaning. Think of it as building a mental model: rather than holding every word in memory, the classic recurrent encoder captures the essential meaning in a fixed-size internal representation, its final hidden state, usually called the context vector. By the time the encoder finishes, it has compressed "the cat sat on the mat" into a vector of numbers that captures who did what, and where.
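The compression step can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a simple tanh recurrent encoder with random, untrained weights; the names `encode`, `make_matrix`, `matvec` and the sizes `VOCAB`, `EMBED`, `HIDDEN` are hypothetical and chosen only for the example. The point it shows is that inputs of different lengths both end up as a vector of the same fixed size.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def encode(token_ids, embeddings, w_in, w_rec):
    """Read the tokens in order and return the final hidden state (the context vector)."""
    hidden = [0.0] * len(w_rec)
    for tok in token_ids:
        from_input = matvec(w_in, embeddings[tok])   # contribution of the current token
        from_memory = matvec(w_rec, hidden)          # contribution of everything read so far
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
    return hidden

# Hypothetical toy sizes: 10-token vocabulary, 4-dim embeddings, 6-dim hidden state.
rng = random.Random(0)
VOCAB, EMBED, HIDDEN = 10, 4, 6
embeddings = make_matrix(VOCAB, EMBED, rng)
w_in = make_matrix(HIDDEN, EMBED, rng)
w_rec = make_matrix(HIDDEN, HIDDEN, rng)

short_context = encode([1, 2, 3], embeddings, w_in, w_rec)
long_context = encode([1, 2, 3, 4, 5, 6, 7, 8], embeddings, w_in, w_rec)
print(len(short_context), len(long_context))  # both lengths equal HIDDEN
```

A three-token input and an eight-token input both come out as six numbers, which is exactly why long inputs put pressure on this representation.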

The decoder then takes that compressed meaning and generates the output one token at a time (a token is a word or a piece of a word). At each step, it looks at what it has produced so far and at the encoder's summary to decide what comes next. In a translation task, it produces the target-language sentence this way, using the encoder's understanding of the source sentence to guide each choice.
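The generation loop can be sketched the same way. This is an illustrative toy, assuming a tanh recurrent decoder with random, untrained weights and greedy selection (always taking the highest-scoring token); real systems often use beam search or sampling instead. The function `greedy_decode`, the special ids `BOS` and `EOS`, and the example context vector are all hypothetical.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def greedy_decode(context, params, bos_id, eos_id, max_len):
    """Emit token ids one at a time until the end token or the length limit."""
    embeddings, w_in, w_rec, w_ctx, w_out = params
    hidden = list(context)   # start from the encoder's summary
    prev = bos_id            # a begin-of-sequence token kicks off generation
    output = []
    for _ in range(max_len):
        from_prev = matvec(w_in, embeddings[prev])  # the token just produced
        from_state = matvec(w_rec, hidden)          # memory of the output so far
        from_context = matvec(w_ctx, context)       # the encoder's summary
        hidden = [math.tanh(a + b + c)
                  for a, b, c in zip(from_prev, from_state, from_context)]
        scores = matvec(w_out, hidden)              # one score per vocabulary entry
        prev = max(range(len(scores)), key=scores.__getitem__)
        if prev == eos_id:
            break
        output.append(prev)
    return output

# Hypothetical toy sizes and special token ids.
rng = random.Random(1)
VOCAB, EMBED, HIDDEN, MAX_LEN = 8, 4, 6, 12
BOS, EOS = 0, 1
params = (
    rand_matrix(VOCAB, EMBED, rng),    # embeddings
    rand_matrix(HIDDEN, EMBED, rng),   # w_in
    rand_matrix(HIDDEN, HIDDEN, rng),  # w_rec
    rand_matrix(HIDDEN, HIDDEN, rng),  # w_ctx
    rand_matrix(VOCAB, HIDDEN, rng),   # w_out
)
context = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1]  # stand-in for an encoder's output

tokens = greedy_decode(context, params, BOS, EOS, MAX_LEN)
print(tokens)
```

With untrained weights the token ids are meaningless; the sketch only shows the control flow, in which every step depends on both the previous output and the encoder's summary.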

The key strength of this design is the clean separation of concerns: encoding is purely about understanding, decoding is purely about generating. This separation made encoder-decoder models the dominant approach for sequence-to-sequence tasks throughout the late 2010s.

The main limitation is the bottleneck at the boundary between the two halves. The encoder must compress an entire input into a fixed-size representation, and if the input is long and complex, important details can get squeezed out in that compression. The attention mechanism was invented specifically to address this - instead of relying on a single summary vector, the decoder can look back at every individual encoder output, attending to whichever parts are most relevant for the current generation step.
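The "look back" step can be sketched as a weighted average over the encoder outputs. This is a minimal illustration, assuming dot-product scoring followed by a softmax; that is one common scoring choice, picked here for brevity, and not necessarily the one used in any particular system. The function names and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_outputs):
    """Score every encoder output against the decoder state and blend them."""
    scores = [sum(q * k for q, k in zip(decoder_state, out))
              for out in encoder_outputs]
    weights = softmax(scores)
    dim = len(decoder_state)
    blended = [sum(w * out[i] for w, out in zip(weights, encoder_outputs))
               for i in range(dim)]
    return weights, blended

# Three encoder outputs, one per input position (toy values).
encoder_outputs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# A decoder state that lines up with the second input position.
decoder_state = [0.0, 4.0, 0.0]

weights, blended = attend(decoder_state, encoder_outputs)
print(weights)   # the second position receives the largest weight
print(blended)
```

Because the weights are recomputed at every generation step, the decoder gets a fresh, step-specific summary instead of relying on one vector for the whole input.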

Analogy

A skilled interpreter working in two stages: first they listen to the entire speech in French and build a thorough mental understanding of it, then they deliver that understanding in English. The listening phase is the encoder; the speaking phase is the decoder. The quality of the translation depends on how well the mental model captured the original meaning.

Real-world example

Google Translate ran for years on encoder-decoder neural networks. Its 2016 neural system paired a recurrent (LSTM) encoder and decoder with attention, and later versions switched to a Transformer encoder. When you submitted a paragraph in Spanish, the encoder processed the input and built a representation of its meaning, then the decoder generated the English output one token at a time from that representation. This architecture is still widely used for tasks where the input and output are in fundamentally different forms.

Why it matters

The encoder-decoder design established the foundation for the entire field of sequence-to-sequence learning. Understanding it is essential for understanding why later architectures - including the decoder-only models that power most current AI assistants - made the design choices they did. Many specialised models for translation, summarisation, and code generation still use this pattern today.

Related concepts

Decoder-Only Architecture

The design used by GPT, Claude, and most modern AI assistants - a model that generates text by predicting each next word based only on everything that came before it.

Self-Attention

The mechanism that lets every word in a sentence look at every other word simultaneously - the core innovation that makes transformer models understand context so well.

Transformer

The AI architecture that powers virtually every major language model today - the underlying design that makes GPT, Claude, Gemini, and most other modern AI systems work.
