LLM Architectures (27 concepts)

Core

The building blocks inside large language models - how they store knowledge, process text, and generate responses.

BERT

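Google's landmark 2018 language model that introduced deep bidirectional pre-training - an encoder that uses the words on both the left and the right of every position at once, rather than reading only left to right, and set new state-of-the-art results on language-understanding benchmarks such as GLUE and SQuAD.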
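The difference between "reading in both directions" and reading left to right comes down to the attention mask. The sketch below contrasts a bidirectional (BERT-style) mask with a causal (decoder-style) one. The function name `attention_mask` and the toy sentence are illustrative assumptions; the `[MASK]` token reflects BERT's masked-language-modelling objective, where a hidden word is predicted from context on both sides.

```python
def attention_mask(n, bidirectional):
    """Return an n x n mask; mask[i][j] is True if token i may attend to token j."""
    if bidirectional:
        # Encoder-style (BERT): every token sees every other token, left and right.
        return [[True] * n for _ in range(n)]
    # Decoder-style (causal): token i only sees positions 0..i.
    return [[j <= i for j in range(n)] for i in range(n)]

tokens = ["the", "cat", "[MASK]", "on", "mat"]
enc = attention_mask(len(tokens), bidirectional=True)
dec = attention_mask(len(tokens), bidirectional=False)

# What the masked position (index 2) can use as context in each case.
print([tokens[j] for j in range(len(tokens)) if enc[2][j]])  # ['the', 'cat', '[MASK]', 'on', 'mat']
print([tokens[j] for j in range(len(tokens)) if dec[2][j]])  # ['the', 'cat', '[MASK]']
```

Only the bidirectional mask lets the prediction for `[MASK]` draw on "on" and "mat", which is why this style of pre-training suits understanding tasks rather than left-to-right generation.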

Byte-Pair Encoding (BPE)

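The tokenization algorithm that most large language models use, in some variant, to split text into tokens - it starts from single characters (or bytes) and repeatedly merges the most frequent adjacent pair, building a vocabulary of word fragments that can represent any text without needing a separate entry for every rare word.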
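A minimal sketch of the merge-learning step, in the style of the original word-level BPE for subword units. It assumes a toy `{word: frequency}` corpus and an end-of-word marker `</w>`; the function name `learn_bpe` is hypothetical, and production tokenizers (for example byte-level BPE) add pre-tokenization, byte handling and much faster data structures.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges from a {word: frequency} dict (toy version)."""
    # Start with each word split into characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        # Replace every occurrence of the best pair with one merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = new_vocab.get(tuple(out), 0) + freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

Frequent fragments such as "est</w>" become single tokens quickly, while a rare word still decomposes into smaller pieces that are already in the vocabulary.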

Context Window

The maximum amount of text, measured in tokens, that a model can read and reason over at once - everything you send it, the conversation history, and the response it generates all have to fit within this limit.
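The budget check can be sketched as simple arithmetic: prompt tokens plus the room reserved for the reply must not exceed the window. This assumes a crude whitespace split in place of a real tokenizer, and the names `count_tokens` and `fits_in_context` are hypothetical.

```python
def count_tokens(text):
    # Assumption: a whitespace split stands in for a real tokenizer; BPE
    # tokenizers usually produce more tokens than there are words.
    return len(text.split())

def fits_in_context(system, history, user, max_context, max_new_tokens):
    """Return (fits, prompt_tokens): does the prompt plus room for the reply fit?"""
    prompt_tokens = (count_tokens(system)
                     + sum(count_tokens(turn) for turn in history)
                     + count_tokens(user))
    return prompt_tokens + max_new_tokens <= max_context, prompt_tokens

system = "You are a helpful assistant."
history = ["Hi there", "Hello! How can I help?"]
user = "Summarise this report please"
print(fits_in_context(system, history, user, max_context=32, max_new_tokens=16))  # (True, 16)
print(fits_in_context(system, history, user, max_context=32, max_new_tokens=17))  # (False, 16)
```

Asking for one more output token tips the same conversation over the limit, which is why long chats eventually need trimming or summarising.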

All concepts

Context Window Management
The set of strategies for handling inputs that exceed a model's maximum token limit - from sliding windows and summarisation to hierarchical chunking and selective retrieval.
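Of the strategies listed, the sliding window is the simplest to show. The sketch below splits a long token sequence into overlapping chunks that each fit a fixed window; the function name and the `window`/`stride` parameters are illustrative assumptions, not any particular library's API.

```python
def sliding_window_chunks(tokens, window, stride):
    """Split a token list into overlapping chunks of at most `window` tokens.

    Consecutive chunks start `stride` tokens apart, so they overlap by
    window - stride tokens and text at a boundary appears in both chunks.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last chunk reaches the end of the input
        start += stride
    return chunks

print(sliding_window_chunks(list(range(10)), window=4, stride=3))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

A smaller stride gives more overlap (and more chunks to process); `stride == window` gives plain non-overlapping chunking.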

Grouped Query Attention (GQA)
A more memory-efficient variant of multi-head attention where the query heads are split into groups and each group shares a single key-value head - shrinking the KV-Cache without meaningfully hurting quality.
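A minimal sketch, assuming one decoding step for one layer with plain Python lists as vectors: query head `h` reads the key/value head of its group, `h // group_size`. The helper `kv_cache_bytes` and the model configuration in the usage lines are hypothetical, chosen only to show where the saving comes from.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gqa_step(q_heads, k_heads, v_heads):
    """Attention for the current token with grouped query attention.

    q_heads: H query vectors, one per query head.
    k_heads, v_heads: G caches (one per key-value head), each a list of
    vectors for the tokens seen so far. H must be a multiple of G.
    """
    num_q, num_kv = len(q_heads), len(k_heads)
    if num_q % num_kv != 0:
        raise ValueError("query heads must split evenly into KV groups")
    group_size = num_q // num_kv
    outputs = []
    for h, q in enumerate(q_heads):
        g = h // group_size  # every query head in group g reads the same K/V
        keys, values = k_heads[g], v_heads[g]
        scale = 1 / math.sqrt(len(q))
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        dim = len(values[0])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return outputs

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    # Factor 2: one key and one value vector per KV head, layer and token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical config: 32 layers, head_dim 128, 4096-token context, 16-bit values.
mha = kv_cache_bytes(32, kv_heads=32, head_dim=128, seq_len=4096)  # one KV head per query head
gqa = kv_cache_bytes(32, kv_heads=8, head_dim=128, seq_len=4096)   # 32 query heads in 8 groups
print(mha // 2**20, "MiB vs", gqa // 2**20, "MiB")  # 2048 MiB vs 512 MiB
```

Only the number of key-value heads enters the cache size, so grouping 32 query heads onto 8 KV heads cuts the cache by 4x while every query head still computes its own attention pattern.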

KV-Cache
A memory buffer that stores the key and value vectors already computed for earlier tokens, so the model does not have to recompute them on every generation step - one of the main reasons autoregressive generation is fast.
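The sketch below, for a single layer and a single head, compares a cached step (project only the newest token, reuse stored keys and values) with recomputing everything from the full prefix each time. The tiny 2-d weight matrices and token vectors are hypothetical, and the token sequence is fixed rather than sampled, to keep the comparison deterministic.

```python
import math

def matvec(w, x):
    # w is a list of rows; returns w @ x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    scale = 1 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [sum(e / z * v[d] for e, v in zip(exps, values)) for d in range(len(values[0]))]

class KVCache:
    """Keys and values for every token processed so far (one layer, one head)."""

    def __init__(self, wq, wk, wv):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.keys, self.values = [], []

    def step(self, x):
        # Project only the newest token; earlier keys/values come from the cache.
        self.keys.append(matvec(self.wk, x))
        self.values.append(matvec(self.wv, x))
        return attend(matvec(self.wq, x), self.keys, self.values)

def step_without_cache(prefix, wq, wk, wv):
    # Recomputes keys and values for the whole prefix on every call.
    keys = [matvec(wk, x) for x in prefix]
    values = [matvec(wv, x) for x in prefix]
    return attend(matvec(wq, prefix[-1]), keys, values)

# Hypothetical 2-d weights and token vectors, for illustration only.
wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[0.5, 0.5], [0.0, 1.0]]
wv = [[1.0, 2.0], [3.0, 0.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache(wq, wk, wv)
cached = [cache.step(x) for x in tokens]
fresh = [step_without_cache(tokens[: i + 1], wq, wk, wv) for i in range(len(tokens))]
```

Both paths produce the same outputs, but the cached version does a constant number of projections per step instead of a number that grows with the sequence length; the price is memory that grows with every token stored.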

PagedAttention
A memory management technique for AI inference that stores the KV-Cache in non-contiguous memory blocks - the same idea as virtual memory in operating systems, applied to language model serving.
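A toy sketch of the bookkeeping idea: each sequence's KV entries go into fixed-size physical blocks taken from a shared free list, and a per-sequence block table maps logical token positions to (block, offset), much like a page table. The class `PagedKVCache` is hypothetical and stores placeholder strings; a real serving engine manages GPU tensors and uses attention kernels that read through the block table.

```python
class PagedKVCache:
    """Toy block allocator: KV entries live in fixed-size physical blocks
    that need not be contiguous, tracked by a per-sequence block table."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.blocks = [[None] * block_size for _ in range(num_blocks)]  # physical storage
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append(self, seq_id, kv):
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # last block is full (or none yet): take a free one
            if not self.free_blocks:
                raise MemoryError("out of KV blocks")
            table.append(self.free_blocks.pop(0))
        block = table[n // self.block_size]
        self.blocks[block][n % self.block_size] = kv
        self.lengths[seq_id] = n + 1

    def get(self, seq_id, pos):
        # Translate a logical position into (physical block, offset), like a page table.
        block = self.block_tables[seq_id][pos // self.block_size]
        return self.blocks[block][pos % self.block_size]

    def free(self, seq_id):
        # Return a finished sequence's blocks to the shared pool.
        self.free_blocks.extend(self.block_tables.pop(seq_id))
        del self.lengths[seq_id]

cache = PagedKVCache(num_blocks=4, block_size=2)
for seq_id, kv in [("a", "a0"), ("b", "b0"), ("a", "a1"), ("a", "a2"), ("b", "b1")]:
    cache.append(seq_id, kv)
print(cache.block_tables)  # {'a': [0, 2], 'b': [1]}
```

Sequence "a" ends up in blocks 0 and 2, which are not adjacent, yet every position is still reachable through its block table. Memory is reserved one block at a time, so at most one partially filled block per sequence is wasted, instead of pre-allocating space for the maximum possible length.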


BERT

Byte-Pair Encoding (BPE)

Context Window

Context Window Management

Decoder-Only Architecture

DistilBERT

Embedding Dimension

Embeddings

Encoder-Decoder

Flash Attention

Foundation Model

Grouped Query Attention (GQA)

KV-Cache

Latent Space

Logit Lens

Mixture of Experts (MoE)

Multi-Head Attention

PagedAttention

RAG (Retrieval-Augmented Generation)

RMSNorm (Root Mean Square Layer Normalization)

Rotary Position Embedding (RoPE)

Self-Attention

Sinusoidal Positional Encoding

SwiGLU

Token

Toolformer

Transformer