Parameters

The numbers inside a neural network that get adjusted during training and define everything the model knows and can do.

Added May 21, 2026 · 2 min read

Parameters are the physical substance of a neural network's knowledge. Understanding what they are clarifies why larger models tend to be more capable (more parameters means more representational capacity), why models require so much memory (all parameters must fit in memory to run), and why model sizes are cited in billions - it is the most direct measure of scale.

A neural network is, at its mathematical core, a function with millions or billions of tunable numbers - its parameters, also called weights. During training, these numbers are adjusted repeatedly to make the model's predictions more accurate. Once training is complete, the parameters are frozen, encoding everything the model has learned into a massive array of floating-point numbers.

Parameters exist at every layer of a neural network. In a standard layer, each connection between nodes has its own weight - a number that determines how much that connection's input influences the next layer. There are also bias terms, additional parameters added to each node's output before activation. The total count of all weights and biases is the model's parameter count.

This count has become a common shorthand for model scale: "a 7 billion parameter model" or "a 70 billion parameter model." Larger parameter counts generally allow more complex patterns to be learned, but require more data, more compute, and more memory to train and run. GPT-3 had 175 billion parameters; estimates for GPT-4 are in the hundreds of billions to over a trillion.

In transformers - the architecture underlying modern large language models - the key parameters are the matrices in the attention and feedforward layers. Multiple large matrices per layer store the model's learned knowledge. It is in these matrices that the model's understanding of language, facts, and reasoning is distributed.

An important and strange fact: no individual parameter encodes a specific fact. The information in a language model is distributed across billions of parameters, with no single parameter representing "Paris is the capital of France." The knowledge is a property of the collective. This distributed representation is what makes neural networks robust - and what makes understanding exactly what they know so difficult.

Analogy

The tuning knobs on a mixing board with billions of dials. Each dial controls one small aspect of the sound. Individually, no knob produces music. Together, when adjusted correctly, they produce the output you want. Training is the process of turning all the dials to the right positions. Once set, the mixing board plays back whatever you feed it.

Real-world example

Llama 3's 70-billion-parameter version requires roughly 140GB of memory to load at standard precision. Running it requires hardware capable of holding that much memory and performing the matrix operations needed for each inference pass. The parameter count is the primary driver of both capability and computational cost.

Why it matters

Parameters are the physical substance of a neural network's knowledge. Understanding what they are clarifies why larger models tend to be more capable (more parameters means more representational capacity), why models require so much memory (all parameters must fit in memory to run), and why model sizes are cited in billions - it is the most direct measure of scale.

In the news

Related concepts

Deep Learning

Machine learning using neural networks with many layers - the approach behind almost every significant AI breakthrough of the past decade.

Gradient Descent

The algorithm that trains neural networks - iteratively adjusting parameters in the direction that reduces the model's error.

Neural Network

A computing architecture loosely inspired by the brain - layers of interconnected nodes that transform inputs into outputs through learned mathematical operations.

Training

The process of teaching an AI model by adjusting its internal parameters until it gets better at its task - the computational work that creates intelligence.

← Back to concepts