Parameters
The numbers inside a neural network that get adjusted during training and define everything the model knows and can do.
Added May 21, 2026 · 2 min read
A neural network is, at its mathematical core, a function with millions or billions of tunable numbers - its parameters, often loosely called weights. During training, these numbers are adjusted repeatedly to make the model's predictions more accurate. Once training is complete, the parameters are frozen, and everything the model has learned is encoded in a massive array of floating-point numbers.
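To make "adjusted repeatedly" concrete, here is a minimal sketch of a toy model with just two parameters, w and b, fitted by gradient descent. The data, learning rate, and step count are illustrative assumptions, not anything a real model uses; the point is only that training nudges each number in the direction that shrinks the error.

```python
# Toy illustration: a "model" with two parameters (w and b) learning y = 2x + 1.
# The data, learning rate, and step count are illustrative assumptions.

def train(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0  # parameters start at arbitrary values
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that reduces the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b  # after training, these values are simply kept fixed

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train(xs, ys)
print(round(w, 3), round(b, 3))
```

With this data the two parameters should settle very close to w = 2 and b = 1. A large language model follows the same basic loop, only with billions of numbers instead of two.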
Parameters exist at every layer of a neural network. In a standard fully connected layer, each connection between nodes has its own weight - a number that determines how much that connection's input influences the next layer. There are also bias terms, additional parameters added to each node's weighted sum before the activation function is applied. The total count of all weights and biases is the model's parameter count.
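The counting rule above can be sketched in a few lines. The function names and the 784-128-10 layer sizes are hypothetical, chosen only for illustration, and the sketch assumes plain fully connected layers with one bias per output node.

```python
# Count weights and biases in a stack of fully connected layers.
# Function names and layer sizes are hypothetical, for illustration only.

def dense_layer_params(n_inputs, n_outputs):
    weights = n_inputs * n_outputs  # one weight per connection
    biases = n_outputs              # one bias per output node
    return weights + biases

def network_params(layer_sizes):
    # Pair each layer's size with the next layer's size and sum the counts
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

total = network_params([784, 128, 10])
print(total)  # 101770
```

Even this small three-layer network has over a hundred thousand parameters, almost all of them weights, which is why counts climb so quickly as layers get wider.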
This count has become a common shorthand for model scale: "a 7 billion parameter model" or "a 70 billion parameter model." Larger parameter counts generally allow more complex patterns to be learned, but require more data, more compute, and more memory to train and run. GPT-3 had 175 billion parameters; OpenAI has not published a figure for GPT-4, and unofficial estimates range from hundreds of billions to over a trillion.
In transformers - the architecture underlying modern large language models - most parameters sit in the matrices of the attention and feedforward layers. Each layer holds several large matrices, and the model's understanding of language, facts, and reasoning is distributed across them.
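A rough back-of-the-envelope estimate shows how these matrices add up. The sketch below assumes a standard transformer block with four attention projection matrices (query, key, value, output) and a two-matrix feedforward layer whose inner width is four times the model width; it ignores embeddings, biases, and normalization parameters, so it is an approximation rather than an exact count. The example figures (96 layers, width 12288) are the configuration published for GPT-3, not something stated in this article.

```python
# Approximate parameter count for a stack of transformer blocks.
# Assumptions: 4 attention projections of size d_model x d_model, and a
# feedforward layer with inner width d_ff = 4 * d_model. Embeddings, biases,
# and normalization parameters are ignored.

def transformer_block_params(d_model, d_ff=None):
    if d_ff is None:
        d_ff = 4 * d_model
    attention = 4 * d_model * d_model  # query, key, value, output projections
    feedforward = 2 * d_model * d_ff   # up-projection and down-projection
    return attention + feedforward

def estimate_transformer_params(n_layers, d_model):
    return n_layers * transformer_block_params(d_model)

# Published GPT-3 configuration: 96 layers, model width 12288
approx = estimate_transformer_params(n_layers=96, d_model=12288)
print(f"{approx / 1e9:.1f}B")  # 173.9B
```

The estimate lands close to GPT-3's quoted 175 billion, which suggests that the attention and feedforward matrices account for the large majority of the total.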
A counterintuitive point: no individual parameter encodes a specific fact. The information in a language model is distributed across billions of parameters, with no single parameter representing "Paris is the capital of France." The knowledge is a property of the collective. This distributed representation is part of what makes neural networks robust - and what makes understanding exactly what they know so difficult.
Analogy
Picture a mixing board with billions of dials. Each dial controls one small aspect of the sound. Individually, no dial produces music. Together, when adjusted correctly, they produce the output you want. Training is the process of turning all the dials to the right positions. Once set, the board shapes whatever you feed it using those fixed settings.
Real-world example
Llama 3's 70-billion-parameter version requires roughly 140GB of memory to load at 16-bit precision (2 bytes per parameter). Running it requires hardware capable of holding that much memory and performing the matrix operations needed for each inference pass. The parameter count is a primary driver of both capability and computational cost.
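The arithmetic behind that figure is simply parameter count times bytes per parameter. The sketch below is a simplified estimate covering the weights only; the function name and precision labels are illustrative, and real deployments need extra memory for activations and other runtime state.

```python
# Estimate the memory needed just to hold a model's weights.
# Names are illustrative; activations and runtime overhead are not included.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params, precision="fp16"):
    # Uses decimal gigabytes (1 GB = 1e9 bytes)
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(precision, weight_memory_gb(70e9, precision), "GB")
```

At 16-bit precision a 70-billion-parameter model comes out at 140 GB, matching the figure above. Storing each parameter in fewer bits shrinks the footprint proportionally, which is why lower-precision versions of large models are popular on smaller hardware.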
Why it matters
Parameters are the physical substance of a neural network's knowledge. Understanding what they are clarifies why larger models tend to be more capable (more parameters means more representational capacity), why models require so much memory (all parameters must fit in memory to run), and why model sizes are cited in billions of parameters: the count is the most direct measure of scale.
Related concepts
Deep Learning
Machine learning using neural networks with many layers - the approach behind almost every significant AI breakthrough of the past decade.
Gradient Descent
The algorithm that trains neural networks - iteratively adjusting parameters in the direction that reduces the model's error.
Neural Network
A computing architecture loosely inspired by the brain - layers of interconnected nodes that transform inputs into outputs through learned mathematical operations.
Training
The process of teaching an AI model by adjusting its internal parameters until it gets better at its task - the computational work that creates intelligence.