AI Model Growth Slowed by HBM Speed Limits

LessWrong | June 22, 2026 | 1 min brief

In brief

The speed of generating tokens in large language models is hindered by the time it takes to read data from memory modules.
For instance, an H100 chip needs 20 milliseconds to fully read its memory stack, while a GB300 requires 36 milliseconds.
This bottleneck limits how much AI models can grow each year.
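The link between read time and generation speed can be sketched with a little arithmetic. This is a minimal illustration, not the original author's calculation: it assumes that each sequential decoding step has to read the full memory contents once, and the function names are hypothetical. The 20 ms and 36 ms values come from the brief; the 288 GB and 8000 GB/s figures are assumed, illustrative inputs chosen only to show how a 36 ms read time could arise.

```python
def full_read_time_ms(capacity_gb: float, bandwidth_gb_per_s: float) -> float:
    """Milliseconds needed to stream the whole memory once at peak bandwidth."""
    return capacity_gb * 1000.0 / bandwidth_gb_per_s


def max_decode_steps_per_s(read_time_ms: float) -> float:
    """Upper bound on sequential decoding steps per second.

    Assumption of this sketch: every step reads the full memory once.
    """
    return 1000.0 / read_time_ms


# Full-read times quoted in the brief.
H100_READ_MS = 20.0
GB300_READ_MS = 36.0

h100_steps = max_decode_steps_per_s(H100_READ_MS)    # 50 steps per second
gb300_steps = max_decode_steps_per_s(GB300_READ_MS)  # roughly 27.8 steps per second

# Assumed, illustrative capacity and bandwidth (not taken from the brief).
assumed_ms = full_read_time_ms(capacity_gb=288.0, bandwidth_gb_per_s=8000.0)

print(f"H100 bound:  {h100_steps:.1f} steps/s")
print(f"GB300 bound: {gb300_steps:.1f} steps/s")
print(f"Assumed 288 GB at 8000 GB/s: {assumed_ms:.1f} ms per full read")
```

Under these assumptions, a longer full-read time translates directly into a lower ceiling on tokens generated per second for a single sequence, even when the chip's raw bandwidth is higher.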
Model sizes are projected to increase from 10 trillion parameters in 2026 to 1 quadrillion by 2031, but this growth is constrained by both hardware limitations and the availability of training data.
By 2031, models would need to be four times larger than what unlimited data would require.
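To put the projection in per-year terms, the two endpoints quoted above (10 trillion parameters in 2026, 1 quadrillion in 2031) can be turned into a compound annual growth factor. This is a rough sketch that assumes smooth geometric growth between the endpoints, which the brief does not state; the helper name is hypothetical.

```python
def annual_growth_factor(start_value: float, end_value: float, years: int) -> float:
    """Constant yearly multiplier that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1.0 / years)


START_PARAMS = 10e12   # 10 trillion parameters (2026, from the brief)
END_PARAMS = 1e15      # 1 quadrillion parameters (2031, from the brief)
YEARS = 2031 - 2026    # 5 years

total_growth = END_PARAMS / START_PARAMS
per_year = annual_growth_factor(START_PARAMS, END_PARAMS, YEARS)

print(f"Total growth: {total_growth:.0f}x over {YEARS} years")
print(f"Implied growth per year: about {per_year:.2f}x")
```

Under that assumption, the projection corresponds to a 100-fold increase overall, or about 2.5 times more parameters each year.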
While future chips like Rubin Ultra might offer faster memory speeds, these improvements will still face tough limits due to physical constraints.
As AI technology advances, researchers will likely focus on optimizing model efficiency and exploring new architectures to overcome these hardware boundaries.

Terms in this brief

HBM: High Bandwidth Memory — a type of memory used in AI chips to quickly access large amounts of data. Faster HBM allows models to process information more efficiently, which is crucial for training and running advanced AI systems.
H100: An NVIDIA chip built for AI computation and widely used to train and run large language models. Its memory speed affects how quickly a model can generate responses.
GB300: A newer NVIDIA AI chip from the Blackwell generation, not a memory module. Reading its full memory takes 36 milliseconds, compared with 20 milliseconds for the H100, which indicates that memory capacity has grown faster than memory bandwidth.
