latentbrief
Launch · 1h ago

Redditor Runs 1-Trillion-Parameter LLM Locally Using Optane PMem as RAM

Hacker News · 1 min brief

In brief

  • A Reddit user has run a 1-trillion-parameter large language model locally by using Intel's discontinued Optane Persistent Memory (DCPMM) modules as system RAM.
  • The build, detailed on the r/LocalLLaMA subreddit, uses six second-hand Optane PMem modules totaling 768GB.
  • Optane is slower than conventional DRAM but has lower latency than NVMe SSDs, and the used modules cost a fraction of what the same capacity in DRAM would.
  • The build also includes an Intel Xeon Gold 6246 CPU, Samsung DDR4 ECC RAM serving as a cache, and a Western Digital NVMe SSD.
  • Running Kimi K2.5 through llama.cpp with hybrid GPU/CPU inference, the system generates about 4 tokens per second.
  • Because Optane has been discontinued, the approach is hard to replicate widely, but the creator considers the build a successful proof of concept.
  • The experiment highlights the potential of repurposing older memory technologies for cutting-edge AI tasks, despite their limitations.
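Why 768GB is enough for a 1-trillion-parameter model comes down to bytes per weight. The brief does not say which quantization was used, so the bit widths below are hypothetical; the sketch counts only the weights and ignores the KV cache, activations, any layers held in GPU memory, and the GB/GiB distinction.

```python
# Back-of-envelope check: can the weights of a 1-trillion-parameter model fit
# in the 768GB of Optane described in the brief? The bit widths tried below
# are hypothetical; the brief does not state the quantization used.

TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion parameters, per the brief
PMEM_CAPACITY_GB = 768            # six Optane modules totaling 768GB

def weights_gb(params: int, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (10**9 bytes)."""
    return params * bits_per_param / 8 / 1e9

def fits_in_pmem(bits_per_param: float) -> bool:
    """True if the weights alone fit in the Optane capacity.

    Ignores the KV cache, activations, OS overhead, layers offloaded to the
    GPU, and the difference between GB and GiB.
    """
    return weights_gb(TOTAL_PARAMS, bits_per_param) <= PMEM_CAPACITY_GB

for bits in (16, 8, 4, 2):
    size = weights_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if fits_in_pmem(bits) else "does not fit"
    print(f"{bits:>2}-bit weights: ~{size:,.0f} GB ({verdict} in {PMEM_CAPACITY_GB} GB)")
```

Under these assumptions, 16-bit (~2,000 GB) and 8-bit (~1,000 GB) weights would not fit, while 4-bit (~500 GB) and smaller would, leaving some room for the overheads the sketch ignores.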

Terms in this brief

Optane PMem
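Intel's Optane Persistent Memory (DCPMM) is a memory technology that sits between DRAM and storage: it is slower than DRAM but has lower latency than NAND-based SSDs, and it offers larger capacities per module. In App Direct mode it is persistent, meaning data isn't lost when power is removed; in Memory Mode it serves as ordinary volatile system RAM, with regular DRAM acting as a cache in front of it. This high capacity at relatively low cost makes it suitable for memory-hungry tasks like running large language models locally.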
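The brief mentions DDR4 RAM used "for cache". Assuming the build runs Optane in Intel's Memory Mode, where DRAM caches accesses to the larger Optane pool, average access latency depends on how often requests hit DRAM. The sketch below uses a simple weighted-average model; `DRAM_NS` and `PMEM_NS` are hypothetical round numbers for illustration, not measurements from this build, and a miss is treated as costing only the Optane latency.

```python
# Simplified model of a DRAM cache in front of Optane (Memory Mode assumed).
# Latency figures are illustrative assumptions, not measurements.

def effective_latency_ns(hit_rate: float, dram_ns: float, pmem_ns: float) -> float:
    """Average access latency when a fraction `hit_rate` of accesses hits DRAM.

    Simplification: a miss costs only the Optane latency; real hardware also
    pays for the cache lookup and the line fill.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * dram_ns + (1.0 - hit_rate) * pmem_ns

# Hypothetical order-of-magnitude latencies, in nanoseconds.
DRAM_NS = 100.0
PMEM_NS = 350.0

for h in (0.0, 0.5, 0.9):
    print(f"hit rate {h:.0%}: ~{effective_latency_ns(h, DRAM_NS, PMEM_NS):.0f} ns")
```

The model shows why the DRAM cache matters: the higher the hit rate, the closer average latency gets to DRAM speed, while the Optane modules supply the capacity.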

Read full story at Hacker News
