Redditor Runs 1-Trillion-Parameter LLM Using Optane PMem as RAM
In brief
- A Reddit user ran a 1-trillion-parameter large language model locally by using Intel's discontinued Optane Persistent Memory (DCPMM) modules as RAM.
- The setup, detailed on the Local LLaMA subreddit, uses six used Optane PMem sticks totaling 768GB of memory.
- Optane is slower than traditional DRAM but has lower latency than NVMe SSDs, and the used modules cost a fraction of the price of equivalent DRAM capacity.
- The build includes an Intel Xeon Gold 6246 CPU, Samsung DDR4 ECC RAM for cache, and a Western Digital NVMe SSD.
- Running Kimi K2.5, the system achieves about 4 tokens per second using a hybrid GPU/CPU inference method with llama.cpp.
- While Optane's discontinuation limits this approach's scalability, the creator views it as a successful proof-of-concept.
- The experiment highlights the potential of repurposing older memory technologies for cutting-edge AI tasks, despite their limitations.
Terms in this brief
- Optane PMem
- Intel's Optane Persistent Memory (DCPMM) is a memory technology that installs in DIMM slots alongside DRAM. It is slower than DRAM but faster than NVMe SSDs, offers much larger capacity per module, and can retain data when power is removed. That capacity is what makes it a candidate for holding the weights of a large language model locally.
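As a rough check on the capacity figures in this brief, the sketch below estimates how many bits per weight a 1-trillion-parameter model can use and still fit in 768GB. It is a back-of-the-envelope calculation, not the builder's method: it assumes decimal gigabytes, counts only model weights (no KV cache, activations, DRAM, or GPU memory), and the helper names are hypothetical. The brief does not state which quantization was actually used.

```python
# Back-of-the-envelope memory budget for the build described above.
# Assumptions (not from the source): 1 GB = 10**9 bytes, and only the
# model weights are counted.

def max_bits_per_weight(capacity_bytes: float, n_params: float) -> float:
    """Upper bound on the average bits per parameter that fits in capacity_bytes."""
    return capacity_bytes * 8 / n_params


def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the weights in decimal gigabytes at a given precision."""
    return n_params * bits_per_weight / 8 / 1e9


OPTANE_STICKS = 6   # from the brief
TOTAL_GB = 768      # from the brief
N_PARAMS = 1e12     # "1-trillion-parameter" model

per_stick_gb = TOTAL_GB / OPTANE_STICKS
budget_bits = max_bits_per_weight(TOTAL_GB * 1e9, N_PARAMS)

print(f"Capacity per stick: {per_stick_gb:.0f} GB")
print(f"Bit budget per weight: {budget_bits:.3f}")

# Compare a few common precisions against the 768 GB of Optane.
for bits in (16, 8, 4):
    size = weights_size_gb(N_PARAMS, bits)
    verdict = "fits" if size <= TOTAL_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {size:.0f} GB ({verdict})")
```

Under these assumptions, 16-bit and 8-bit weights would exceed the Optane capacity, so a model of this size would need to average roughly 6 bits per weight or less to fit.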
More briefs
A Credit-Card-Sized Computer Is Now Possible
A tiny computer the size of a credit card has been developed, featuring an ESP32-C3 chip, an e-paper display, and NFC capabilities. This innovative device can function as a minimalist wallet for QR codes, an ethical hacking tool, a smart home controller, or even a secure crypto wallet. The creator focused on achieving the exact thickness of a real credit card, around 1mm, to ensure it feels authentic. The project highlights the challenges of miniaturization, including shaving down components and ensuring durability without compromising functionality. While the initial prototype is rough, it successfully demonstrates the potential of such technology. The developer plans to launch the project soon, offering updates through their website and social media. This breakthrough could lead to practical applications in security, convenience, and smart technologies, pushing the boundaries of what a credit-card-sized device can do.
Tiny-VLLM: A High-Performance LLM Inference Engine Built with C++ and CUDA
Researchers have introduced tiny-vllm, a lightweight, high-performance inference engine for large language models (LLMs) that runs on GPUs and is written in C++ and CUDA. The project pairs its source code with a course on building an LLM inference server from scratch, covering full forward passes, the KV cache, and optimized GPU kernels. It is intended as a learning resource that lets users experiment with model architecture and implementation details. The engine is optimized for single-request decoding, which matters for real-time AI applications such as autonomous agents. The project suggests that faster inference is achievable on standard GPUs without specialized hardware.
AI Generates Battery Electrolyte Recipes
Scientists used AI to generate complete battery electrolyte recipes: complex mixtures of salts, solvents, and additives. It generated novel compositions that performed as well as top-of-the-line electrolytes in lithium metal batteries, a significant step toward finding electrolytes that outperform the current best. The search space is vast, with the number of potential electrolyte molecules estimated at over 10^60. New batteries may be developed using these recipes.
GitHub Updates Copilot Metrics
GitHub now classifies Copilot users into four adoption phases based on their usage over 28 days. These phases help track how users work with Copilot. Phase 1 users work with code completion. Phase 2 users work with a single agent surface. Phase 3 users work with two or more agent surfaces. This change helps organizations see which features their developers use. They can use this data to improve training and support. GitHub will continue to update these metrics.
Taylor Swift Files Trademark Applications
Taylor Swift's company filed trademark applications for her voice and likeness. The filings are meant to stop AI-generated voices and images from misleading people into thinking she endorsed something she did not. Trademark law protects names and images that help consumers identify products. Taylor Swift will likely take more steps to protect her brand from AI misuse.