Norway's National Library Develops a Sovereign LLM Using Huawei Flash Storage

Source: Hacker News

In brief

  • Norway’s National Library is building its own large language model (LLM) to understand the Norwegian language.
  • The project uses 2 petabytes of Huawei OceanStor Dorado flash storage for training data.
  • Marius Husnes, Head of IT Platform at the library, said that no commercial LLM provider currently offers a model built specifically for Norwegian.
    • This puts countries with smaller languages at a disadvantage, since models trained mostly on English data capture little of their local history, culture, and news.
  • Norway’s Ministry of Culture tasked the National Library with developing this sovereign AI because of its extensive digital collection of Norwegian books, newspapers, and web content.
  • The library has digitized over 20 PB of data under its legal deposit mandate, stored in a 3-2-1 preservation system (three copies, two media types, one off-site).
    • Access to this collection gives the library an advantage over private companies.
  • The main challenges involve data quality and pipeline throughput, not compute power.
  • The library uses an Nvidia DGX H200 system, a CPU cluster, and Huawei flash arrays for preprocessing.
  • Once preprocessed, the data is sent to Olivia, Norway’s national supercomputer operated by Sigma2, for training.
    • The project ties cultural-heritage preservation to AI development, with data handling rather than compute as the main technical hurdle.
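
The 3-2-1 rule mentioned in the brief (three copies, two media types, one off-site) is simple enough to write down as a check. The sketch below is only an illustration of the rule itself, not the library's actual tooling: the `Copy` class, the `satisfies_3_2_1` function, and the media labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a digital object (hypothetical model)."""
    media: str     # illustrative label such as "disk" or "tape"
    offsite: bool  # True if the copy is held away from the primary site


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Return True if the copies meet the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                    # at least three copies
    enough_media = len({c.media for c in copies}) >= 2  # at least two media types
    has_offsite = any(c.offsite for c in copies)        # at least one off-site
    return enough_copies and enough_media and has_offsite


# Example data is made up for illustration.
good = [Copy("disk", False), Copy("tape", False), Copy("tape", True)]
bad = [Copy("disk", False), Copy("disk", False), Copy("disk", True)]

print(satisfies_3_2_1(good))  # True
print(satisfies_3_2_1(bad))   # False: only one media type
```

The second set fails even though it has three copies and one off-site, because all three sit on the same kind of media.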

Terms in this brief

Flash Storage
Storage built on non-volatile flash memory chips, with no moving parts. It reads and writes data with much lower latency than spinning hard drives, which makes it common in servers, laptops, and smartphones where fast access matters.
