Tutorial:

Metall – A Persistent Memory Allocator for Accelerating Data Analytics

Abstract:

In this tutorial we introduce Metall, a persistent memory allocator designed to provide developers with an API to allocate custom C++ data structures in both block-storage and byte-addressable persistent memories (e.g., NVMe SSD and Intel Optane DC Persistent Memory). Metall relies on a file-backed mmap mechanism to provide applications with transparent access to the data store in persistent memory. Additionally, Metall incorporates state-of-the-art allocation algorithms with the rich C++ interface developed by Boost.Interprocess and provides persistent memory snapshotting (versioning) capabilities.
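To make the file-backed mmap idea concrete, the following is a minimal sketch using plain POSIX calls. It is not Metall's implementation and does not use the Metall API. The helper names (map_file, unmap_file, bump_counter) and the single-counter file layout are assumptions made for illustration only. It shows how a value written through a mapped file can outlive the mapping that created it.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of Metall): maps `bytes` of the file at
// `path` into memory, creating and sizing the file if needed.
// Returns nullptr on failure.
void* map_file(const char* path, std::size_t bytes) {
  int fd = ::open(path, O_RDWR | O_CREAT, 0600);
  if (fd == -1) return nullptr;
  // A newly extended file region reads back as zeros.
  if (::ftruncate(fd, static_cast<off_t>(bytes)) == -1) {
    ::close(fd);
    return nullptr;
  }
  // MAP_SHARED makes stores visible in the underlying file.
  void* addr = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  ::close(fd);  // the mapping remains valid after the descriptor is closed
  return addr == MAP_FAILED ? nullptr : addr;
}

// Hypothetical helper: flushes dirty pages to the file, then unmaps.
bool unmap_file(void* addr, std::size_t bytes) {
  const bool synced = ::msync(addr, bytes, MS_SYNC) == 0;
  const bool unmapped = ::munmap(addr, bytes) == 0;
  return synced && unmapped;
}

// Hypothetical example: increments a counter stored in the file and
// returns the new value, or -1 on error. Each call maps and unmaps the
// file, standing in for a separate program run.
long bump_counter(const char* path) {
  void* addr = map_file(path, sizeof(long));
  if (addr == nullptr) return -1;
  long* counter = static_cast<long*>(addr);
  const long value = ++(*counter);
  if (!unmap_file(addr, sizeof(long))) return -1;
  return value;
}
```

Calling bump_counter twice on the same fresh file yields 1 and then 2, because the second call sees the value stored by the first. An allocator built on this mechanism additionally has to manage free space inside the mapped region and keep pointers usable across runs, which this sketch does not attempt.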

An often overlooked but common theme among the variety of data analytics platforms is the need to persist data beyond a single process lifecycle. For example, data analytics applications usually perform a data ingestion task, which indexes and partitions data with analytics-specific data structures before performing the analysis. However, the data ingestion task is often more expensive than the analytic itself, and the same or derived data is re-ingested frequently (e.g., when running multiple analytics on the same data, or when developing and debugging analytics programs). The promise of persistent memory is that, once constructed, data structures can be re-analyzed and updated beyond the lifetime of a single execution. Thanks to the recent notable performance improvements and cost reductions in non-volatile random-access memory (NVRAM) technology, we believe that leveraging persistent memory in this way brings significant benefits to data analytics applications.

We begin this tutorial by introducing the technology needed to use persistent memory (PM) and Metall. We then cover how to allocate data in PM with Metall and discuss Metall's internal architecture. Application case studies using Metall will also be presented. Finally, in a hands-on section, we will stay online to work with anyone wishing to experiment with the provided example code. Metall is available at https://github.com/LLNL/metall.

Presenters

Keita Iwabuchi

Dr. Keita Iwabuchi is a data scientist in the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory. His research area is distributed systems and parallel computing, particularly in high-performance computing (HPC). Major focuses are HPC-scale graph analytics and system software for persistent memory and non-volatile memory. Dr. Iwabuchi received his Ph.D. in Mathematical and Computing Sciences from Tokyo Institute of Technology in 2017.

Roger Pearce

Dr. Roger Pearce is a computer scientist in the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory. His research interests center around parallel and external memory graph algorithms and data-intensive computing on HPC systems. Roger joined LLNL in 2008 as a Lawrence Scholar, and joined CASC in 2013. Roger received a Ph.D. in Computer Science from Texas A&M University in 2013.
