Registered users: Check your email for Zoom links to attend. All times are PST.
MONDAY, MAY 16
8:00 am – 9:00 am
Session 1: Memorable Paper Award Finalists I
Chair: Erich Haratsch, Marvell
HolisticGNN: Geometric Deep Learning Engines for Computational SSDs
Miryeong Kwon (KAIST); Donghyun Gouk (KAIST); Sangwon Lee (KAIST); Myoungsoo Jung (KAIST)
Speaker: Miryeong Kwon, KAIST
Abstract: Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges and achieve much higher accuracy in a variety of prediction tasks. However, because GNNs operate on large graph and embedding data residing on storage, they suffer from heavy I/O accesses and irregular computation. We propose HolisticGNN, a novel deep learning framework for large graphs that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing. To achieve the best end-to-end latency and high energy efficiency, HolisticGNN allows users to implement various GNN algorithms and directly executes them, in a holistic manner, where the data exist. We fabricate HolisticGNN's hardware RTL and implement its software on an FPGA-based computational SSD (CSSD). Our empirical evaluations show that the inference time of HolisticGNN outperforms GNN inference services using a high-performance GPU by 7.1x while reducing energy consumption by 33.2x, on average.
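For readers unfamiliar with GNN inference, the aggregate-then-transform kernel that such engines execute near the data can be sketched in plain Python (an illustrative sketch only; the function and variable names are ours, not HolisticGNN's API):

```python
def gnn_layer(features, neighbors, weight):
    """One message-passing layer: average each node's neighborhood
    (self + neighbors), then apply a linear transform and ReLU."""
    def matvec(m, v):
        return [sum(r * x for r, x in zip(row, v)) for row in m]
    out = []
    for v, nbrs in enumerate(neighbors):
        group = [features[u] for u in nbrs] + [features[v]]
        mean = [sum(col) / len(group) for col in zip(*group)]    # aggregate
        out.append([max(x, 0.0) for x in matvec(weight, mean)])  # transform + ReLU
    return out

# Tiny 3-node graph with 2-dimensional embeddings and an identity weight.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gnn_layer(features, neighbors, identity)
```

Running many such layers over graph data that lives on storage is precisely the workload whose I/O cost motivates executing the kernel inside the CSSD.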
Speaker bio: Miryeong Kwon is a Ph.D. candidate at the Korea Advanced Institute of Science and Technology (KAIST), advised by Myoungsoo Jung, who leads research on computer architecture, non-volatile memory, and operating systems. Her main research interests are hardware-software co-design for emerging applications and the management of non-volatile memory and storage devices in such systems.
Jaaru: Efficiently model checking persistent memory programs
Hamed Gorjiara (UC Irvine); Guoqing Harry Xu (UCLA); Brian Demsky (University of California, Irvine)
Speaker: Hamed Gorjiara, University of California, Irvine
Abstract: Persistent memory (PM) technologies combine near-DRAM performance with persistency and open the possibility of using one copy of a data structure as both a working copy and a persistent store of the data. Ensuring that these persistent data structures are crash consistent (i.e., consistent across power failures) is a major challenge. Stores to persistent memory are not immediately made persistent: they initially reside in the processor cache and are only written to PM when a flush occurs due to space constraints or explicit flush instructions. Testing crash consistency is more challenging for PM than for disks because PM's byte-addressability leads to significantly more states. We present Jaaru, a fully-automated and ultra-efficient model checker for PM programs. Key to Jaaru's efficiency is a new technique based on constraint refinement that can reduce the number of executions that must be explored by many orders of magnitude. This exploration technique effectively leverages commit stores, a common coding pattern, to reduce the model checking complexity from exponential in the length of program executions to quadratic. We have evaluated Jaaru with PMDK and RECIPE, and found 25 persistency bugs, 18 of which are new. Jaaru is also orders of magnitude more efficient than Yat, a model checker that eagerly explores all possible states.
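To see why the state space explodes, consider the naive enumeration a PM model checker must avoid: at a crash, any subset of not-yet-flushed stores may or may not have reached PM. A minimal sketch (the example program and invariant are invented, not from Jaaru):

```python
from itertools import chain, combinations

def crash_states(initial, pending_stores):
    """Yield every PM image reachable by a crash: each subset of the
    pending (unflushed) stores may independently have been written back,
    so the number of states is exponential in the number of stores."""
    subsets = chain.from_iterable(
        combinations(pending_stores, k) for k in range(len(pending_stores) + 1))
    for subset in subsets:
        state = dict(initial)
        state.update(subset)
        yield state

# Program: write data, then set a valid flag. Without ordering the flushes,
# a crash can leave valid=1 pointing at stale data.
initial = {"data": 0, "valid": 0}
pending = [("data", 42), ("valid", 1)]
bugs = [s for s in crash_states(initial, pending)
        if s["valid"] == 1 and s["data"] != 42]
```

Two pending stores already yield four crash states, one of them buggy; Jaaru's constraint refinement is what makes exploring such spaces tractable for real programs.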
Speaker bio: Hamed Gorjiara is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Irvine (UCI). He works in the Programming Languages Research Group, advised by Brian Demsky. His research interests are software design, compilers, and testing frameworks; his work mainly focuses on developing efficient testing frameworks for persistent memory programs to facilitate the adoption of this new type of memory by ordinary programs.
NDS: N-Dimensional Storage
Yu-Chia Liu (University of California, Riverside); Hung-Wei Tseng (University of California, Riverside)
Speaker: Yu-Chia Liu, University of California, Riverside
Abstract: Demands for efficient computing among applications that use high-dimensional datasets have led to multi-dimensional computers—computers that leverage heterogeneous processors/accelerators offering various processing models to support multi-dimensional compute kernels. Yet the front-end for these processors/accelerators is inefficient, as memory/storage systems often expose only entrenched linear-space abstractions to an application, and they often ignore the benefits of modern memory/storage systems, such as support for multi-dimensionality through different types of parallel access. This paper presents N-Dimensional Storage (NDS), a novel multi-dimensional memory/storage system that fulfills the demands of modern hardware accelerators and applications. NDS abstracts memory arrays as native storage that applications can use to describe data locations through coordinates in any application-defined multi-dimensional space, thereby avoiding the software overhead associated with data-object transformations. NDS gauges application demand on the underlying memory-device architectures in order to intelligently determine the physical data layout that maximizes access bandwidth and minimizes the overhead of presenting objects for arbitrary applications. This paper demonstrates an efficient architecture supporting NDS. We evaluate a set of linear/tensor algebra workloads along with graph and data-mining algorithms on custom-built systems using each architecture. Our results show a 5.73× speedup with appropriate architectural support.
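The gap between a linear-space abstraction and a layout chosen for the access pattern can be illustrated with two address mappings for the same 2-D array (an illustrative sketch, not NDS's actual layout algorithm; names and tile sizes are ours):

```python
def row_major(coord, shape):
    """Linearize an N-D coordinate the way an entrenched linear
    abstraction would: last dimension varies fastest."""
    addr = 0
    for c, dim in zip(coord, shape):
        addr = addr * dim + c
    return addr

def tiled_2d(coord, shape, tile=(4, 4)):
    """A layout a multi-dimensional store might pick instead: group
    elements into tiles so each tile maps to one parallel device unit."""
    (r, c), (rows, cols) = coord, shape
    tiles_per_row = cols // tile[1]
    tile_id = (r // tile[0]) * tiles_per_row + (c // tile[1])
    offset = (r % tile[0]) * tile[1] + (c % tile[1])
    return tile_id * tile[0] * tile[1] + offset

# The same logical element lands at different physical addresses:
a = row_major((1, 2), (8, 8))   # 10
b = tiled_2d((1, 2), (8, 8))    # 6: inside tile 0
```

An application that accesses 4x4 blocks touches one contiguous tile under the second layout but four scattered rows under the first, which is the kind of demand-vs-layout mismatch NDS resolves below the abstraction.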
Speaker bio: Yu-Chia Liu is a 4th-year Ph.D. student advised by Hung-Wei Tseng at UC Riverside. His research focuses on the interaction between hardware-accelerated programs and storage systems. Yu-Chia's most recent paper, N-Dimensional Storage, was a best paper candidate at MICRO 2021. Yu-Chia is currently on the job market, seeking an industry position.
9:30 am – 10:45 am
Session 2A: New Memory Devices
Chair: Jishen Zhao, UC San Diego
A definitive demonstration that resistance-switching memories are not memristors
Jinsun Kim (University of South Carolina); Yuriy V. Pershin (University of South Carolina); Ming Yin (Benedict College, Columbia, South Carolina); Timir Datta (University of South Carolina); Massimiliano Di Ventra (University of California, San Diego)
Speaker: Jinsun Kim, University of South Carolina
Abstract: There are claims in the literature that all resistance-switching memories are memristors, namely, resistors whose resistance depends only on the charge that flows across them. Here, we present the first experimental measurement unambiguously showing that such claims are wrong. Our demonstration is based on the recently suggested “ideal memristor test,” which exploits a duality in a capacitor-memristor circuit. This duality requires that for any initial state of the memristor (its initial resistance) and any form of the applied voltage, the final state of the memristor (its final resistance) must be identical to its initial state if the capacitor charge finally returns to its initial value. We have applied the test to a Cu-SiO2 electrochemical metallization cell and found that the cell is not a memristor: it does not return to the initial state when the circuit is subjected to a voltage pulse. Since the response of our electrochemical metallization cell is typical of the most common bipolar resistance-switching memories, we can conclude that resistance-switching memories are not memristors.
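The duality argument is easiest to see with a toy charge-controlled model, where resistance is a function only of the net charge that has flowed through the device (a minimal sketch with invented device parameters; the point is that a real resistance-switching cell fails exactly this check):

```python
def resistance(q, r_on=100.0, r_off=10000.0, q_max=1e-3):
    """Ideal memristor: piecewise-linear R(q), with the net charge q
    clamped to [0, q_max]. R depends on nothing but q."""
    frac = min(max(q / q_max, 0.0), 1.0)
    return r_off + (r_on - r_off) * frac

q = 0.4e-3                      # some initial internal state
r_initial = resistance(q)

q += 0.3e-3                     # charge flows one way through the device...
q -= 0.3e-3                     # ...and the same amount flows back
r_final = resistance(q)

assert r_final == r_initial     # ideal memristor: state fully restored
```

In the capacitor-memristor test circuit, the capacitor guarantees that the net charge eventually returns to its initial value, so an ideal memristor must return to `r_initial`; the measured Cu-SiO2 cell does not, which is the paper's disproof.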
Speaker bio: Jinsun (Jin) Kim is a graduate student working with Dr. Yuriy V. Pershin at the University of South Carolina (USC). Her research at USC focuses on two areas: (1) reliability and validity testing of resistive switching devices with memory; and (2) nano-fabrication and experimental measurement of resistive switching devices with innovative materials. Jin has recently published work in Advanced Electronic Materials 6(7), and two other research works have been submitted for publication. She will receive her Master’s degree in physics from USC in May 2022.
Ferroelectric nonvolatile memories: Hafnia Based Ferroelectric Tunnel Junctions
Bhagwati Prasad (Materials Engineering Department, Indian Institute of Science, Bengaluru, India 560012); Vishal Thakare (Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA); Alan Kalitsov (Western Digital Research Center, Western Digital Corporation, San Jose, CA 95119, USA); Zimeng Zhang (Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA); R. Ramesh (Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA)
Speaker: Bhagwati Prasad, Indian Institute of Science, Bengaluru
Abstract: High-density, high-speed, and low-power nonvolatile memories are currently being vigorously explored for use in next-generation computation, particularly due to the performance gap between the logic and memory elements of current computational architectures. The electrically switchable spontaneous polarization of ferroelectric materials enables a robust nonvolatile memory solution. Using ultrathin films of ferroelectric materials as a tunnel barrier in a metal/ferroelectric/metal trilayer structure, the so-called ferroelectric tunnel junction (FTJ), is being widely explored as a potential nonvolatile memory element. Unlike ferroelectric RAM (FeRAM), FTJs offer nondestructive readout, in addition to low operation energy and high operation speed. In this work, we have demonstrated FTJs with a very large OFF/ON resistance ratio and a relatively low resistance-area product (RA) using a ~1 nm thick Zr-doped HfO2 (HZO) ferroelectric tunnel barrier. We stabilized ferroelectricity in ultrathin films of rhombohedral HZO (R-HZO) through substrate-induced compressive strain. The resistance-area product at the bias voltage (~300 mV) required for one-half of the zero-bias TER ratio is three orders of magnitude lower than the values reported for relatively thick ferroelectric barriers, which significantly improves the signal-to-noise ratio (SNR) during the read operation. These results set the stage for further exploration of hafnia-based FTJs for nonvolatile memory applications.
Speaker bio: Prof. Bhagwati Prasad is an Assistant Professor in the Department of Materials Engineering at the Indian Institute of Science, Bengaluru. He is currently working on emerging memory technologies for the Internet of Things (IoT) and Artificial Intelligence (AI). Before joining IISc, he worked as a Principal Research Scientist at Western Digital, San Jose, USA. In 2015, he obtained his doctorate in Materials Science from the University of Cambridge (UK), after which he took a scientist position at the Max Planck Institute for Solid State Research, Stuttgart, Germany. In November 2016, Dr. Prasad moved to the USA and joined Prof. R. Ramesh’s group at the University of California, Berkeley as a senior postdoctoral researcher. Dr. Prasad has published more than 30 research articles in highly reputed journals and filed more than 35 patents, including 30 US patents.
A Joint Sneak Path and Data Detection Scheme for Resistive Random Access Memories
Guanghui Song (Xidian University); Kui Cai (Singapore University of Technology and Design); Xingwei Zhong (Singapore University of Technology and Design); Ying Li (Xidian University)
Speaker: Guanghui Song, Xidian University
Abstract: Resistive random-access memory is one of the most promising candidates for the next generation of non-volatile memory technology. However, its crossbar array structure causes severe "sneak-path" interference, which also leads to strong inter-cell correlation. Recent works have mainly focused on sub-optimal data detection schemes that ignore inter-cell correlation and assume the sneak-path interference is independent across array cells. In this abstract, we propose a near-optimal data detection scheme that can approach the performance bound of the optimal detection scheme. Our detection scheme leverages joint data and sneak-path interference recovery and exploits all inter-cell correlations. The proposed scheme is suitable for data detection in large memory arrays with only linear operation complexity.
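To make the interference concrete: in a selector-free crossbar, reading one cell also sees a series path through three neighboring cells, so the read value depends on the *data stored elsewhere* in the array, which is exactly the inter-cell correlation the abstract exploits. A toy 2x2 model (resistance values invented):

```python
def parallel(r1, r2):
    """Resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def read_cell(grid, i, j):
    """Effective resistance seen when reading cell (i, j) of a 2x2
    selector-free crossbar: the target in parallel with the three-cell
    sneak path (row i -> other column -> other row -> column j)."""
    i2, j2 = 1 - i, 1 - j
    sneak = grid[i][j2] + grid[i2][j2] + grid[i2][j]
    return parallel(grid[i][j], sneak)

R_ON, R_OFF = 1e3, 1e6           # low/high resistance states
grid = [[R_OFF, R_ON],
        [R_ON,  R_ON]]
# The high-resistance target reads far below R_OFF because current
# sneaks through the three low-resistance neighbors:
r_read = read_cell(grid, 0, 0)
```

Here the stored "1" (high resistance) is read as roughly 3 kΩ instead of 1 MΩ, so a detector that models the neighbors jointly with the target recovers far more information than one that treats cells independently.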
Speaker bio: Guanghui Song received his Ph.D. degree from the Department of Intelligent Information Engineering and Sciences, Doshisha University, Kyoto, Japan, in 2012. He was a postdoctoral research fellow at the Singapore University of Technology and Design, Singapore. Currently, he is an Associate Professor at Xidian University, Xi'an, China. His research interests are in the areas of channel coding theory, multi-user coding, and coding for data storage systems.
RRAM-ECC: Improving Reliability of RRAM-Based Compute In-Memory
Zishen Wan (Georgia Tech); Brian Crafton (Georgia Tech); Samuel Spetalnick (Georgia Tech); Jong-Hyeok Yoon (DGIST); Arijit Raychowdhury (Georgia Tech)
Speaker: Zishen Wan, Georgia Institute of Technology
Abstract: Compute in-memory (CIM) is an exciting technique that minimizes data transport, maximizes memory throughput, and performs computation on the bitlines of memory sub-arrays. This is especially interesting for machine learning applications, where increased memory bandwidth and analog-domain computation offer improved area and energy efficiency. Unfortunately, CIM faces new challenges that traditional CMOS architectures have avoided. In this work, we explore the impact of device variation (calibrated with measured data on foundry RRAM arrays) and propose a new class of error correcting codes (ECC) for hard and soft errors in CIM. We demonstrate single, double, and triple error correction offering up to a 16,000× reduction in bit error rate over a design without ECC and over 427× over prior work, while consuming only 29.1% area and 26.3% power overhead.
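Single-error correction itself can be illustrated with the classic Hamming(7,4) code (a textbook sketch, not the paper's CIM-specific ECC, which must also cope with analog-domain soft errors):

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword; parity bits sit at
    positions 1, 2, and 4 (1-based), each covering half the positions."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Return the codeword with any single bit error corrected: the
    syndrome spells out the 1-based position of the flipped bit."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3    # 0 means no error
    if syndrome:
        c[syndrome - 1] ^= 1
    return c

word = hamming74_encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[4] ^= 1                      # e.g., a stuck RRAM bit
assert hamming74_correct(corrupted) == word
```

The paper's contribution is codes of this flavor adapted to in-memory analog computation, extended to double and triple error correction.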
Speaker bio: Zishen Wan is a Ph.D. student at Georgia Tech, advised by Prof. Arijit Raychowdhury in the Integrated Circuits and Systems Research Lab. His research interests are hardware accelerators for domain-specific applications, in-memory computing, and hardware reliability. Before coming to Georgia Tech, Zishen obtained his M.S. from Harvard University in 2020 and his B.S. from the Harbin Institute of Technology in 2018. Zishen received Best Paper Awards at DAC 2020 and in CAL 2020, and was selected as a DAC Young Fellow in 2021.
Session 2B: Heterogeneous Memory Systems
Chair: Hung-Wei Tseng, UC Riverside
HAMS: Hardware Automated Memory-over-Storage for Large-scale Memory Expansion
Jie Zhang (Peking University); Miryeong Kwon (KAIST); Donghyun Gouk (KAIST); Sungjoon Koh (KAIST); Nam Sung Kim (UIUC); Mahmut Taylan Kandemir (Penn State, USA); Myoungsoo Jung (KAIST)
Speaker: Jie Zhang, Peking University
Abstract: Large persistent memories such as NVDIMM have been perceived as a disruptive memory technology, because they can maintain the state of a system even after a power failure and allow the system to recover quickly. However, existing persistent memories either suffer from poor performance or are constrained by poor scaling. One may leverage existing OS memory management to construct a large persistent memory space by hybridizing NVDIMM and SSD. Unfortunately, the overheads incurred by heavy software-stack intervention seriously negate the benefits of such designs. Tackling the aforementioned limitations, we propose HAMS, a hardware-automated Memory-over-Storage (MoS) solution. Specifically, HAMS aggregates the capacity of NVDIMM and ultra-low-latency flash archives (ULL-Flash) into a single large memory space (cf. Figure 1), which can be used as a working-memory or persistent-memory expansion in an OS-transparent manner. HAMS resides in the memory controller hub and manages its MoS address pool over conventional DDR and NVMe interfaces; it employs a simple hardware cache to serve all the memory requests from the host MMU after mapping the storage space of ULL-Flash to the memory space of NVDIMM. Second, to make HAMS more energy-efficient and reliable, we propose an "advanced HAMS," which removes unnecessary data transfers between NVDIMM and ULL-Flash by optimizing the datapath and hardware modules of HAMS. This approach unleashes the ULL-Flash and its NVMe controller from the storage box and directly connects the HAMS datapath to NVDIMM over the conventional DDR4 interface. Our evaluations show that HAMS and advanced HAMS can offer 97% and 119% higher system performance than a software-based NVDIMM design, while consuming 41% and 45% less energy, respectively.
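The memory-over-storage idea, serving CPU load/store traffic from a fast tier that caches a much larger flash address space, can be sketched as a direct-mapped cache in front of a backing store (a behavioral sketch only; HAMS is a hardware design, and the sizes and names here are invented):

```python
PAGE = 4096

class MoSCache:
    """Direct-mapped fast-tier (NVDIMM-like) cache over a flash-like
    backing store, loosely in the spirit of a Memory-over-Storage cache:
    every host memory request is served from the cache, with misses
    filled from the backing store."""
    def __init__(self, flash, num_slots):
        self.flash = flash                 # backing store: page_id -> bytes
        self.slots = [None] * num_slots    # each slot: (page_id, data)
        self.hits = self.misses = 0

    def read(self, addr):
        page, offset = divmod(addr, PAGE)
        slot = page % len(self.slots)
        cached = self.slots[slot]
        if cached and cached[0] == page:
            self.hits += 1
        else:
            self.misses += 1               # fetch page from flash into the fast tier
            self.slots[slot] = (page, self.flash[page])
        return self.slots[slot][1][offset]

flash = {0: bytes([1]) * PAGE, 1: bytes([2]) * PAGE}
cache = MoSCache(flash, num_slots=8)
assert cache.read(10) == 1    # first touch: filled from "flash"
assert cache.read(11) == 1    # same page: served from the fast tier
```

The OS-transparency in HAMS comes from doing this translation entirely in the memory controller hub, so the host MMU sees one flat memory space.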
Speaker bio: Dr. Jie Zhang is a tenure-track assistant professor in the School of Computer Science at Peking University, China. His research interests are storage systems, emerging non-volatile memory, and heterogeneous computing, with work spanning computer architecture, embedded systems, and operating systems. His recent publications investigate resolving the memory-wall issue in the von Neumann architecture and the data migration issue in GPU-SSD heterogeneous computing systems. So far, he has published over 40 papers in leading international conferences and journals, including ISCA, MICRO, HPCA, OSDI, FAST, PACT, and DAC. His research has been recognized with a Boya Young Scholar award, selection in KAIST Breakthroughs (50th Innoversary), a Memorable Paper Award nomination (NVMW), a Best Presentation Award (KCC), and over 60 news headlines. For more details, please visit his personal website: https://jiezhang-camel.github.io/.
Integrating New Photonic-Based Heterogeneous Memory into Throughput Accelerators
Jie Zhang (Peking University); Myoungsoo Jung (KAIST)
Speaker: Jie Zhang, Peking University
Abstract: Graphics processing units (GPUs) have been widely adopted as an efficient accelerator platform to speed up the execution of large-scale data-intensive applications. While the massively parallel computing power of a GPU can enhance data processing bandwidth, its memory system struggles to satisfy the increasing I/O demands of large-scale applications. Specifically, DRAM faces many practical challenges in scaling its technology down, and it cannot become denser due to memory retention-time violations, insufficient sensing margins, and low reliability. To address these challenges, we propose Ohm-GPU, a new optical-network-based heterogeneous memory design for GPUs. Specifically, Ohm-GPU expands memory capacity by combining a set of high-density 3D XPoint and DRAM modules as heterogeneous memory. To prevent memory channels from throttling the throughput of the GPU memory system, Ohm-GPU replaces the electrical lanes in the traditional memory channel with a high-performance optical network. However, the hybrid memory can introduce frequent data migrations between DRAM and 3D XPoint, which can unfortunately occupy the memory channel and increase the optical network traffic. To prevent the intensive data migrations from blocking normal memory services, Ohm-GPU revises the existing memory controller and designs a new optical network infrastructure, which enables the memory channel to serve data migrations and memory requests in parallel. Our evaluation results reveal that Ohm-GPU improves performance by 27% compared to a baseline optical-network-based heterogeneous memory system.
Speaker bio: Dr. Jie Zhang is a tenure-track assistant professor in the School of Computer Science at Peking University, China. His research interests are storage systems, emerging non-volatile memory, and heterogeneous computing, with work spanning computer architecture, embedded systems, and operating systems. His recent publications investigate resolving the memory-wall issue in the von Neumann architecture and the data migration issue in GPU-SSD heterogeneous computing systems. So far, he has published over 40 papers in leading international conferences and journals, including ISCA, MICRO, HPCA, OSDI, FAST, PACT, and DAC. His research has been recognized with a Boya Young Scholar award, selection in KAIST Breakthroughs (50th Innoversary), a Memorable Paper Award nomination (NVMW), a Best Presentation Award (KCC), and over 60 news headlines. For more details, please visit his personal website: https://jiezhang-camel.github.io/.
HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM
Amanda Raybuck (University of Texas at Austin); Tim Stamler (University of Texas at Austin); Wei Zhang (Microsoft); Mattan Erez (University of Texas at Austin); Simon Peter (University of Washington)
Speaker: Amanda Raybuck, University of Texas at Austin
Abstract: High-capacity non-volatile memory (NVM) is a new main memory tier. Tiered DRAM+NVM servers increase total memory capacity by up to 8X, but can diminish memory bandwidth by up to 7X and inflate latency by up to 63% if not managed well. We study existing hardware and software tiered memory management systems on the recently available Intel Optane DC NVM with big data applications and find that no existing system maximizes application performance on real NVM. Based on our findings, we present HeMem, a tiered main memory management system designed from scratch for commercially available NVM and the big data applications that use it. HeMem manages tiered memory asynchronously, batching and amortizing memory access tracking, migration, and associated TLB synchronization overheads. HeMem monitors application memory use by sampling memory access via CPU events, rather than page tables. This allows HeMem to scale to terabytes of memory, keeping small and ephemeral data structures in fast memory, and allocating scarce, asymmetric NVM bandwidth according to access patterns. Finally, HeMem is flexible by placing per-application memory management policy at user-level. On a system with Intel Optane DC NVM, HeMem outperforms hardware, OS, and PL-based tiered memory management, providing up to 50% runtime reduction for the GAP graph processing benchmark, 13% higher throughput for TPC-C on the Silo in-memory database, 16% lower tail-latency under performance isolation for a key-value store, and up to 10X less NVM wear than the next best solution, without application modification.
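The sampling-based tracking idea can be sketched in a few lines: instead of scanning page tables, estimate page hotness from a sparse sample of accesses and keep only the hottest pages in the fast tier (illustrative only; names and parameters are ours, and HeMem itself samples via CPU performance-monitoring events rather than in software):

```python
import random
from collections import Counter

def choose_hot_pages(access_stream, fast_capacity, sample_rate=0.01):
    """Pick the pages that belong in the fast (DRAM) tier by sampling
    the access stream, so tracking cost stays tiny even for terabytes
    of memory. The random draw stands in for a sampled CPU event."""
    counts = Counter()
    for page in access_stream:
        if random.random() < sample_rate:
            counts[page] += 1
    return [page for page, _ in counts.most_common(fast_capacity)]

random.seed(0)
# A skewed workload: page 7 is touched far more often than any other page.
stream = [7] * 5000 + [random.randrange(100) for _ in range(5000)]
random.shuffle(stream)
hot_pages = choose_hot_pages(stream, fast_capacity=4)
```

Even at a 1% sample rate the genuinely hot page dominates the sampled counts, which is why sampling scales where per-access or page-table tracking does not.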
Speaker bio: Amanda Raybuck is a fifth-year Ph.D. student at UT Austin advised by Simon Peter. Her work focuses on modern memory management, especially with new hardware such as NVM. She has also worked on zero-copy and kernel-bypass systems.
LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism
Jongyul Kim (KAIST); Insu Jang (University of Michigan); Waleed Reda (KTH Royal Institute of Technology / Université catholique de Louvain); Jaeseong Im (KAIST); Marco Canini (KAUST); Dejan Kostić (KTH Royal Institute of Technology); Youngjin Kwon (KAIST); Simon Peter (University of Texas at Austin); Emmett Witchel (The University of Texas at Austin / Katana Graph)
Speaker: Jongyul Kim, KAIST
Abstract: In multi-tenant systems, the CPU overhead of distributed file systems (DFSes) is increasingly a burden to application performance. CPU and memory interference cause degraded and unstable application and storage performance, in particular for operation latency. Recent client-local DFSes for persistent memory (PM) accelerate this trend. DFS offload to SmartNICs is a promising solution to these problems, but it is challenging to fit the complex demands of a DFS onto simple SmartNIC processors located across PCIe. We present LineFS, a SmartNIC-offloaded, high-performance DFS with support for client-local PM. To fully leverage the SmartNIC architecture, we decompose DFS operations into execution stages that can be offloaded to a parallel data-path execution pipeline on the SmartNIC. LineFS offloads CPU-intensive DFS tasks, such as replication, compression, data publication, indexing, and consistency management, to a SmartNIC. We implement LineFS on the Mellanox BlueField SmartNIC and compare it to Assise, a state-of-the-art PM DFS. LineFS achieves 46% better I/O throughput while improving host application performance by up to 40% when host CPU resources are scarce.
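The stage decomposition can be sketched as a chain of small, independently offloadable steps over a stream of data blocks (a sketch only; the stage choice and names are ours, and a real pipeline runs the stages for different requests concurrently on SmartNIC cores rather than sequentially as here):

```python
import zlib

def stage_compress(blocks):
    """Stage 1: compress each data block."""
    for block in blocks:
        yield zlib.compress(block)

def stage_checksum(blobs):
    """Stage 2: attach an integrity checksum."""
    for blob in blobs:
        yield (zlib.crc32(blob), blob)

def stage_publish(items, log):
    """Stage 3: publish (append) each item to the durable log, in order."""
    for crc, blob in items:
        log.append((crc, blob))
        yield crc

def run_pipeline(blocks):
    log = []
    crcs = list(stage_publish(stage_checksum(stage_compress(blocks)), log))
    return crcs, log

crcs, log = run_pipeline([b"a" * 4096, b"b" * 4096])
```

Because each stage only consumes the previous stage's output, the stages can be mapped to separate processors and overlapped across requests, which is the pipeline parallelism the title refers to.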
Speaker bio: Jongyul Kim is a postdoc researcher in the Computer Architecture and Systems Lab at Korea Advanced Institute of Science and Technology (KAIST). He received a Ph.D. degree in the School of Computing from KAIST. His Ph.D. work received the best paper award at SOSP 2021. His research interest is in system software including operating systems, distributed file systems, smart devices, and virtualization.
GPM: Leveraging Persistent Memory from a GPU
Aditya K Kamath (University of Washington); Shweta Pandey (Indian Institute of Science, Bangalore); Arkaprava Basu (Indian Institute of Science, Bangalore)
Speaker: Aditya K Kamath, University of Washington
Abstract: The GPU is a key computing platform for many application domains. While new non-volatile memory technology has brought the promise of byte-addressable persistence (a.k.a. persistent memory, or PM) to CPU applications, the same, unfortunately, remains beyond the reach of GPU programs. We take three key steps toward enabling GPU programs to access PM directly. First, we enable direct access to PM from within a GPU kernel without needing to modify the hardware. Next, we demonstrate three classes of GPU-accelerated applications that benefit from PM, creating a workload suite with nine such applications in the process. Finally, we create a GPU library, written in CUDA, that supports logging, checkpointing, and primitives for native persistence, allowing programmers to easily leverage PM.
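A persistence primitive of the kind such a library provides, logging, can be illustrated with a host-side undo log (a conceptual sketch only; the GPM library itself is CUDA, and the class and method names here are ours):

```python
class UndoLog:
    """Minimal undo log: record each old value *before* the in-place
    update, so a crash before commit can be rolled back to the last
    consistent state."""
    def __init__(self):
        self.entries = []

    def log(self, store, key):
        self.entries.append((key, store.get(key)))   # persist-before-write

    def commit(self):
        self.entries.clear()                         # updates are now durable

    def rollback(self, store):
        for key, old in reversed(self.entries):
            if old is None:
                store.pop(key, None)
            else:
                store[key] = old
        self.entries.clear()

store = {"balance": 100}
log = UndoLog()
log.log(store, "balance")
store["balance"] = 40          # "crash" before commit...
log.rollback(store)            # ...recovers the old value
assert store["balance"] == 100
```

On PM, the same discipline applies with the extra requirement that the log entry be flushed to persistence before the update, which is what makes GPU-side cache-control support necessary.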
Speaker bio: Aditya K Kamath is a Ph.D. student at the University of Washington’s Paul G. Allen School of Computer Science and Engineering co-advised by Professor Mark Oskin and Professor Michael Taylor. He enjoys building high-performance software tailored to efficiently utilize underlying architectures and systems. Prior to this, he spent two years as a research assistant at the Indian Institute of Science, working alongside Professor Arkaprava Basu.
11:15 am-12:30 pm
Session 3A: Reliability & Simulation for Emerging Technology
Chair: Paul Siegel, UC San Diego
MD-HM: Memoization-based Molecular Dynamics Simulations on Big Memory System
Zhen Xie (University of California, Merced); Dong Li (University of California, Merced)
Speaker: Zhen Xie, University of California, Merced
Abstract: Molecular dynamics (MD) simulation is a fundamental method for modeling ensembles of particles. In this paper, we introduce a new method to improve the performance of MD by leveraging emerging TB-scale big memory systems. In particular, we trade memory capacity for computation capability to improve MD performance through the lookup-table-based memoization technique. The traditional memoization technique for MD simulation uses relatively small DRAM, is based on a suboptimal data structure, and replaces only pair-wise computation, which leads to limited performance benefit on a big memory system. We introduce MD-HM, a memoization-based MD simulation framework customized for the big memory system. MD-HM partitions the simulation field into subgrids and replaces computation in each subgrid as a whole, based on a lightweight pattern-matching algorithm that recognizes computation in the subgrid. MD-HM uses a new two-phase LSM-tree to optimize read/write performance. Evaluating nine MD simulations, we show that MD-HM outperforms the state-of-the-art LAMMPS simulation framework with an average speedup of 7.6× on an Intel Optane-based big memory system.
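The capacity-for-compute trade behind lookup-table memoization can be sketched with a precomputed force table (illustrative Python only; MD-HM itself memoizes whole subgrids via an LSM-tree rather than single pair-wise terms like this):

```python
from math import floor

def lj_force(r):
    """Lennard-Jones force magnitude in reduced units: the expensive
    per-pair computation we want to avoid repeating."""
    return 24 * (2 * r**-13 - r**-7)

def build_table(r_min, r_max, n):
    """Precompute the force on a fine grid of distances: spend memory
    once so later lookups replace recomputation."""
    step = (r_max - r_min) / n
    return [lj_force(r_min + i * step) for i in range(n + 1)], r_min, step

def memoized_force(table, r):
    """Nearest-grid-point lookup instead of recomputing lj_force(r)."""
    values, r_min, step = table
    i = floor((r - r_min) / step)
    return values[min(max(i, 0), len(values) - 1)]

table = build_table(0.8, 3.0, 100_000)   # ~100k entries of memory
exact = lj_force(1.5)
approx = memoized_force(table, 1.5)
```

With a fine enough grid the lookup error is negligible, and on a TB-scale memory system the table (or, in MD-HM, memoized subgrid results) can be made large enough to capture far more of the computation.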
Speaker bio: Zhen Xie is a research associate in the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. Previously, Zhen was a postdoctoral researcher in the Department of Electrical Engineering and Computer Science at the University of California, Merced. His research interests include parallel algorithms and performance optimization, with a recent focus on high-performance computing for machine learning. Zhen received his Ph.D. in computer science from the Chinese Academy of Sciences in 2019.
Coding on Barrier Channels beyond Guaranteed Correction
Yuval Ben-Hur (Technion); Yuval Cassuto (Technion)
Speaker: Yuval Ben-Hur, Technion - Israel Institute of Technology
Abstract: This paper studies coding on channels with the barrier property: only errors to and from a special barrier state are possible. Our contributions include derivation of the channel capacity, efficient maximum-likelihood (ML) and list decoding algorithms, and finite-block-length analysis using random codes. Emerging non-volatile memory technologies may exhibit controlled unreliability as their representation power is increased, and thus may benefit from the high capacity and improved coding of barrier channels.
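The barrier property can be made concrete with a tiny channel simulator: a non-barrier symbol may only be corrupted *into* the barrier state, and the barrier state only *out* to a non-barrier symbol, so direct transitions between non-barrier symbols are impossible (a sketch under our own names; the 3-ary alphabet and probabilities are invented):

```python
import random

ALPHABET = [0, 1, 2]
BARRIER = 2                      # designate one special barrier state

def barrier_channel(symbol, p_in, p_out, rng):
    """Transmit one symbol over a barrier channel.

    p_in:  probability a non-barrier symbol falls into the barrier state
    p_out: probability the barrier state escapes to some other symbol
    All other transitions (e.g., 0 -> 1 directly) are impossible.
    """
    if symbol != BARRIER:
        return BARRIER if rng.random() < p_in else symbol
    if rng.random() < p_out:
        return rng.choice([s for s in ALPHABET if s != BARRIER])
    return symbol

rng = random.Random(0)
received = [barrier_channel(s, p_in=0.1, p_out=0.1, rng=rng)
            for s in [0, 1, 0, 1]]
```

This restricted transition structure is what gives the channel higher capacity than a general noisy channel and makes efficient ML and list decoding possible.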
Speaker bio: Yuval Ben-Hur received his B.Sc. and M.Sc. degrees in Electrical Engineering from the Technion - Israel Institute of Technology, in 2014 and 2018, respectively. Since 2013 he has worked as an algorithms engineer and team leader in the defense industry. He is currently pursuing his Ph.D. degree in the Andrew and Erna Viterbi Electrical Engineering Department at the Technion. His main research interests include detection and coding schemes for emerging data-storage and computation technologies.
Circadian Rhythm: A Candidate for Achieving Everlasting Flash Memories
Muhammed Ceylan Morgul (University of Virginia);Xinfei Guo (Shanghai Jiao Tong University);Mircea Stan (University of Virginia);
Speaker: M Ceylan Morgul, University of Virginia
Abstract: Existing passive (resting) and accelerated passive (thermal annealing) self-healing techniques have been proposed to address flash memory's low endurance. Yet, they have been applied at (or near) the end of the lifetime of flash. Because they can only recover temporary damage, this approach leaves the permanent component of the damage unchecked: if not recovered in time, damage accumulates and becomes permanent. In this study, we propose a Circadian Rhythm (CR) recovery technique (as an analogue of nature) that targets the prevention of permanent damage. Our measurement results show that the most frequent rhythm, compared to the least frequent rhythm, slows the growth of the Byte Error Rate by around 50 times. Moreover, it shows a flatter, more linear error-occurrence trend, since the CR technique prevents most permanent damage. The observed behavior in flash chips opens the opportunity to achieve everlasting flash memories by implementing Circadian Rhythm in the Flash Translation Layer (FTL) or Flash File System (FFS).
Speaker bio: Muhammed Ceylan Morgul received his BSc degree in Electronics and Communication Engineering in 2014, and MSc degree in Electronics Engineering in 2017 at Istanbul Technical University. He is currently a Ph.D. student in Electrical Engineering at the University of Virginia. He has been the principal investigator of one TUBITAK, and researcher of EU-H2020-RISE, SRC-JUMP, and TUBITAK projects, in Turkey, the USA, France, Portugal, and Malaysia. He is the author of more than 10 peer-reviewed research papers. His current research interests include the reliability of memory technologies, processing in memory, and emerging computing.
On the Capacity of DNA-based Data Storage under Substitution Errors
Andreas Lenz (Technical University of Munich);Paul Siegel (UCSD);Antonia Wachter-Zeh (Technical University of Munich);Eitan Yaakobi (Technion - Israel Institute of Technology);
Speaker: Paul H. Siegel, University of California, San Diego
Abstract: Advances in biochemical technologies, such as synthesizing and sequencing devices, have fueled many recent experiments on archival digital data storage using DNA. In this paper we study the information-theoretic capacity of such storage systems. The channel model incorporates the main properties of DNA-based data storage. We present the capacity of this channel for the case of substitution errors inside the sequences and provide an intuitive interpretation of the capacity formula for relevant channel parameters. We compare the capacity to rates achievable with a sub-optimal decoding method and conclude with a discussion on cost-efficient DNA archive design.
Speaker bio: Paul Siegel is a Distinguished Professor of Electrical and Computer Engineering in the Jacobs School of Engineering at the University of California, San Diego. His interests are in information theory and coding with applications to data storage and transmission. He holds an endowed chair in the Center for Memory and Recording Research, where he served as Director from 2000 to 2011. He has been on the organizing committee of the Non-Volatile Memories Workshop since its inception in 2010.
Optimizing Large-Scale Plasma Simulations on Persistent Memory-based Heterogeneous Memory with Effective Data Placement Across Memory Hierarchy
Jie Ren (University of California, Merced);Jiaolin Luo (University of California, Merced);Ivy Peng (Lawrence Livermore National Laboratory);Kai Wu (University of California, Merced);Dong Li (University of California, Merced);
Speaker: Jie Ren, University of California, Merced
Abstract: Particle simulations of plasma are important for understanding plasma dynamics in space weather and fusion devices. However, production simulations that use billions and even trillions of computational particles require high memory capacity. In this work, we explore the latest persistent memory (PM) hardware to enable large-scale plasma simulations at unprecedented scales on a single machine. We use WarpX, an advanced plasma simulation code which is mission-critical and targets future exascale systems. We analyze the performance of WarpX on PM-based heterogeneous memory systems and propose to make the best use of the memory hierarchy to avoid the impact of the inferior performance of PM. We introduce a combination of static and dynamic data placement, and a processor-cache prefetch mechanism, for performance optimization. We develop a performance model to enable efficient data migration between PM and DRAM in the background, without reducing available bandwidth and parallelism for the application threads. We also build an analytical model to decide when to prefetch for the best use of caches. Our design achieves a 66.4% performance improvement over the PM-only baseline and outperforms DRAM-cached, NUMA first-touch, and a state-of-the-art software solution by 38.8%, 45.1% and 83.3%, respectively.
Session 3B: Systems using Persistent Memory
Chair: Steven Swanson, UC San Diego
Filesystem Encryption or Direct-Access for NVM Filesystems? Let’s Have Both!
Kazi Abu Zubair (North Carolina State University);David Mohaisen (University of Central Florida);Amro Awad (North Carolina State University);
Speaker: Kazi Abu Zubair, North Carolina State University
Abstract: Emerging Non-Volatile Memories (NVM) have access latency that is comparable to DRAM. Additionally, they can also store data persistently throughout system crashes. This enables merging the concept of main memory and storage into a single concept. This can be utilized to have a byte-addressable persistent memory where filesystems can be accessed using direct load/store operation. While this enables huge performance benefits over traditional block storage for the filesystem, it makes filesystem encryption challenging to implement. In this work, we propose a hardware-software co-design that can maintain filesystem security while maintaining direct access to NVM.
Speaker bio: Kazi is a final-year Ph.D. student majoring in Computer Engineering at NC State. He is currently working under the supervision of Prof. Amro Awad within the Secure and Advanced Computer Architecture (SACA) research group. His research interests include secure memory architecture, NVM security, and memory reliability. He received his BS degree from the University of Chittagong, Bangladesh, and worked in the R&D of several startup companies in Bangladesh before joining SACA.
PMNet: In-Network Data Persistence
Korakit Seemakhupt (University of Virginia);Sihang Liu (University of Virginia);Yasas Senevirathne (University of Virginia);Muhammad Shahbaz (Purdue University);Samira Khan (University of Virginia);
Speaker: Korakit Seemakhupt, University of Virginia
Abstract: The recent adoption of fast storage systems (such as persistent memory) reduces the latency of local data accesses. Yet, the latency between application processes and storage backends, which are typically spread across remote servers, remains prohibitive. In-network computing systems today can mitigate this remote-access latency, but only for (stateless) read requests, by computing them within a network device. Requests that update persistent state must still traverse the server. Recognizing this, we introduce the idea of in-network data persistence and a PMNet system that persists data within the network devices, thereby moving the server off the critical path of update requests.
Speaker bio: Korakit Seemakhupt is a fourth-year Ph.D. student in the Department of Computer Science at the University of Virginia. His research focuses on computer networks, storage systems, and real-system prototyping of emerging technologies.
Efficient Resumable Filter Queries
Pierre Sutra (Télécom SudParis);Muktikanta Sa (Micron Technologies);
Speaker: Muktikanta Sa, Micron Technologies, Hyderabad, India
Abstract: Non-volatile main memory (NVMM) is revolutionizing data storage. NVMM-ready data stores access persistent data directly, without a volatile cache. To this end, they use persistent data types (PDTs) specifically designed to leverage NVMM. A PDT guarantees that the progress of an update operation is kept despite failures. However, there is no such guarantee for read-only operations, whose progress is simply lost after a restart. To remedy this problem, this paper proposes the notion of a resumable operation. We present implementations of common PDTs (set, linked list, skip list and hash table) that support resumable filter queries. Preliminary results assess the benefit of persisting progress for long-running tasks.
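The notion of a resumable operation can be sketched in a few lines: checkpoint the scan cursor and partial result into a persistent store, so a restarted query continues where it stopped instead of rescanning. The sketch below is a toy model in volatile Python, with a dict standing in for NVMM; it is not the paper's PDT implementations.

```python
# A "persistent" store for query progress; in a real system this region
# would live in NVMM and survive crashes.
PERSISTENT = {}

def filter_query(data, pred, qid, crash_after=None):
    """Resumable filter scan: cursor and matches are checkpointed after
    every element, so a crash loses at most the in-flight element."""
    state = PERSISTENT.setdefault(qid, {'cursor': 0, 'matches': []})
    for i in range(state['cursor'], len(data)):
        if crash_after is not None and i == crash_after:
            raise RuntimeError('simulated crash')  # progress survives
        if pred(data[i]):
            state['matches'].append(data[i])
        state['cursor'] = i + 1                    # persist progress
    return state['matches']

data = list(range(10))
try:
    filter_query(data, lambda x: x % 2 == 0, 'q1', crash_after=6)
except RuntimeError:
    pass  # crash mid-scan; elements 0..5 were already processed
# The restarted query scans only elements 6..9 and keeps earlier matches.
assert filter_query(data, lambda x: x % 2 == 0, 'q1') == [0, 2, 4, 6, 8]
```

Persisting the cursor after every element is the simplest policy; a real design would batch checkpoints to amortize persistence cost against the work lost on a crash.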
Speaker bio: Muktikanta Sa obtained his Ph.D. in computer science and engineering from the Indian Institute of Technology Hyderabad (India). He is currently a System Performance Engineer at Micron Technology, Hyderabad, India. Before that, he was a postdoctoral researcher in the computer science department of Télécom SudParis. He is interested in the fields of computer architecture, distributed systems, and non-volatile memory.
Persistent Scripting
Zi Fan Tan (San Jose State University);Jianan Li (Northeastern University);Terence Kelly (none);Haris Volos (University of Cyprus);
Speaker: Terence Kelly
Abstract: Persistent scripting brings the benefits of persistent memory programming to high-level interpreted languages. More importantly, it brings the convenience and programmer productivity of scripting to persistent memory programming. We have integrated a novel generic persistent memory allocator into a popular scripting language interpreter, which now exposes a simple and intuitive persistence interface: A flag notifies the interpreter that a script's variables reside in a persistent heap in a specified file. The interpreter begins script execution with all variables in the persistent heap ready for immediate use. New variables defined by the running script are allocated on the persistent heap and are thus available to subsequent executions. Scripts themselves are unmodified and persistent heaps may be shared freely between unrelated scripts. Experiments show that our persistent gawk prototype offers good performance while simplifying scripts, and we identify opportunities to reduce interpreter overheads.
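The interface described above (script variables living in a persistent heap in a specified file, ready for immediate use on the next execution) can be mimicked in a few lines of Python, with a `shelve` file standing in for the persistent heap. This is an illustration of the concept only, not the persistent gawk implementation.

```python
import os
import shelve
import tempfile

# A shelve file plays the role of the persistent heap: "variables"
# written into it by one execution are visible to the next.
heap_path = os.path.join(tempfile.mkdtemp(), 'heap')

def run_script(heap_path):
    """One 'execution' of a counting script whose state lives in the
    persistent heap rather than in process memory."""
    with shelve.open(heap_path) as heap:
        heap['count'] = heap.get('count', 0) + 1
        return heap['count']

assert run_script(heap_path) == 1  # first run initializes the variable
assert run_script(heap_path) == 2  # second run sees the persisted value
```

The appeal of the paper's approach is that the script itself stays unmodified; here the persistence is explicit, whereas the persistent interpreter makes every variable persistent transparently.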
Slow is Fast: Rethinking In-Memory Graph Analysis with Persistent Memory
Hanyeoreum Bae (KAIST);Miryeong Kwon (KAIST);Donghyun Gouk (KAIST);Sanghyun Han (KAIST);Sungjoon Koh (KAIST);Changrim Lee (KAIST);Dongchul Park (Sookmyung Women's University);Myoungsoo Jung (KAIST);
Speaker: Hanyeoreum Bae, KAIST
Abstract: In this paper, we explore and uncover the challenges that in-memory graph processing suffers from. Our system-level analysis includes empirical results that run counter to the existing expectations of graph application users. Specifically, since raw graph data are not the same as in-memory graph data, processing a billion-scale graph easily exhausts all system resources and makes the target system unavailable due to out-of-memory errors at runtime. To address this lack of memory space, we configure real persistent memory devices (PMEMs) with different operation modes and system software. We then introduce PMEM to a representative in-memory graph system, Ligra, and reveal the performance behaviors of different PMEM-applied in-memory graph systems. Based on our observations, we modify Ligra to improve its performance while providing a solid level of data persistence. Our evaluation results reveal that our modified Ligra exhibits 4.41x and 3.01x better performance than the original Ligra running on a virtual memory expansion and conventional persistent memory, respectively.
Speaker bio: Hanyeoreum Bae is a Ph.D. student at KAIST, advised by Prof. Myoungsoo Jung.
TUESDAY, MAY 17
8:00 am – 9:00 am
Session 4: Memorable Paper Award Finalists II
Chair: Eitan Yaakobi, Technion
DNA-Storalator: End-to-End DNA Storage Simulator
Gadi Chaykin (Technion - Israel Institute of Technology);Nili Furman (Technion - Israel Institute of Technology);Omer Sabary (University of California San Diego);Dvir Ben Shabat (Technion - Israel Institute of Technology);Eitan Yaakobi (Technion - Israel Institute of Technology);
Speaker: Omer Sabary, Technion - Israel Institute of Technology
Abstract: DNA-Storalator is a cross-platform software tool that simulates the complete process of encoding, storing, and decoding digital data in DNA molecules. The simulator receives an input file with the designed DNA strands that store digital data and emulates the different biological and algorithmic components of the storage system. The biological component includes simulation of the synthesis, PCR, and sequencing stages, which are expensive and complicated and therefore not widely accessible to the community. These processes amplify the data and generate noisy copies of each DNA strand, where the errors are insertions, deletions, long-deletions, and substitutions. DNA-Storalator injects errors into the data based on error rates, which vary between different synthesis and sequencing technologies. The rates are based on a comprehensive analysis of data from previous experiments but can also be customized. Additionally, the tool can analyze new datasets and characterize their error rates to build new error models for future use in the simulator. DNA-Storalator also enables control of the amplification process and the distribution of the number of copies per designed strand. The coding components are: 1. Clustering algorithms which partition all output noisy strands into groups according to the designed strand they originated from; 2. State-of-the-art reconstruction algorithms that are invoked on each cluster to output a close/exact estimate of the designed strand; 3. Integration with external error-correcting codes and other encoding and decoding techniques. This end-to-end DNA storage simulator grants researchers from all fields an accessible, complete simulator to examine new biological technologies, coding techniques, and algorithms for current and future DNA storage systems.
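The error-injection component can be sketched as a per-base random process. The version below is a toy with made-up rates and only the three basic error types; the simulator's actual models are technology-specific and also cover long-deletions.

```python
import random

BASES = 'ACGT'

def inject_errors(strand, rates, rng):
    """Produce one noisy copy of a designed strand by injecting
    insertion, deletion, and substitution errors at the given
    per-position rates (toy stand-in for technology-specific models)."""
    out = []
    for b in strand:
        r = rng.random()
        if r < rates['del']:
            continue                                   # deletion
        if r < rates['del'] + rates['ins']:
            out.append(rng.choice(BASES))              # insertion before b
        if rng.random() < rates['sub']:
            b = rng.choice([x for x in BASES if x != b])  # substitution
        out.append(b)
    return ''.join(out)

rng = random.Random(42)
designed = 'ACGTACGTACGT'
noisy = inject_errors(designed, {'sub': 0.1, 'del': 0.05, 'ins': 0.05}, rng)
assert set(noisy) <= set(BASES)
```

Generating many such copies per designed strand, with a chosen copy-count distribution, reproduces the cluster structure that the reconstruction algorithms then have to undo.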
Speaker bio: Omer Sabary is a PhD student at the Computer Science Faculty at the Technion. His advisor is Prof. Eitan Yaakobi, and his research interests include coding techniques and algorithms for DNA storage systems. In 2020 he received his M.Sc. from the Computer Science department at the Technion.
RACER: Bit-Pipelined Processing Using Resistive Memory
Minh S. Q. Truong (Carnegie Mellon University);Eric Chen (Carnegie Mellon University);Deanyone Su (Carnegie Mellon University);Alex Glass (Carnegie Mellon University);Liting Shen (Carnegie Mellon University);L. Richard Carley (Carnegie Mellon University);James A. Bain (Carnegie Mellon University);Saugata Ghose (University of Illinois Urbana-Champaign);
Speaker: Minh S. Q. Truong, Carnegie Mellon University
Abstract: To combat the high energy costs of moving data between main memory and the CPU, recent works have proposed to perform processing-using-memory (PUM), a type of processing-in-memory where operations are performed on data in situ (i.e., right at the memory cells holding the data). Several common and emerging memory technologies offer the ability to perform bitwise Boolean primitive functions by having interconnected cells interact with each other, eliminating the need to use discrete CMOS compute units for several common operations. Recent PUM architectures extend upon these Boolean primitives to perform bit-serial computation using memory. Unfortunately, several practical limitations of the underlying memory devices restrict how large emerging memory arrays can be, which hinders the ability of conventional bit-serial computation approaches to deliver high performance in addition to large energy savings. In this paper, we propose RACER, a cost-effective PUM architecture that delivers high performance and large energy savings using small arrays of resistive memories. RACER makes use of a bit-pipelining execution model, which can pipeline bit-serial w-bit computation across w small tiles. We fully design efficient control and peripheral circuitry, whose area can be amortized over small memory tiles without sacrificing memory density, and we propose an ISA abstraction for RACER to allow for easy program/compiler integration. We evaluate an implementation of RACER using NOR-capable ReRAM cells across a range of microbenchmarks extracted from data-intensive applications, and find that RACER provides 107x, 12x, and 7x the performance of a 16-core CPU, a 2304-shader-core GPU, and a state-of-the-art in-SRAM compute substrate, respectively, with energy savings of 189x, 17x, and 1.3x.
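To illustrate how bit-serial computation can be built from a NOR-only primitive (as with the NOR-capable ReRAM cells mentioned above), here is a toy software model of a w-bit adder. The real architecture executes these steps inside the memory arrays and pipelines them across tiles; this sketch only shows the logical decomposition.

```python
def NOR(a, b):
    """The only primitive the memory array is assumed to provide."""
    return 1 - (a | b)

# All other gates derived from NOR alone (NOR is functionally complete).
def NOT(a):    return NOR(a, a)
def OR(a, b):  return NOT(NOR(a, b))
def AND(a, b): return NOR(NOT(a), NOT(b))
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

def add_bit_serial(x, y, w):
    """Add two w-bit values one bit per step, LSB first, carrying
    between steps -- the style of computation that bit-pipelining
    spreads across w small tiles."""
    carry, out = 0, 0
    for i in range(w):
        a, b = (x >> i) & 1, (y >> i) & 1
        s = XOR(XOR(a, b), carry)                       # sum bit
        carry = OR(AND(a, b), AND(carry, XOR(a, b)))    # carry out
        out |= s << i
    return out

assert add_bit_serial(13, 10, 8) == 23
```

In a bit-serial model each of the w steps depends on the previous carry; bit-pipelining keeps all w tiles busy by streaming many independent operations through the pipeline.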
Speaker bio: Minh S. Q. Truong is a Ph.D. student in the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received the Apple Ph.D. Fellowship in Integrated Systems in 2021 for his research in processing using resistive memory. He received dual B.S. degrees in electrical engineering and in computer engineering from the University of California, Davis in 2019. His current Ph.D. research seeks to create new classes of computer systems based on the processing-in-memory paradigm to reduce the power consumption of data-intensive applications by orders of magnitude, and to enable efficient edge and cloud computing. His general research interest lies at the intersection of computer systems, microarchitecture, circuits, and how to design a holistic computer system.
Concentrated Stopping Set Design for Coded Merkle Tree: Improving Security Against Data Availability Attacks in Blockchain Systems
Debarnab Mitra (University of California, Los Angeles);Lev Tauz (University of California, Los Angeles);Lara Dolecek (University of California, Los Angeles);
Speaker: Debarnab Mitra, University of California, Los Angeles
Abstract: In certain blockchain systems, light nodes are clients that download only a small portion of the block. Light nodes are vulnerable to a data availability (DA) attack where a malicious node makes the light nodes accept an invalid block by hiding the invalid portion of the block from the nodes in the system. A technique based on LDPC codes called Coded Merkle Tree (CMT), proposed by Yu et al., enables light nodes to detect a DA attack by randomly requesting/sampling portions of the block from the malicious node. However, light nodes fail to detect a DA attack with high probability if a malicious node hides a small stopping set of the LDPC code. To improve the probability of detection, in this work, we demonstrate a specialized LDPC code design that focuses on concentrating stopping sets to a small group of variable nodes rather than only eliminating stopping sets. Our design demonstrates a higher probability of detecting DA attacks compared to prior work thus improving the security of the system.
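A stopping set, the object this code design concentrates, is a set of variable nodes in the LDPC code's Tanner graph such that every check node touching the set touches it at least twice; these are exactly the erasure patterns a peeling decoder cannot resolve, and hence what a malicious node wants to hide. A small checker (with a toy parity-check matrix of my own, not one from the paper):

```python
def is_stopping_set(H, S):
    """H: binary parity-check matrix as a list of rows over variable
    nodes; S: set of variable-node indices. S is a stopping set iff no
    check node has exactly one neighbor in S (a degree-1 check would
    let a peeling decoder recover that variable)."""
    for row in H:
        if sum(row[v] for v in S) == 1:
            return False
    return True

# Toy parity-check matrix: 3 check nodes x 6 variable nodes.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
assert is_stopping_set(H, {0, 1, 2})   # every check sees it 0 or 2 times
assert not is_stopping_set(H, {0, 3})  # check 2 sees exactly one member
```

The design question in the paper is then where small stopping sets live: concentrating them on a few variable nodes makes random sampling by light nodes far more likely to hit a hidden one.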
Speaker bio: Debarnab Mitra is a Ph.D. candidate in the Department of Electrical and Computer Engineering at UCLA. He earned his M.S. degree from the ECE Department at UCLA in 2020, for which he was awarded the Outstanding MS Thesis Award in Signals and Systems. Prior to that, he graduated from IIT Bombay with a B. Tech. (Hons.) in Electrical Engineering and a minor in Computer Science and Engineering in 2018. His research interests include information theory, channel coding, and its applications to blockchains and non-volatile memories.
9:30 am-10:45 am
Session 5A: Architectures for Persistent Memory
Chair: Hung-Wei Tseng, UC Riverside
ASAP: A Speculative Approach to Persistence
Sujay Yadalam (University of Wisconsin-Madison);Michael Swift (University of Wisconsin-Madison);
Speaker: Sujay Yadalam, University of Wisconsin-Madison
Abstract: Persistent memory enables a new class of applications that have persistent in-memory data structures. Recoverability of these applications imposes constraints on the ordering of writes to persistent memory. But, the cache hierarchy and memory controllers in modern systems may reorder writes to persistent memory. Therefore, programmers have to use expensive flush and fence instructions that stall the processor to enforce such ordering. While prior efforts circumvent stalling on long latency flush instructions, these designs under-perform in large-scale systems with many cores and multiple memory controllers. We propose ASAP, an architectural model in which the hardware takes an optimistic approach by persisting data eagerly, thereby avoiding any ordering stalls and utilizing the total system bandwidth efficiently. ASAP avoids stalling by allowing writes to be persisted out-of-order, speculating that all writes will eventually be persisted. For correctness, ASAP saves recovery information in the memory controllers which is used to undo the effects of speculative writes to memory in the event of a crash. Over a large number of representative workloads, ASAP improves performance over current Intel systems by 2.3x on average and performs within 3.9% of an ideal system.
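The recovery-information idea can be sketched as an undo log: persist writes eagerly and out of order, but record the old value first so a crash can roll speculative writes back to a consistent state. The model below is a toy in Python, not ASAP's hardware design; the dict stands in for persistent memory and the list for the memory controller's recovery log.

```python
# "Persistent memory" and the controller's recovery log.
memory = {'A': 0, 'B': 0}
undo_log = []

def speculative_write(addr, value):
    """Persist eagerly (no ordering stall), but save recovery
    information first so the write can be undone after a crash."""
    undo_log.append((addr, memory[addr]))  # old value, logged first
    memory[addr] = value

def commit():
    """All speculative writes are now truly durable."""
    undo_log.clear()

def crash_recover():
    """Undo speculative writes in reverse order, restoring the last
    consistent state."""
    while undo_log:
        addr, old = undo_log.pop()
        memory[addr] = old

speculative_write('A', 1)
speculative_write('B', 2)
crash_recover()  # simulated crash before commit
assert memory == {'A': 0, 'B': 0}
```

The speculation pays off because crashes are rare: in the common case the log is simply discarded at commit, and no write ever waited on another.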
Speaker bio: Sujay is a third-year PhD student at the University of Wisconsin-Madison working with Prof. Michael Swift. His research interests broadly include computer architecture and systems. In the past few years, he has been working on building faster and more secure interfaces to upcoming memory devices, including NVM and SSDs.
ReplayCache: Enabling Volatile Caches for Energy Harvesting Systems
Jianping Zeng (Purdue University);Jongouk Choi (Purdue University);Xinwei Fu (Virginia Tech);Ajay Paddayuru Shreepathi (Stony Brook University);Dongyoon Lee (Stony Brook University);Changwoo Min (Virginia Tech);Changhee Jung (Purdue University);
Speaker: Jianping Zeng, Purdue University
Abstract: In this paper, we propose ReplayCache, a software-only crash consistency scheme that enables commodity energy harvesting systems to exploit a volatile data cache. ReplayCache does not have to ensure the persistence of dirty cache lines or record their logs at run time. Instead, the ReplayCache recovery runtime re-executes the potentially unpersisted stores in the wake of power failure to restore a consistent NVM state, from which the interrupted program can safely resume. To support store replay during recovery, ReplayCache partitions the program into a series of regions in such a way that store operand registers remain intact within each region, and checkpoints all registers just before power failure using the crash consistency mechanism of the commodity systems. The evaluation with 23 benchmark applications shows that, compared to the baseline with no caches, ReplayCache can achieve about 10.72x and 8.5x-8.9x speedup (geometric mean) for the scenarios without and with power outages, respectively.
Speaker bio: Jianping Zeng is a Ph.D. student in the Department of Computer Science at Purdue University. His research interests focus on computer architecture and compiler optimizations.
IceClave: A Trusted Execution Environment for In-Storage Computing
Luyi Kang (University of Maryland, College Park);Yuqi Xue (University of Illinois at Urbana-Champaign);Weiwei Jia (University of Illinois at Urbana-Champaign);Xiaohao Wang (University of Illinois at Urbana-Champaign);Jongryool Kim (SK Hynix);Changhwan Youn (SK Hynix);Myeong Joon Kang (SK Hynix);Hyung Jin Lim (SK Hynix);Bruce Jacob (University of Maryland, College Park);Jian Huang (University of Illinois at Urbana-Champaign);
Speaker: Yuqi Xue, University of Illinois at Urbana-Champaign
Abstract: In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviating the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them treat in-storage security as a first-class citizen. Specifically, since modern SSD controllers do not have a trusted execution environment, an offloaded (malicious) program could steal, modify, and even destroy the data stored in the SSD. In this paper, we first investigate the attacks that could be conducted by offloaded in-storage programs. To defend against these attacks, we build a lightweight trusted execution environment, named IceClave, for in-storage computing. IceClave enables security isolation between in-storage programs and flash management functions. IceClave also achieves security isolation among in-storage programs, and enforces memory encryption and integrity verification of in-storage DRAM with low overhead. To protect data loaded from flash chips, IceClave develops a lightweight data encryption/decryption mechanism in flash controllers. We develop IceClave with a full system simulator. We evaluate IceClave with a variety of data-intensive applications such as databases. Compared to state-of-the-art in-storage computing approaches, IceClave introduces only 7.6% performance overhead, while enforcing security isolation in the SSD controller with minimal hardware cost. IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.31x better performance than the conventional host-based trusted computing approach.
Speaker bio: Yuqi Xue is a first-year PhD student studying Electrical and Computer Engineering at University of Illinois at Urbana-Champaign. He is interested in computer architecture and system research with a focus on accelerator-centric system architecture.
A Write-Friendly NVM Scheme for Security Metadata with High Availability
Jianming Huang (Huazhong University of Science and Technology);Yu Hua (Huazhong University of Science and Technology);
Speaker: Jianming Huang, Huazhong University of Science and Technology
Abstract: Non-Volatile Memories (NVMs) require security mechanisms, e.g., counter mode encryption and integrity tree verification, which are important to protect systems in terms of encryption and data integrity. These security mechanisms heavily rely on extra security metadata that need to be efficiently and accurately recovered after system crashes or power loss. The established SGX-style integrity tree (SIT) protects system integrity efficiently but cannot be restored from its leaves, since computing an SIT node requires its parent node as input. To recover the security metadata with low write overhead and short recovery time, we propose an efficient and instantaneous persistence scheme, called STAR, which instantly persists the modifications of security metadata without extra memory writes. STAR is motivated by our observation that the parent nodes in cache are modified as a result of persisting their child nodes. STAR stores the modifications of parent nodes in their child nodes and persists them using just one atomic memory write. To eliminate the overhead of persisting the modifications, STAR coalesces the modifications and MACs in the evicted metadata. For fast recovery and verification of the metadata, STAR uses bitmap lines to indicate the locations of stale metadata, and constructs a cached Merkle tree to verify the correctness of the recovery process. Our evaluation results show that, compared with state-of-the-art work, our proposed STAR delivers high performance, low write traffic, low energy consumption and short recovery time.
Speaker bio: Jianming Huang is a Ph.D. student at Huazhong University of Science and Technology (HUST), advised by Prof. Yu Hua. He obtained the B.E. degree in Computer Science from HUST in 2018. His research interests include computer systems and memory architecture, including storage systems, security, and non-volatile memory systems.
Scaling Learned Indexes on Persistent Memory
Baotong Lu (The Chinese University of Hong Kong);Jialin Ding (Massachusetts Institute of Technology);Eric Lo (The Chinese University of Hong Kong);Umar Farooq Minhas (Microsoft Research);Tianzheng Wang (Simon Fraser University);
Speaker: Baotong Lu, The Chinese University of Hong Kong
Abstract: The recently released persistent memory (PM) offers high performance and persistence at a lower cost than DRAM. This opens up new possibilities for indexes that operate and persist data directly on the memory bus. Recent learned indexes exploit data distribution and have shown great potential for some workloads. However, none support persistence or instant recovery, and existing PM-based indexes typically evolve B+-trees without considering learned indexes. This paper proposes APEX, a new PM-optimized learned index that offers high performance, persistence, concurrency, and instant recovery. APEX is based on ALEX, a state-of-the-art updatable learned index, to combine and adapt the best of past PM optimizations and learned indexes, allowing it to reduce PM accesses while still exploiting machine learning. Our evaluation on Intel DCPMM shows that APEX can perform up to ~3.9x better than state-of-the-art PM indexes and can recover from failures in ~42ms.
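The learned-index idea APEX builds on can be sketched in a few lines (a toy illustration, not APEX or ALEX code; the error bound and model fit are hypothetical): a model predicts a key's position, and a bounded local search corrects the prediction.

```python
import bisect

def train_linear_model(keys):
    """Fit position ~ slope * key + intercept over a sorted key array.
    Endpoint interpolation is enough for a sketch."""
    n = len(keys)
    slope = (n - 1) / (keys[-1] - keys[0])
    return slope, -slope * keys[0]

def lookup(keys, model, key, err=8):
    """Predict the position, then binary-search a small window around it.
    Returns the index of `key`, or None if absent."""
    slope, intercept = model
    pos = int(slope * key + intercept)
    lo = max(0, pos - err)
    hi = min(len(keys), pos + err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None
```

APEX's contribution is making such a structure crash-consistent and concurrent on PM (e.g., bounding how many cache lines a lookup or insert touches); the sketch only shows the model-plus-local-search access path.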
Speaker bio: Baotong Lu is a Ph.D. candidate in the Department of Computer Science and Engineering, The Chinese University of Hong Kong. His research interest lies in database management systems, specifically next-generation database systems on persistent memory and multicore processors. He is a recipient of a 2021 ACM SIGMOD Research Highlight Award.
Session 5B: Data Structures for Persistent Memory
Chair: Steven Swanson, UC San Diego
UniHeap: Managing Persistent Objects Across Managed Runtimes for Non-Volatile Memory
Daixuan Li (UIUC);Benjamin Reidys (UIUC);Jinghan Sun (UIUC);Thomas Shull (UIUC);Josep Torrellas (UIUC);Jian Huang (UIUC);
Speaker: Daixuan Li, UIUC
Abstract: Byte-addressable, non-volatile memory (NVM) is emerging as a promising technology. To facilitate its wide adoption, employing NVM in managed runtimes like the JVM has proven to be an effective approach (i.e., managed NVM). However, such an approach is runtime specific and lacks a generic abstraction across different managed languages. Similar to the well-known filesystem primitives that allow diverse programs to access the same files via the block I/O interface, managed NVM deserves the same system-wide property for persistent objects across managed runtimes with low overhead. In this paper, we present UniHeap, a new NVM framework for managing persistent objects. It proposes a unified persistent object model that supports various managed languages, and manages NVM within a shared heap that enables cross-language persistent object sharing. UniHeap reduces the object persistence overhead by managing the shared heap in a log-structured manner and coalescing object updates during garbage collection. We implement UniHeap as a generic framework and extend it to different managed runtimes, including the HotSpot JVM, cPython, and the SpiderMonkey JavaScript engine. We evaluate UniHeap with a variety of applications, such as a key-value store and a transactional database. Our evaluation shows that UniHeap significantly outperforms state-of-the-art object sharing approaches, while introducing negligible overhead to the managed runtimes.
J-NVM: Off-heap Persistent Objects in Java
Anatole Lefort (Télécom SudParis - Institut Polytechnique de Paris);Yohan Pipereau (Télécom SudParis - Institut Polytechnique de Paris);Kwabena Amponsem Boateng (Télécom SudParis - Institut Polytechnique de Paris);Pierre Sutra (Télécom SudParis - Institut Polytechnique de Paris);Gaël Thomas (Télécom SudParis - Institut Polytechnique de Paris);
Speaker: Anatole Lefort, Télécom SudParis - Institut Polytechnique de Paris
Abstract: This paper presents J-NVM, a framework for efficiently accessing Non-Volatile Main Memory (NVMM) in Java. J-NVM offers a fully-fledged interface to persist plain Java objects using failure-atomic blocks. This interface relies internally on proxy objects that intermediate direct off-heap access to NVMM. The framework also provides a library of highly-optimized persistent data types that resist reboots and power failures. We evaluate J-NVM by implementing a persistent backend for the Infinispan data store. Our experimental results, obtained with a TPC-B-like benchmark and YCSB, show that J-NVM is consistently faster than other approaches at accessing NVMM in Java.
Speaker bio: Anatole is a fourth- and final-year Ph.D. student at Télécom SudParis - Institut Polytechnique de Paris, advised by Pierre Sutra and Prof. Gaël Thomas in the computer science department. His current research focuses on bringing persistent programming to cloud workloads and computations, exploring language support for PMEM and programming abstractions for distributed computing runtimes and frameworks.
Fast Nested Decoding for Short Generalized Integrated Interleaved BCH Codes
Xinmiao Zhang (The Ohio State University);
Speaker: Xinmiao Zhang, The Ohio State University
Abstract: Generalized integrated interleaved (GII) codes that nest BCH sub-codewords to form more powerful BCH codewords are among the best error-correcting codes for storage class memories (SCMs). However, SCMs require short codeword length and low redundancy. In this case, miscorrections on the sub-words lead to severe error-correcting performance degradation if untreated. This abstract presents recent results on low-complexity miscorrection mitigation schemes for GII codes as well as their optimizations for latency reduction. For an example GII-BCH code with 3-error-correcting sub-codewords, the optimizations reduce the average nested decoding latency by 43% at input bit error rate 10^{-3} with very small performance loss.
Speaker bio: Xinmiao Zhang received her Ph.D. degree in Electrical Engineering from the University of Minnesota. She joined The Ohio State University as an Associate Professor in 2017. Prior to that, she was a Timothy E. and Allison L. Schroeder Assistant Professor 2005-2010 and Associate Professor 2010-2013 at Case Western Reserve University. Between her academic positions, she was a Senior Technologist at Western Digital/SanDisk Corporation. Dr. Zhang’s research spans the areas of VLSI architecture design, digital storage and communications, security, and signal processing. Dr. Zhang received an NSF CAREER Award in January 2009. She is also the recipient of the Best Paper Award at 2004 ACM Great Lakes Symposium on VLSI and 2016 International SanDisk Technology Conference. She authored the book “VLSI Architectures for Modern Error-Correcting Codes” (CRC Press, 2015) and published more than 100 papers. She was elected the Vice President-Technical Activities of the IEEE Circuits and Systems Society (CASS) for the 2022-2023 term. She also served on the Board of Governors of CASS 2019-2021. She is the Chair of the Data Storage Technical Committee (DSTC) of the IEEE Communications Society for the 2021-2022 term and was previously a Vice-Chair of DSTC 2017-2020. She is also a member of the CASCOM and VSA Technical Committees of IEEE. She served on the technical program and organization committees of many conferences, including ISCAS, SiPS, ICC, GLOBECOM, GlobalSIP, and GLSVLSI. She has been an Associate Editor for the IEEE Transactions on Circuits and Systems-I 2010-2019 and IEEE Open Journal of Circuits and Systems since 2019.
Fast Nonblocking Persistence for Concurrent Data Structures
Wentao Cai (University of Rochester);Haosen Wen (University of Rochester);Vladimir Maksimovski (University of Rochester);Mingzhe Du (University of Rochester);Rafaello Sanna (University of Rochester);Shreif Abdallah (University of Rochester);Michael Scott (University of Rochester);
Speaker: Wentao Cai, University of Rochester
Abstract: We present a fully lock-free variant of our recent Montage system for persistent data structures. The variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is \emph{buffered durably linearizable}: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a `sync` operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free `sync` significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature---dramatically faster than achieved with prior general-purpose systems, and generally within 50\% of equivalent non-persistent structures placed in DRAM.
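The buffered durable linearizability guarantee can be pictured with a toy epoch model (a minimal sketch, not the nbMontage implementation; all names are hypothetical): updates accumulate in the current epoch, and `sync` advances the persistence frontier so that everything behind it would survive a crash as a consistent prefix.

```python
class BufferedPersist:
    """Toy epoch-based buffered persistence: only the prefix behind the
    persistence frontier is guaranteed to survive a crash."""

    def __init__(self):
        self.epoch = 0
        self.buffer = []       # updates made in the current epoch (volatile)
        self.persisted = []    # consistent pre-crash prefix (durable)

    def update(self, value):
        """Apply an update; it is buffered, not yet durable."""
        self.buffer.append((self.epoch, value))

    def sync(self):
        """Advance the persistence frontier past all buffered updates."""
        self.persisted.extend(v for _, v in self.buffer)
        self.buffer.clear()
        self.epoch += 1
        return self.epoch
```

nbMontage's contribution is making the frontier advance wait-free, so a `sync` can never block behind a stalled thread; this sketch only illustrates the buffered-durability contract itself.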
Speaker bio: Wentao Cai is a final-year Ph.D. student at the University of Rochester. Working with Prof. Michael L. Scott, he is interested in concurrent (nonblocking in particular) data structures, persistent memory, transactions, and memory management.
Linear-Time Encoders for Two-Dimensional Bounded-Weight Constrained Codes
Tuan Thanh Nguyen (Singapore University of Technology and Design);Kui Cai (Singapore University of Technology and Design);Kees A. Schouhamer Immink (Turing Machines Inc.);Yeow Meng Chee (National University of Singapore);
Speaker: Tuan Thanh Nguyen, Singapore University of Technology and Design
Abstract: In this work, given n, p > 0, efficient encoding/decoding algorithms are presented for mapping arbitrary data to and from n × n binary arrays in which the weight of every row and every column is at most pn. This constraint, referred to as the p-bounded-weight constraint, is crucial for reducing the parasitic currents in crossbar resistive memory arrays, and has also been proposed for certain applications of holographic data storage.
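The constraint itself is easy to state in code (a checker only; the paper's contribution is the linear-time encoders that map arbitrary data into arrays satisfying it):

```python
def satisfies_bounded_weight(array, p):
    """Check the p-bounded-weight constraint: every row and every column of
    an n x n binary array has Hamming weight at most p*n."""
    n = len(array)
    limit = p * n
    rows_ok = all(sum(row) <= limit for row in array)
    cols_ok = all(sum(array[i][j] for i in range(n)) <= limit
                  for j in range(n))
    return rows_ok and cols_ok
```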
Speaker bio: I am currently a Research Fellow at the Singapore University of Technology and Design (SUTD), with the Advanced Coding and Signal Processing (ACSP) Lab. My research interest lies in the interplay between combinatorics and computer science/engineering, particularly combinatorics and coding theory. My current research project concentrates on coding techniques for data storage systems, particularly codes for DNA-based data storage. I received the B.Sc. and Ph.D. degrees in mathematics from Nanyang Technological University (NTU), Singapore, in 2014 and 2018, respectively. I was a Research Fellow at the School of Physical and Mathematical Sciences (SPMS) at NTU from Aug 2018 to Sep 2019.
11:00 am-12:15 pm
Session 6A: Machine Learning & SSD Architectures
Chair: Oliver Hambrey, Siglead
Offline and Online Algorithms for SSD Management
Tomer Lange (Technion - Israel Institute of Technology);Gala Yadgar (Technion - Israel Institute of Technology);Joseph (Seffi) Naor (Technion - Israel Institute of Technology);
Speaker: Tomer Lange, Technion - Israel Institute of Technology
Abstract: The abundance of system-level optimizations for reducing SSD write amplification, which are usually based on experimental evaluation, stands in contrast to the lack of theoretical algorithmic results in this problem domain. To bridge this gap, we explore the problem of reducing write amplification from an algorithmic perspective, considering it in both offline and online settings. In the offline setting, we present a near-optimal algorithm. In the online setting, we first consider algorithms that have no prior knowledge about the input. We present a worst-case lower bound and show that the greedy algorithm is optimal in this setting. Then we design an online algorithm that uses predictions about the input. We show that when predictions are reasonably accurate, our algorithm circumvents the above lower bound. We complement our theoretical findings with an empirical evaluation of our algorithms, comparing them with the state-of-the-art scheme. The results confirm that our algorithms exhibit improved performance for a wide range of input traces.
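The greedy baseline the abstract analyzes, and the write-amplification metric it bounds, can be sketched in a few lines (a toy model, not the authors' code; the block representation is hypothetical):

```python
def greedy_victim(valid_counts):
    """Greedy GC victim selection: erase the block holding the fewest valid
    pages, minimizing the data that must be copied before erasure."""
    return min(range(len(valid_counts)), key=lambda b: valid_counts[b])

def write_amplification(user_writes, gc_copies):
    """Write amplification = total flash writes / writes issued by the user."""
    return (user_writes + gc_copies) / user_writes
```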
Speaker bio: Tomer Lange is a Ph.D. candidate at the computer science department, Technion, Israel. His advisors are Prof. Seffi Naor and Prof. Gala Yadgar. His research interests include algorithms for memory management in storage systems.
RSSD: Defend against Ransomware with Hardware-Isolated Network-Storage Codesign and Post-Attack Analysis
Benjamin Reidys (UIUC);Peng Liu (The Pennsylvania State University);Jian Huang (UIUC);
Speaker: Benjamin Reidys, University of Illinois at Urbana-Champaign
Abstract: Encryption ransomware has become a notorious malware. It encrypts user data on storage devices like solid-state drives (SSDs) and demands a ransom to restore the data for users. To bypass existing defenses, ransomware keeps evolving and mounting new attack models. For instance, we identify and validate three new attacks: (1) a garbage-collection (GC) attack that exploits storage capacity and keeps writing data to trigger GC and force SSDs to release retained data; (2) a timing attack that intentionally slows the pace of encrypting data and hides its I/O patterns to escape existing defenses; and (3) a trimming attack that utilizes the trim command available in SSDs to physically erase data. To enhance the robustness of SSDs against these attacks, we propose RSSD, a ransomware-aware SSD. It redesigns the flash management of SSDs to enable hardware-assisted logging, which can conservatively retain older versions of user data and received storage operations in time order with low overhead. It also employs hardware-isolated NVMe over Ethernet to expand local storage capacity by transparently offloading the logs to remote cloud servers in a secure manner. RSSD enables post-attack analysis by building a trusted evidence chain of storage operations to assist the investigation of ransomware attacks. We develop RSSD with a real-world SSD FPGA board. Our evaluation shows that RSSD can defend against new and future ransomware attacks, while introducing negligible performance overhead.
Speaker bio: Benjamin Reidys is a second-year Ph.D. student at the University of Illinois, Urbana-Champaign. His research interests include architecture and operating systems with an emphasis on network/storage codesign. Contact him at breidys2@illinois.edu
Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Model
Assaf Eisenman (Meta Platforms);Kiran Kumar Matam (Meta Platforms);Steven Ingram (Meta Platforms);Dheevatsa Mudigere (Meta Platforms);Raghuraman Krishnamoorth (Meta Platforms);Krishnakumar Nair (Meta Platforms);Misha Smelyanskiy (Meta Platforms);Murali Annavaram (University of Southern California);
Speaker: Assaf Eisenman, Meta Platforms
Abstract: Checkpoints play an important role in training long running machine learning (ML) models. Checkpoints take a snapshot of an ML model and store it in a non-volatile memory so that they can be used to recover from failures to ensure rapid training progress. In addition, they are used for online training to improve inference prediction accuracy with continuous learning. Given the large and ever-increasing model sizes, checkpoint frequency is often bottlenecked by the storage write bandwidth and capacity. When checkpoints are maintained on remote storage, as is the case with many industrial settings, they are also bottlenecked by network bandwidth. We present Check-n-run, a scalable checkpointing system for training large ML models at Meta. While Check-n-run is applicable to long running ML jobs, we focus on checkpointing recommendation models which are currently the largest ML models with Terabytes of model size. Check-n-run uses two primary techniques to address the size and bandwidth challenges. First, it applies differential checkpointing, which tracks and checkpoints the modified part of the model. Differential checkpointing is particularly valuable in the context of recommendation models where only a fraction of the model (stored as embedding tables) is updated on each iteration. Second, Check-n-run leverages quantization techniques to significantly reduce the checkpoint size, without degrading training accuracy. These techniques allow Check-n-run to reduce the required write bandwidth by 6-17x and the required capacity by 2.5-8x on real-world models at Meta, and thereby significantly improve checkpoint capabilities while reducing the total cost of ownership.
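The two techniques, differential checkpointing plus quantization, compose naturally; a minimal sketch (not Meta's implementation; the uniform quantizer and row layout are hypothetical stand-ins):

```python
def quantize(row, scale=256):
    """Toy uniform quantizer standing in for Check-N-Run's quantization."""
    return [round(x * scale) for x in row]

def dequantize(qrow, scale=256):
    return [q / scale for q in qrow]

def diff_checkpoint(embedding, dirty_rows):
    """Differential checkpoint: save only the embedding rows modified since
    the last checkpoint, in quantized form."""
    return {r: quantize(embedding[r]) for r in dirty_rows}

def restore(embedding, checkpoint):
    """Apply a differential checkpoint on top of an earlier full state."""
    for r, qrow in checkpoint.items():
        embedding[r] = dequantize(qrow)
    return embedding
```

Only the dirty rows are written, which is why the savings are largest for recommendation models, where each training iteration touches a small fraction of the embedding tables.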
Speaker bio: Assaf is a technical lead and manager at Meta Platforms, focusing on research and development of novel software systems for AI. He previously earned his PhD in Electrical Engineering at Stanford University, focusing on distributed software systems. Before that, he was a performance architect at Intel.
Rethinking the Performance/Cost of Persistent Memory and SSDs
Kaisong Huang (Simon Fraser University);Darien Imai (Simon Fraser University);Tianzheng Wang (Simon Fraser University);Dong Xie (The Pennsylvania State University);
Speaker: Kaisong Huang, Simon Fraser University
Abstract: For decades, the storage hierarchy consisted of layers with distinct performance characteristics and costs: a higher level (in particular, memory) is assumed to be strictly faster, less capacious, volatile, and more expensive than a lower-level layer (e.g., SSDs and HDDs). This good ol' storage hierarchy, however, is becoming a jungle: On the one hand, persistent memory (PM) breaks the boundary between volatile and non-volatile storage with persistence on the memory bus. On the other hand, modern SSDs' high bandwidth directly rivals PM, breaking the strict hierarchy from the performance perspective. This naturally leads to simple, motivating questions: Could a well-tuned SSD-based data structure (e.g., index) match or outperform a well-tuned PM-tailored data structure under certain workloads? How does the real cost of a PM-based system stack up and compare to that of an SSD-based system? These advances and questions signal the need to revisit the performance/cost of persistent data structures. We take B+-trees and hash tables for an initial inquiry. Our goals are to (1) understand the relative merits of indexing on PM and SSD, (2) reason about the cost of PM- and SSD-based systems, and (3) highlight the implications of the storage jungle on future persistent indexes.
Speaker bio: Kaisong Huang is a 2nd-year PhD student in the School of Computing Science at Simon Fraser University advised by Tianzheng Wang. His research is mainly focused on database engines, transaction processing and storage management, in the context of modern storage technologies like NVMe SSDs and persistent memory.
Deep Learning based Prefetching for Flash
Chandranil Chakraborttii (Trinity College);Heiner Litz (University of California, Santa Cruz);
Speaker: Chandranil Chakraborttii, Trinity College
Abstract: Prefetching in solid-state drives (SSDs) is a process of predicting future block accesses and loading them into main memory ahead of time. In this paper, we describe the challenges of prefetching in SSDs, elaborate on why prior approaches fail to achieve high accuracy, and present a deep neural network (DNN) based prefetching technique that significantly outperforms the state-of-the-art. We address the challenges of prefetching in very large sparse address ranges, as well as prefetching in a timely manner by predicting ahead of time. We show that our proposed technique outperforms prior prefetching approaches based on Markov chains by up to 8X, and existing stride prefetchers by up to 800X, on real-world applications running on cloud servers.
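For contrast with the DNN approach, the stride baseline the abstract compares against can be sketched as follows (a toy model; parameter names are hypothetical). Its weakness on SSD traces is visible in the sketch: any irregularity in the address deltas makes it predict nothing.

```python
def stride_prefetch(history, degree=1):
    """Classic stride prefetcher: if recent accesses advance by a constant
    delta, predict the next `degree` addresses; otherwise predict nothing."""
    if len(history) < 2:
        return []
    delta = history[-1] - history[-2]
    if len(history) >= 3 and history[-2] - history[-3] != delta:
        return []  # no stable stride detected
    return [history[-1] + delta * i for i in range(1, degree + 1)]
```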
Speaker bio: Chandranil “Nil” Chakraborttii works as an Assistant Professor of Computer Science at Trinity College. Nil received his Ph.D. and master’s degree in computer science from the University of California Santa Cruz and an undergraduate degree in information technology from West Bengal State University, India. He has collaborated with industry partners, Samsung and Intel on research projects related to cloud storage systems, and has jointly authored patents and publications. His research interests lie at the intersection of artificial intelligence and storage systems. More specifically, Chakraborttii is interested in the performance optimization of flash-based solid-state drives for cloud systems using machine learning techniques.
Session 6B: Software for Persistent Memory
Chair: Steven Swanson, UC San Diego
SoftPM: Software Persistent Memory
Yuanchao Xu (North Carolina State University & Google);Wei Xu (Google);Kimberly Keeton (Google);David E. Culler (Google);
Speaker: Yuanchao Xu, North Carolina State University & Google
Abstract: Hardware persistent memory (HardPM) offers a promising alternative to DRAM, but the mass adoption necessary to realize its cost advantages remains elusive, especially without broad application demand. An alternative, long-understood approach to fast persistence is to utilize the battery-backed DRAM that is deployed in hyperscale data centers. We present SoftPM, a Software Persistent Memory design that manages vulnerable DRAM-resident updates to ensure that data is persisted in the event of a power outage. SoftPM supports a user-directed mode by leveraging application persistency models (e.g., logging), a transparent mode that relies on kernel page fault support, and an explicit mode that gives the application direct control over persistence. Our implementation significantly outperforms HardPM and hybrid HardPM-DRAM alternatives. By providing a general-purpose solution that leverages existing infrastructure, we hope to spur wider adoption of fast local persistence.
Speaker bio: Yuanchao Xu is a fourth-year Ph.D. candidate at NC State University and a student researcher in the System Research at Google. His research interests are architecture, security, and systems.
Yashme: Detecting Persistency Races
Hamed Gorjiara (University of California, Irvine);Guoqing Harry Xu (UCLA);Brian Demsky (University of California, Irvine);
Speaker: Hamed Gorjiara, University of California, Irvine
Abstract: Persistent memory (PM) or Non-Volatile Random-Access Memory (NVRAM) hardware such as Intel’s Optane memory product promises to transform how programs store and manipulate information. Ensuring that persistent memory programs are crash-consistent is a major challenge. We present a novel class of crash consistency bugs for persistent memory programs, which we call persistency races. Persistency races can cause non-atomic stores to be made partially persistent. Persistency races arise due to the interaction of standard compiler optimizations with persistent memory semantics. We present Yashme, the first detector for persistency races. A major challenge is that in order to detect persistency races, the execution must crash in a very narrow window between a store with a persistency race and its corresponding cache flush operation, making it challenging for naïve techniques to be effective. Yashme overcomes this challenge with a novel technique for detecting races in executions that are prefixes of the pre-crash execution. This technique enables Yashme to effectively find persistency races even if the injected crashes do not fall into that window. We have evaluated Yashme on a range of persistent memory benchmarks and have found 26 real persistency races that have never been reported before.
Speaker bio: Hamed Gorjiara is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Irvine (UCI). He currently works at the Programming Languages Research Group advised by Brian Demsky. His research interests are software design, compilers, and testing frameworks. Mainly, his research focuses on developing efficient testing frameworks for persistent memory programs to facilitate the adoption of normal programs on this new type of memory.
PSan: Checking Robustness to Weak Persistency Models
Hamed Gorjiara (University of California, Irvine);Weiyu Luo (University of California, Irvine);Alex Lee (University of California, Irvine);Guoqing Harry Xu (UCLA);Brian Demsky (University of California, Irvine);
Speaker: Weiyu Luo, University of California, Irvine
Abstract: Persistent memory (PM) technologies offer performance close to DRAM with persistence. Persistent memory enables programs to directly modify persistent data through normal load and store instructions, bypassing heavyweight OS system calls for persistency. However, these stores are not immediately made persistent; the developer must manually flush the corresponding cache lines to force the data to be written to persistent memory. While state-of-the-art testing tools can help developers find and fix persistency bugs, prior studies have shown that fixing persistency bugs takes PM developers a couple of weeks on average. The developer has to manually inspect the execution to identify the root cause of the problem. In addition, most existing state-of-the-art testing tools require heavy user annotations to detect bugs without visible symptoms such as a segmentation fault. In this paper, we present robustness as a sufficient correctness condition to ensure that program executions are free from missing-flush bugs. We develop an algorithm for checking robustness and have implemented it in the PSan tool. PSan can help developers both identify silent data corruption bugs and localize bugs in large traces to the problematic memory operations that are missing flush operations. We have evaluated PSan on a set of concurrent indexes, persistent memory libraries, and two popular real-world applications. We found 48 bugs in these benchmarks, 17 of which had not been reported before.
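The class of bug PSan targets can be illustrated with a trivial trace scanner (a minimal sketch, far simpler than PSan's robustness checking; the event tuples are hypothetical): a store to PM that is never followed by a flush of its cache line is a candidate missing-flush bug.

```python
def find_missing_flushes(trace):
    """Scan a trace of ('store', addr), ('flush', addr), and ('fence',)
    events; return the indices of stores never followed by a flush of the
    same address."""
    pending = {}  # addr -> index of latest unflushed store
    for i, ev in enumerate(trace):
        if ev[0] == 'store':
            pending[ev[1]] = i
        elif ev[0] == 'flush':
            pending.pop(ev[1], None)
    return sorted(pending.values())
```

PSan goes well beyond this by reasoning about ordering under weak persistency models and by localizing the root cause rather than just the unflushed store; the sketch only shows the shape of the bug.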
Speaker bio: Weiyu Luo is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Irvine. His research interests include memory model, concurrency, and persistent memory.
Carbide: A Safe Persistent Memory Multilingual Programming Framework
Morteza Hoseinzadeh (Oracle);Steve Swanson (University of California - San Diego);
Speaker: Morteza Hoseinzadeh, Oracle
Abstract: Persistent main memories (PM) allow us to create complex, crash-resilient data structures that are directly accessible from the processors. PM programming is difficult, especially because it combines well-known programming issues such as locking, memory management, and pointer safety with new PM-related bug types. If a mistake occurs in any of these areas, it can corrupt data, leak resources, or prevent recovery from system crashes. Many PM libraries in different languages make it easy to solve some of these problems, but the more flexible a programming language is, the easier it is to make mistakes. For example, Corundum, a Rust-based PM library, guarantees PM safety by imposing strict rules checked during compilation, while PMDK, a C++-based library, does not. This paper presents Carbide, a compiler toolchain based on Corundum, Rust, and C++ that automatically ports safe generic persistent data structures implemented in Corundum from Rust into C++. Carbide lets programmers confidently develop PM-bug-free persistent data structures in Corundum and safely use them in C++. We have implemented Carbide and found its performance to be almost as good as other PM systems, including Corundum, PMDK, Atlas, Mnemosyne, and go-pmem, while making strong safety guarantees and allowing flexible programming at the same time.
Speaker bio: I received my Ph.D. from UC San Diego in Fall 2021 and joined Oracle in November 2021. My Ph.D. program was under the supervision of Professor Steven Swanson in the Non-Volatile Systems Lab (NVSL) of the Department of Computer Science and Engineering (CSE). I had been working on building fast, smart storage systems and fast non-volatile memories which can act like DRAM and presumably will be directly attached to a processor's memory bus. We observed that implementing crash-consistent data structures needs extra care to debug. This led to my Ph.D. thesis research, which aims to enforce PMEM safety statically at compile time. We developed a new persistent memory library, Corundum, which is written in Rust and statically enforces PMEM safety rules. Using it, programmers can make sure that their programs are provably correct in terms of PMEM safety, without spending extra time learning the PMEM programming model and debugging their implementations.
Recoverable Software Combining
Panagiota Fatourou (FORTH ICS and University of Crete, Greece);Nikolaos Kallimanis (Foundation for Research & Technology - Hellas, Institute of Computer Science);Eleftherios Kosmas (University of Crete, Greece);
Speaker: Eleftherios Kosmas, University of Crete, Greece
Abstract: The availability of non-volatile main memory (NVMM) enables the design of concurrent algorithms whose execution will be recoverable at low cost. In this paper, we reveal the power of software combining (a state-of-the-art synchronization technique) in achieving recoverable synchronization and designing recoverable data structures. We present two recoverable software combining protocols, which have been designed carefully to respect a number of principles we identify to be crucial for performance. We also build recoverable queues and stacks using our protocols, which play a significant role in runtime systems, high performance computing, kernel schedulers, and network interfaces. Our experiments show that the proposed implementations far outperform many existing recoverable synchronization techniques, as well as recoverable data structures based on them and specialized recoverable implementations of these data structures.
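Software combining itself can be sketched in a few lines (a toy illustration, not the paper's recoverable protocols; all names are hypothetical): threads publish requests, and a single combiner applies them all, so only one thread touches the shared state, which is also what makes persisting that state cheap in the recoverable variants.

```python
import threading

class Combiner:
    """Toy software combining: threads publish operations; whichever thread
    runs combine() applies every pending operation to the shared state."""

    def __init__(self):
        self.lock = threading.Lock()
        self.requests = []
        self.state = []

    def publish(self, op):
        """Announce an operation (a callable acting on the shared state)."""
        with self.lock:
            self.requests.append(op)

    def combine(self):
        """Act as the combiner: apply all published operations in order."""
        with self.lock:
            for op in self.requests:
                op(self.state)
            self.requests.clear()
            return list(self.state)
```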
Speaker bio: Eleftherios Kosmas received his BSc and MSc degrees in Computer Science from the University of Ioannina in 2005 and 2008, and his PhD degree in Computer Science from the University of Crete in 2015. Currently, he is a Postdoc Researcher at the Department of Computer Science of University of Crete. His research interests include principles of parallel and distributed computing, and programming of parallel and distributed systems.