MONDAY, MAY 9
11:00 am – 12:50 pm | Atkinson Hall
Lunch
12:50 pm – 1:00 pm | Qualcomm Institute Auditorium
Opening Remarks
Chair: Steven Swanson, UC San Diego
1:00 pm – 2:15 pm | Qualcomm Institute Auditorium
Keynote I
Chair: Steven Swanson, UC San Diego
Next Horizon for Storage & Memory Silicon in the Datacenter
Gary Kotzur (VP/Storage CTO, Marvell)
Speaker: Gary Kotzur, VP/Storage CTO, Marvell
Abstract: The introduction of disaggregated and composable infrastructure moved the industry to create a low-latency interface that supports memory semantics. While there are several initiatives, CXL has enormous industry momentum, with products shipping in the near future. Disaggregated storage solutions are already widely used in the industry. In this presentation, I will review new topologies for disaggregated storage made possible by NVMe-oF and Ethernet drives. In addition, I will show options for CXL topologies that parallel those of disaggregated storage. I will conclude with a call to action for storage-class memory.
Speaker bio: Gary Kotzur is Marvell’s Storage CTO, leading the Storage Organization and Architecture Team within Marvell’s Storage Product Group. Gary has over 25 years of experience in the computer industry and was with Dell EMC prior to joining Marvell in 2020. His team is responsible for delivering the product architecture, technology vision, and strategy for storage and memory products, including controllers for HDDs, SSDs, accelerators, Fibre Channel, and CXL devices. The Storage CTO team also directs emerging-technology investigations, standards-body participation, and university technology engagements. Gary has a diverse background spanning semiconductor to system-level design, with expertise in computer, storage, and networking architecture. He has initiated and been involved in a number of industry standards while serving as an active member of their boards. Over his career, Gary has been granted 65 patents for systems design covering computer, networking, storage, and ASIC architectures, and he has an additional 18 patent applications pending. Gary holds a bachelor’s degree in electrical engineering from Texas A&M University and a master’s degree in electrical engineering from the University of Houston.
2:45 pm – 3:45 pm | Qualcomm Institute Auditorium
Session 1: Memorable Paper Award Finalists I
Chair: Erich Haratsch, Marvell
HolisticGNN: Geometric Deep Learning Engines for Computational SSDs
Miryeong Kwon (KAIST); Donghyun Gouk (KAIST); Sangwon Lee (KAIST); Myoungsoo Jung (KAIST)
Speaker: Miryeong Kwon, KAIST
Abstract: Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges and exhibit much higher accuracy in a variety of prediction tasks. However, as GNNs engage with large sets of graph and embedding data on storage, they suffer from heavy I/O accesses and irregular computation. We propose HolisticGNN, a novel deep learning framework for large graphs that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing. To achieve the best end-to-end latency and high energy efficiency, HolisticGNN allows users to implement various GNN algorithms and directly executes them where the data exist, in a holistic manner. We fabricate HolisticGNN's hardware RTL and implement its software on an FPGA-based computational SSD (CSSD). Our empirical evaluations show that the inference time of HolisticGNN outperforms GNN inference services using high-performance GPUs by 7.1x while reducing energy consumption by 33.2x, on average.
Speaker bio: Miryeong Kwon is a Ph.D. candidate at the Korea Advanced Institute of Science and Technology (KAIST), advised by Myoungsoo Jung, whose group works on computer architecture, non-volatile memory, and operating systems. Her main research interest is hardware-software co-design for emerging applications and the management of non-volatile memory and storage devices in such systems.
RACER: Bit-Pipelined Processing Using Resistive Memory
Minh S. Q. Truong (Carnegie Mellon University); Eric Chen (Carnegie Mellon University); Deanyone Su (Carnegie Mellon University); Alex Glass (Carnegie Mellon University); Liting Shen (Carnegie Mellon University); L. Richard Carley (Carnegie Mellon University); James A. Bain (Carnegie Mellon University); Saugata Ghose (University of Illinois Urbana-Champaign)
Speaker: Minh S. Q. Truong, Carnegie Mellon University
Abstract: To combat the high energy costs of moving data between main memory and the CPU, recent works have proposed to perform processing-using-memory (PUM), a type of processing-in-memory where operations are performed on data in situ (i.e., right at the memory cells holding the data). Several common and emerging memory technologies offer the ability to perform bitwise Boolean primitive functions by having interconnected cells interact with each other, eliminating the need to use discrete CMOS compute units for several common operations. Recent PUM architectures extend upon these Boolean primitives to perform bit-serial computation using memory. Unfortunately, several practical limitations of the underlying memory devices restrict how large emerging memory arrays can be, which hinders the ability of conventional bit-serial computation approaches to deliver high performance in addition to large energy savings. In this paper, we propose RACER, a cost-effective PUM architecture that delivers high performance and large energy savings using small arrays of resistive memories. RACER makes use of a bit-pipelining execution model, which can pipeline bit-serial w-bit computation across w small tiles. We fully design efficient control and peripheral circuitry, whose area can be amortized over small memory tiles without sacrificing memory density, and we propose an ISA abstraction for RACER to allow for easy program/compiler integration. We evaluate an implementation of RACER using NOR-capable ReRAM cells across a range of microbenchmarks extracted from data-intensive applications, and find that RACER provides 107×, 12×, and 7× the performance of a 16-core CPU, a 2304-shader-core GPU, and a state-of-the-art in-SRAM compute substrate, respectively, with energy savings of 189×, 17×, and 1.3×.
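The primitives involved can be illustrated with a rough functional sketch in plain Python (not the paper's RTL; the gate constructions and the tile-to-bit mapping below are illustrative): a one-bit full adder built only from NOR, the Boolean primitive that NOR-capable ReRAM provides, applied bit-serially across w bit positions. In RACER-style hardware, bit position i would live in tile i, so successive operations can overlap in a pipeline.

```python
def nor(a, b):
    # The Boolean primitive a NOR-capable ReRAM array provides.
    return 0 if (a or b) else 1

def xor(a, b):
    # XOR composed from five NOR gates.
    t1 = nor(a, b)
    xnor = nor(nor(a, t1), nor(b, t1))
    return nor(xnor, xnor)

def and_(a, b):
    return nor(nor(a, a), nor(b, b))

def or_(a, b):
    t = nor(a, b)
    return nor(t, t)

def full_adder(a, b, cin):
    # Standard full adder expressed over the NOR-derived gates above.
    s = xor(xor(a, b), cin)
    cout = or_(and_(a, b), and_(cin, xor(a, b)))
    return s, cout

def bit_serial_add(x, y, w=8):
    """w-bit addition computed one bit per step. In hardware, bit
    position i would live in tile i, so tile i can work on operation k
    at step k + i: after a w-step fill, one add completes per step."""
    s, carry = 0, 0
    for i in range(w):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        s |= bit << i
    return s
```

The Python loop is sequential; the point of bit-pipelining is that the per-bit steps of different operations overlap across the w tiles, hiding the bit-serial latency.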
Speaker bio: Minh S. Q. Truong is a Ph.D. student in the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received the Apple Ph.D. Fellowship in Integrated Systems in 2021 for his research in processing using resistive memory. He received dual B.S. degrees in electrical engineering and in computer engineering from the University of California, Davis in 2019. His current Ph.D. research seeks to create new classes of computer systems based on the processing-in-memory paradigm to reduce the power consumption of data-intensive applications by orders of magnitudes, and to enable efficient edge and cloud computing. His general research interest lies at the intersection of computer systems, microarchitecture, circuits, and how to design a holistic computer system.
NDS: N-Dimensional Storage
Yu-Chia Liu (University of California, Riverside); Hung-Wei Tseng (University of California, Riverside)
Speaker: Yu-Chia Liu, University of California, Riverside
Abstract: Demands for efficient computing among applications that use high-dimensional datasets have led to multi-dimensional computers: computers that leverage heterogeneous processors/accelerators offering various processing models to support multi-dimensional compute kernels. Yet the front end for these processors/accelerators is inefficient, as memory/storage systems often expose only entrenched linear-space abstractions to an application, and they often ignore the benefits of modern memory/storage systems, such as support for multi-dimensionality through different types of parallel access. This paper presents N-Dimensional Storage (NDS), a novel multi-dimensional memory/storage system that fulfills the demands of modern hardware accelerators and applications. NDS abstracts memory arrays as native storage that applications address using coordinates in any application-defined multi-dimensional space, thereby avoiding the software overhead associated with data-object transformations. NDS gauges application demand and the underlying memory-device architecture in order to intelligently determine the physical data layout that maximizes access bandwidth and minimizes the overhead of presenting objects for arbitrary applications. This paper demonstrates an efficient architecture supporting NDS. We evaluate a set of linear/tensor algebra workloads along with graph and data-mining algorithms on custom-built systems using each architecture. Our results show a 5.73× speedup with appropriate architectural support.
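The coordinate-based abstraction can be sketched with a hypothetical API (class and parameter names are invented for illustration; this is not the paper's interface): the application reads and writes by N-dimensional coordinate, while the storage layer owns the physical linearization and can reorder dimensions to match the dominant access pattern.

```python
class NDSpace:
    """Toy NDS-style abstraction: applications address cells by N-D
    coordinate; the storage layer picks the physical layout."""
    def __init__(self, shape, fast_dim=0):
        self.shape = shape
        # Place the most frequently iterated dimension innermost so walks
        # along it become sequential physical accesses.
        self.order = [d for d in range(len(shape)) if d != fast_dim] + [fast_dim]
        n = 1
        for s in shape:
            n *= s
        self.cells = [None] * n

    def _phys(self, coord):
        # Linearize according to the layout chosen by the storage layer.
        addr = 0
        for d in self.order:
            addr = addr * self.shape[d] + coord[d]
        return addr

    def write(self, coord, value):
        self.cells[self._phys(coord)] = value

    def read(self, coord):
        return self.cells[self._phys(coord)]
```

Application code is identical whichever dimension the storage layer makes fast; changing the layout requires no software data-object transformation, which is exactly the overhead the paper's abstraction avoids.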
Speaker bio: Yu-Chia Liu is a fourth-year Ph.D. student advised by Hung-Wei Tseng at UC Riverside. His research focuses on the interaction between hardware-accelerated programs and storage systems. Yu-Chia's most recent paper, N-Dimensional Storage, was a Best Paper candidate at MICRO 2021. He is currently on the job market seeking an industry position.
4:15 pm – 5:35 pm | Qualcomm Institute Auditorium
Session 2A: Devices
Chair: Saugata Ghose, University of Illinois Urbana-Champaign
A definitive demonstration that resistance-switching memories are not memristors
Jinsun Kim (University of South Carolina); Yuriy V. Pershin (University of South Carolina); Ming Yin (Benedict College, Columbia, South Carolina); Timir Datta (University of South Carolina); Massimiliano Di Ventra (University of California, San Diego)
Speaker: Jinsun Kim, University of South Carolina
Abstract: There are claims in the literature that all resistance-switching memories are memristors, namely, resistors whose resistance depends only on the charge that flows across them. Here, we present the first experimental measurement unambiguously showing that such claims are wrong. Our demonstration is based on the recently suggested “ideal memristor test,” which exploits a duality in a capacitor-memristor circuit. This duality requires that for any initial state of the memristor (its initial resistance) and any form of the applied voltage, the final state of the memristor (its final resistance) must be identical to its initial state if the capacitor charge finally returns to its initial value. We applied the test to a Cu-SiO2 electrochemical metallization cell and found that the cell is not a memristor: it does not return to the initial state when the circuit is subjected to a voltage pulse. Since the response of our electrochemical metallization cell is typical of the most common bipolar resistance-switching memories, we conclude that resistance-switching memories are not memristors.
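A toy simulation (all parameters arbitrary, not the paper's measured device) illustrates the duality the test relies on: an ideal memristor's resistance depends only on the net charge through it, and in a series capacitor-memristor circuit the same current charges both elements, so if the capacitor charge returns to its initial value, the resistance must return too.

```python
def simulate(C=1e-6, R0=100.0, k=5e7,
             pulse_t=5e-3, total_t=1e-2, dt=1e-6):
    """Euler integration of a voltage source driving a series
    memristor-capacitor pair. R(q) = R0 + k*q models an ideal memristor
    whose resistance depends only on net charge q through it."""
    q_cap = 0.0            # capacitor charge
    q_mem = 0.0            # net charge through the memristor
    R_after_pulse = None
    for n in range(int(total_t / dt)):
        t = n * dt
        V = 1.0 if t < pulse_t else 0.0       # a single voltage pulse
        R = R0 + k * q_mem
        i = (V - q_cap / C) / R               # same current in both elements
        q_cap += i * dt
        q_mem += i * dt
        if R_after_pulse is None and t >= pulse_t:
            R_after_pulse = R
    return R0 + k * q_mem, R_after_pulse
```

With these numbers the resistance rises to about 150 Ω during the pulse, showing the device holds state, yet once the capacitor discharges back to zero charge the resistance returns to R0 = 100 Ω. The paper's point is that a real resistance-switching cell fails this return, so it cannot be an ideal memristor.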
Speaker bio: Jinsun (Jin) Kim is a graduate student working with Dr. Yuriy V. Pershin at the University of South Carolina (USC). Her research at USC focuses on two areas: (1) reliability and validity testing of resistive switching devices with memory; (2) nano-fabrication and experimental measurement of resistive switching devices with innovative materials. Jin recently published work in Advanced Electronic Materials 6(7), and two other papers have been submitted for publication. She will receive her Master’s degree in physics from USC in May 2022.
Ferroelectric nonvolatile memories: Hafnia Based Ferroelectric Tunnel Junctions
Bhagwati Prasad (Materials Engineering Department, Indian Institute of Science, Bengaluru, India 560012); Vishal Thakare (Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA); Alan Kalitsov (Western Digital Research Center, Western Digital Corporation, San Jose, CA 95119, USA); Zimeng Zhang (Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA); R. Ramesh (Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA)
Speaker: Bhagwati Prasad, Indian Institute of Science, Bengaluru
Abstract: High-density, high-speed, and low-power nonvolatile memories are currently being vigorously explored for use in next-generation computation, particularly due to the performance gap between the logic and memory elements of current computational architectures. The electrically switchable spontaneous polarization of ferroelectric materials enables a robust nonvolatile memory solution. Using ultrathin films of ferroelectric materials as a tunnel barrier in a metal/ferroelectric/metal trilayer structure, the so-called ferroelectric tunnel junction (FTJ), is being widely explored as a potential nonvolatile memory element. Unlike ferroelectric RAM (FeRAM), FTJs offer nondestructive readout, in addition to low operation energy and high operation speed. In this work, we have demonstrated FTJs with a very large OFF/ON resistance ratio and a relatively low resistance-area product (RA) using a ~1 nm thick Zr-doped HfO2 (HZO) ferroelectric tunnel barrier. We stabilized ferroelectricity in ultrathin films of rhombohedral HZO (R-HZO) through substrate-induced compressive strain. The resistance-area product at the bias voltage (~300 mV) required for one-half of the zero-bias TER ratio is three orders of magnitude lower than values reported for relatively thick ferroelectric barriers, which significantly improves the signal-to-noise ratio (SNR) during the read operation. These results set the stage for further exploration of hafnia-based FTJs for nonvolatile memory applications.
Speaker bio: Prof. Bhagwati Prasad is an Assistant Professor in the Department of Materials Engineering at the Indian Institute of Science, Bengaluru. He currently works on emerging memory technologies for the Internet of Things (IoT) and Artificial Intelligence (AI). Before joining IISc, he was a Principal Research Scientist at Western Digital, San Jose, USA. He obtained his doctorate in Materials Science from the University of Cambridge (UK) in 2015 and subsequently took a scientist position at the Max Planck Institute for Solid State Research, Stuttgart, Germany. In November 2016, Dr. Prasad moved to the USA to join Prof. R. Ramesh’s group at the University of California, Berkeley as a senior postdoctoral researcher. Dr. Prasad has published more than 30 research articles in highly reputed journals and filed more than 35 patents, including 30 US patents.
Circadian Rhythm: A Candidate for Achieving Everlasting Flash Memories
Muhammed Ceylan Morgul (University of Virginia); Xinfei Guo (Shanghai Jiao Tong University); Mircea Stan (University of Virginia)
Speaker: M Ceylan Morgul, University of Virginia
Abstract: Existing passive (resting) and accelerated passive (thermal annealing) self-healing techniques have been proposed for flash memory's limited endurance. Yet they have been applied only at (or near) the end of the flash lifetime. Because these techniques can only recover temporary damage, this approach leaves the permanent component of the damage unchecked: if not recovered in time, damage accumulates and becomes permanent. In this study, we propose a Circadian Rhythm (CR) recovery technique (as an analogue of nature) that targets the prevention of permanent damage. Our measurement results show that the most frequent rhythm, compared to the least frequent rhythm, slows the growth of the byte error rate by around 50 times. Moreover, it shows a flatter, more linear error-occurrence trend, since the CR technique prevents most of the permanent damage. The behavior observed in flash chips opens the opportunity of everlasting flash memories by implementing the Circadian Rhythm in the Flash Translation Layer (FTL) or Flash File System (FFS).
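Why early, frequent recovery beats end-of-life recovery can be seen in a toy damage model (the rates and the linear hardening rule below are invented for illustration; they are not the paper's measurements): temporary damage that lingers gradually converts to permanent damage, so healing it early caps the conversion.

```python
def permanent_damage(pe_cycles, rest_every, convert=0.01, recover=0.95):
    """Toy wear model: each P/E cycle adds one unit of temporary damage;
    each cycle a fraction `convert` of outstanding temporary damage
    hardens into unrecoverable permanent damage; a rest period heals a
    fraction `recover` of whatever temporary damage remains."""
    temp, perm = 0.0, 0.0
    for c in range(1, pe_cycles + 1):
        temp += 1.0                 # damage from one P/E cycle
        hardened = temp * convert   # lingering damage becomes permanent
        perm += hardened
        temp -= hardened
        if c % rest_every == 0:
            temp *= 1.0 - recover   # rest heals temporary damage
    return perm
```

With these illustrative numbers, resting every 10 cycles leaves several times less permanent damage after 1000 cycles than a single end-of-life rest, qualitatively matching the abstract's observation that a frequent rhythm sharply slows error accumulation.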
Speaker bio: Muhammed Ceylan Morgul received his BSc degree in Electronics and Communication Engineering in 2014 and his MSc degree in Electronics Engineering in 2017 at Istanbul Technical University. He is currently a Ph.D. student in Electrical Engineering at the University of Virginia. He has been the principal investigator of one TUBITAK project and a researcher on EU-H2020-RISE, SRC-JUMP, and TUBITAK projects in Turkey, the USA, France, Portugal, and Malaysia. He is the author of more than 10 peer-reviewed research papers. His current research interests include the reliability of memory technologies, processing in memory, and emerging computing.
On the Capacity of DNA-based Data Storage under Substitution Errors
Andreas Lenz (Technical University of Munich); Paul Siegel (UCSD); Antonia Wachter-Zeh (Technical University of Munich); Eitan Yaakobi (Technion - Israel Institute of Technology)
Speaker: Paul H. Siegel, University of California, San Diego
Abstract: Advances in biochemical technologies, such as synthesizing and sequencing devices, have fueled many recent experiments on archival digital data storage using DNA. In this paper we study the information-theoretic capacity of such storage systems. The channel model incorporates the main properties of DNA-based data storage. We present the capacity of this channel for the case of substitution errors inside the sequences and provide an intuitive interpretation of the capacity formula for relevant channel parameters. We compare the capacity to rates achievable with a sub-optimal decoding method and conclude with a discussion on cost-efficient DNA archive design.
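The paper's channel model is richer (unordered sequences, drawn with multiplicity), but one standard ingredient is the capacity of a memoryless 4-ary symmetric substitution channel, computed here with the textbook formula (this is not the paper's full capacity expression).

```python
from math import log2

def qsc_capacity(p, q=4):
    """Capacity, in bits per symbol, of the q-ary symmetric channel:
    a symbol is substituted with probability p, uniformly over the
    other q - 1 symbols. For DNA, q = 4 (A, C, G, T)."""
    if p <= 0:
        return log2(q)
    h = -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy of the error event
    return log2(q) - h - p * log2(q - 1)
```

With no errors, each nucleotide carries log2(4) = 2 bits; capacity falls to zero at p = 3/4, where the channel output is independent of the input.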
Speaker bio: Paul Siegel is a Distinguished Professor of Electrical and Computer Engineering in the Jacobs School of Engineering at the University of California, San Diego. His interests are in information theory and coding with applications to data storage and transmission. He holds an endowed chair in the Center for Memory and Recording Research, where he served as Director from 2000 to 2011. He has been on the organizing committee of the Non-Volatile Memories Workshop since its inception in 2010.
4:15 pm – 5:15 pm | Qualcomm Institute Theater
Session 2B: Systems using Persistent Memories
Chair: Hung-Wei Tseng, UC Riverside
SoftPM: Software Persistent Memory
Yuanchao Xu (North Carolina State University & Google); Wei Xu (Google); Kimberly Keeton (Google); David E. Culler (Google)
Speaker: Yuanchao Xu, North Carolina State University & Google
Abstract: Hardware persistent memory (HardPM) offers a promising alternative to DRAM, but the mass adoption necessary to realize its cost advantages remains elusive, especially without broad application demand. An alternative, long-understood approach to fast persistence is to utilize the battery-backed DRAM that is deployed in hyperscale data centers. We present SoftPM, a software persistent memory design that manages vulnerable DRAM-resident updates to ensure that data is persisted in the event of a power outage. SoftPM supports a user-directed mode that leverages application persistency models (e.g., logging), a transparent mode that relies on kernel page-fault support, and an explicit mode that gives the application direct control over persistence. Our implementation significantly outperforms HardPM and hybrid HardPM-DRAM alternatives. By providing a general-purpose solution that leverages existing infrastructure, we hope to spur wider adoption of fast local persistence.
Speaker bio: Yuanchao Xu is a fourth-year Ph.D. candidate at NC State University and a student researcher in systems research at Google. His research interests are computer architecture, security, and systems.
PMNet: In-Network Data Persistence
Korakit Seemakhupt (University of Virginia); Sihang Liu (University of Virginia); Yasas Senevirathne (University of Virginia); Muhammad Shahbaz (Purdue University); Samira Khan (University of Virginia)
Speaker: Korakit Seemakhupt, University of Virginia
Abstract: The recent adoption of fast storage systems (such as persistent memory) reduces the latency of local data accesses. Yet the latency between application processes and storage backends, which are typically spread across remote servers, remains prohibitive. In-network computing systems today can mitigate this remote-access latency, but only for (stateless) read requests, by computing them within a network device. Requests that update persistent state must still traverse the server. Recognizing this, we introduce the idea of in-network data persistence and PMNet, a system that persists data within network devices, thereby moving the server off the critical path of update requests.
Speaker bio: Korakit Seemakhupt is a fourth-year Ph.D. student in the Department of Computer Science at the University of Virginia. His research focuses on computer networks, storage systems, and real-system prototyping of emerging technologies.
Persistent Scripting
Zi Fan Tan (San Jose State University); Jianan Li (Northeastern University); Terence Kelly; Haris Volos (University of Cyprus)
Speaker: Terence Kelly
Abstract: Persistent scripting brings the benefits of persistent memory programming to high-level interpreted languages. More importantly, it brings the convenience and programmer productivity of scripting to persistent memory programming. We have integrated a novel generic persistent memory allocator into a popular scripting language interpreter, which now exposes a simple and intuitive persistence interface: A flag notifies the interpreter that a script's variables reside in a persistent heap in a specified file. The interpreter begins script execution with all variables in the persistent heap ready for immediate use. New variables defined by the running script are allocated on the persistent heap and are thus available to subsequent executions. Scripts themselves are unmodified and persistent heaps may be shared freely between unrelated scripts. Experiments show that our persistent gawk prototype offers good performance while simplifying scripts, and we identify opportunities to reduce interpreter overheads.
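A rough Python analogue of this programming model (this is not the authors' persistent-gawk prototype) uses the standard library's shelve module: state written to a file-backed heap in one execution is ready for immediate use in the next. In persistent gawk the interpreter does this transparently for every variable; shelve makes it explicit per key.

```python
import os
import shelve
import tempfile

# A file path holding the persistent heap, shared across "executions".
heap_path = os.path.join(tempfile.mkdtemp(), "script_heap")

def run_script():
    """One 'execution' of a script whose variables live in a persistent
    heap backed by a file and survive across executions."""
    with shelve.open(heap_path) as heap:
        heap["runs"] = heap.get("runs", 0) + 1
        heap["log"] = heap.get("log", []) + ["ran"]
        return heap["runs"]

first = run_script()    # fresh heap: sees runs == 1
second = run_script()   # a later execution finds the prior state intact
```

The convenience argument in the abstract is visible even here: no serialization code, no schema, just variables that outlive the process.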
5:45 pm – 7:00 pm | UCSD Faculty Club
Banquet
TUESDAY, MAY 10
8:00 am – 9:00 am | Atkinson Hall
Breakfast
9:00 am – 10:15 am | Qualcomm Institute Auditorium
Keynote II
Chair: Paul Siegel, UC San Diego
Memory-Centric Computing
Onur Mutlu (ETH Zurich and Carnegie Mellon University)
Speaker: Onur Mutlu, ETH Zurich and Carnegie Mellon University
Abstract: Computing is bottlenecked by data. Large amounts of application data overwhelm the storage capability, communication capability, and computation capability of the modern machines we design today. As a result, the performance, efficiency, and scalability of many key applications are bottlenecked by data movement. In this lecture, we describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting different semantic properties of application data. We argue that an intelligent architecture should be designed to handle data well. We show that handling data well requires designing architectures based on three key principles: 1) data-centric, 2) data-driven, and 3) data-aware. We give several examples of how to exploit each of these principles to design a much more efficient and higher-performance computing system. We especially discuss recent research that aims to fundamentally reduce memory latency and energy and to practically enable computation close to data, with at least two promising novel directions: 1) processing using memory, which exploits the analog operational properties of memory chips to perform massively parallel operations in memory with low-cost changes, and 2) processing near memory, which integrates sophisticated additional processing capability in memory controllers, the logic layer of 3D-stacked memory technologies, or memory chips to enable high memory bandwidth and low memory latency to near-memory logic. We show that both types of architectures can enable orders-of-magnitude improvements in the performance and energy consumption of many important workloads, such as graph analytics, database systems, machine learning, and video processing. We discuss how to enable adoption of such fundamentally more intelligent architectures, which we believe are key to efficiency, performance, and sustainability.
We conclude with some guiding principles for future computing architecture and system designs. A short accompanying paper, which appeared in DATE 2021, serves as recommended reading: https://people.inf.ethz.ch/omutlu/pub/intelligent-architectures-for-intelligent-computingsystems-invited_paper_DATE21.pdf
Speaker bio: Onur Mutlu is a Professor of Computer Science at ETH Zurich. He is also a faculty member at Carnegie Mellon University, where he previously held the Strecker Early Career Professorship. His current broader research interests are in computer architecture, systems, hardware security, and bioinformatics. A variety of techniques he, along with his group and collaborators, has invented over the years have influenced industry and have been employed in commercial microprocessors and memory/storage systems. He obtained his PhD and MS in ECE from the University of Texas at Austin and BS degrees in Computer Engineering and Psychology from the University of Michigan, Ann Arbor. He started the Computer Architecture Group at Microsoft Research (2006-2009), and held various product and research positions at Intel Corporation, Advanced Micro Devices, VMware, and Google. He received the Intel Outstanding Researcher Award, IEEE High Performance Computer Architecture Test of Time Award, the IEEE Computer Society Edward J. McCluskey Technical Achievement Award, ACM SIGARCH Maurice Wilkes Award, the inaugural IEEE Computer Society Young Computer Architect Award, the inaugural Intel Early Career Faculty Award, US National Science Foundation CAREER Award, Carnegie Mellon University Ladd Research Award, faculty partnership awards from various companies, and a healthy number of best paper or "Top Pick" paper recognitions at various computer systems, architecture, and security venues. He is an ACM Fellow "for contributions to computer architecture research, especially in memory systems", IEEE Fellow for "contributions to computer architecture research and practice", and an elected member of the Academy of Europe (Academia Europaea). 
His computer architecture and digital logic design course lectures and materials are freely available on YouTube (https://www.youtube.com/OnurMutluLectures), and his research group makes a wide variety of software and hardware artifacts freely available online (https://safari.ethz.ch/). For more information, please see his webpage at https://people.inf.ethz.ch/omutlu/.
10:45 am – 11:45 am | Qualcomm Institute Auditorium
Session 3: Memorable Paper Award Finalists II
Chair: Eitan Yaakobi, Technion – Israel Institute of Technology
DNA-Storalator: End-to-End DNA Storage Simulator
Gadi Chaykin (Technion - Israel Institute of Technology); Nili Furman (Technion - Israel Institute of Technology); Omer Sabary (University of California San Diego); Dvir Ben Shabat (Technion - Israel Institute of Technology); Eitan Yaakobi (Technion - Israel Institute of Technology)
Speaker: Omer Sabary, Technion - Israel Institute of Technology
Abstract: DNA-Storalator is a cross-platform software tool that simulates the complete process of encoding, storing, and decoding digital data in DNA molecules. The simulator receives an input file with the designed DNA strands that store digital data and emulates the different biological and algorithmic components of the storage system. The biological component simulates the synthesis, PCR, and sequencing stages, which are expensive and complicated and therefore not widely accessible to the community. These processes amplify the data and generate noisy copies of each DNA strand, where the errors are insertions, deletions, long deletions, and substitutions. DNA-Storalator injects errors into the data based on error rates, which vary between different synthesis and sequencing technologies. The rates are based on a comprehensive analysis of data from previous experiments but can also be customized. Additionally, the tool can analyze new datasets and characterize their error rates to build new error models for future use in the simulator. DNA-Storalator also enables control of the amplification process and the distribution of the number of copies per designed strand. The coding components are: 1. clustering algorithms, which partition all output noisy strands into groups according to the designed strand they originated from; 2. state-of-the-art reconstruction algorithms, which are invoked on each cluster to output a close/exact estimate of the designed strand; 3. integration with external error-correcting codes and other encoding and decoding techniques. This end-to-end DNA storage simulator grants researchers from all fields an accessible, complete simulator to examine new biological technologies, coding techniques, and algorithms for current and future DNA storage systems.
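The error-injection step of the biological component can be sketched as follows (function names and default rates are illustrative; the simulator derives its rates from analyzed experimental data and also models long deletions, which this sketch omits).

```python
import random

def noisy_copy(strand, sub=0.01, ins=0.005, dele=0.005, rng=None):
    """Generate one noisy read of a designed strand with independent
    per-base substitution, insertion, and deletion events."""
    rng = rng or random.Random()
    bases = "ACGT"
    out = []
    for b in strand:
        if rng.random() < dele:               # deletion: base dropped
            continue
        if rng.random() < ins:                # insertion before the base
            out.append(rng.choice(bases))
        if rng.random() < sub:                # substitution
            b = rng.choice([x for x in bases if x != b])
        out.append(b)
    return "".join(out)

def amplify(strand, copies, seed=0, **rates):
    """Stand-in for PCR/sequencing: several independent noisy reads of
    the same designed strand."""
    rng = random.Random(seed)
    return [noisy_copy(strand, rng=rng, **rates) for _ in range(copies)]
```

Downstream, the simulator's clustering stage would group such reads by their originating designed strand, and reconstruction algorithms would estimate the original from each cluster.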
Speaker bio: Omer Sabary is a PhD student at the Computer Science Faculty at the Technion. His advisor is Prof. Eitan Yaakobi, and his research interests include coding techniques and algorithms for DNA storage systems. In 2020 he received his M.Sc. from the Computer Science Department at the Technion.
Jaaru: Efficiently model checking persistent memory programs
Hamed Gorjiara (UC Irvine);Guoqing Harry Xu (UCLA);Brian Demsky (University of California, Irvine);
Speaker: Hamed Gorjiara, University of California, Irvine
Abstract: Persistent memory (PM) technologies combine near-DRAM performance with persistency and open the possibility of using one copy of a data structure as both a working copy and a persistent store of the data. Ensuring that these persistent data structures are crash consistent (i.e., consistent across power failures) is a major challenge. Stores to persistent memory are not immediately made persistent --- they initially reside in the processor cache and are only written to PM when a flush occurs due to space constraints or explicit flush instructions. It is more challenging to test crash consistency for PM than for disks, given that PM's byte-addressability leads to significantly more states. We present Jaaru, a fully-automated and ultra-efficient model checker for PM programs. Key to Jaaru's efficiency is a new technique based on constraint refinement that can reduce the number of executions that must be explored by many orders of magnitude. This exploration technique effectively leverages commit stores, a common coding pattern, to reduce the model checking complexity from exponential in the length of program executions to quadratic. We have evaluated Jaaru with PMDK and RECIPE, and found 25 persistency bugs, 18 of which are new. Jaaru is also orders of magnitude more efficient than Yat, a model checker that eagerly explores all possible states.
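The state explosion the abstract refers to is easy to see in a toy model. The sketch below is our own illustration, not Jaaru itself: it treats each unflushed store as independently either reaching PM or not at a crash, so n pending stores yield 2^n post-crash images.

```python
from itertools import chain, combinations

def crash_states(pending):
    """All post-crash memory images if a crash hits while the stores in
    `pending` (addr -> value) still sit in the cache: any subset of them
    may have reached persistent memory."""
    addrs = list(pending)
    subsets = chain.from_iterable(
        combinations(addrs, k) for k in range(len(addrs) + 1))
    return [{a: pending[a] for a in s} for s in subsets]

# Three unflushed stores already give 2^3 = 8 images to model-check.
print(len(crash_states({"x": 1, "y": 2, "z": 3})))  # 8
```

With the commit-store pattern, where every data store is flushed and fenced before a single commit flag is written, recovery only needs to inspect the flag; this structure is what lets a checker prune the exponential space, as the abstract describes.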
Speaker bio: Hamed Gorjiara is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Irvine (UCI). He currently works at the Programming Languages Research Group advised by Brian Demsky. His research interests are software design, compilers, and testing frameworks. Mainly, his research focuses on developing efficient testing frameworks for persistent memory programs to facilitate the adoption of normal programs on this new type of memory.
Concentrated Stopping Set Design for Coded Merkle Tree: Improving Security Against Data Availability Attacks in Blockchain Systems
Debarnab Mitra (University of California, Los Angeles);Lev Tauz (University of California, Los Angeles);Lara Dolecek (University of California, Los Angeles);
Speaker: Debarnab Mitra, University of California, Los Angeles
Abstract: In certain blockchain systems, light nodes are clients that download only a small portion of the block. Light nodes are vulnerable to a data availability (DA) attack where a malicious node makes the light nodes accept an invalid block by hiding the invalid portion of the block from the nodes in the system. A technique based on LDPC codes called Coded Merkle Tree (CMT), proposed by Yu et al., enables light nodes to detect a DA attack by randomly requesting/sampling portions of the block from the malicious node. However, light nodes fail to detect a DA attack with high probability if a malicious node hides a small stopping set of the LDPC code. To improve the probability of detection, in this work, we demonstrate a specialized LDPC code design that focuses on concentrating stopping sets to a small group of variable nodes rather than only eliminating stopping sets. Our design achieves a higher probability of detecting DA attacks compared to prior work, thus improving the security of the system.
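The sampling argument behind the attack can be made concrete. In this sketch (the function and parameters are ours, not the paper's), a light node draws k of the n coded chunks uniformly without replacement, and the hiding attack succeeds only if every sample misses the s hidden chunks of a stopping set:

```python
from math import comb

def miss_probability(n, s, k):
    """Probability that k uniform samples (without replacement) out of n
    coded chunks all avoid a hidden stopping set of size s."""
    if s == 0:
        return 1.0          # nothing hidden, nothing to catch
    if k > n - s:
        return 0.0          # samples must hit the hidden set
    return comb(n - s, k) / comb(n, k)

# A small hidden stopping set is far more likely to slip past the samplers,
# which is why the proposed design concentrates stopping sets.
print(miss_probability(1024, 2, 30) > 0.9)    # True: tiny set, likely missed
print(miss_probability(1024, 64, 30) < 0.2)   # True: large set, likely caught
```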
Speaker bio: Debarnab Mitra is a Ph.D. candidate in the Department of Electrical and Computer Engineering at UCLA. He earned his M.S. degree from the ECE Department at UCLA in 2020, for which he was awarded the Outstanding MS Thesis Award in Signals and Systems. Prior to that, he graduated from IIT Bombay with a B. Tech. (Hons.) in Electrical Engineering and a minor in Computer Science and Engineering in 2018. His research interests include information theory, channel coding, and its applications to blockchains and non-volatile memories.
11:45 am-1:15 pm | Atkinson Hall
Lunch
1:45 pm-2:45 pm | Qualcomm Institute Auditorium
1:45 pm-2:45 pm | Qualcomm Institute Theater
Session 4A: SSD Architectures
Chair: Oliver Hambrey, Siglead
Offline and Online Algorithms for SSD Management
Tomer Lange (Technion - Israel Institute of Technology);Gala Yadgar (Technion - Israel Institute of Technology);Joseph (Seffi) Naor (Technion - Israel Institute of Technology);
Speaker: Tomer Lange, Technion - Israel Institute of Technology
Abstract: The abundance of system-level optimizations for reducing SSD write amplification, which are usually based on experimental evaluation, stands in contrast to the lack of theoretical algorithmic results in this problem domain. To bridge this gap, we explore the problem of reducing write amplification from an algorithmic perspective, considering it in both offline and online settings. In the offline setting, we present a near-optimal algorithm. In the online setting, we first consider algorithms that have no prior knowledge about the input. We present a worst-case lower bound and show that the greedy algorithm is optimal in this setting. Then we design an online algorithm that uses predictions about the input. We show that when predictions are sufficiently accurate, our algorithm circumvents the above lower bound. We complement our theoretical findings with an empirical evaluation of our algorithms, comparing them with the state-of-the-art scheme. The results confirm that our algorithms exhibit improved performance for a wide range of input traces.
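For a concrete reference point, the greedy baseline analyzed in the abstract picks, at each garbage-collection step, the block with the fewest valid pages, so that the fewest rewrites are added to the write amplification. A minimal sketch of that policy in a toy FTL model of our own (not the paper's formalism):

```python
def greedy_gc_step(blocks, free_block):
    """Erase the block holding the fewest valid pages, relocating those
    pages into `free_block`. `blocks` maps block id -> set of valid
    logical pages. Returns (victim, rewrites); the rewrites are this
    step's contribution to write amplification."""
    victim = min(blocks, key=lambda b: len(blocks[b]))
    moved = blocks.pop(victim)
    blocks[free_block] = set(moved)
    return victim, len(moved)

blocks = {"A": {1, 2, 3}, "B": {7}}
victim, rewrites = greedy_gc_step(blocks, "C")
print(victim, rewrites)  # B 1  -- block B holds the least live data
```

An online algorithm with predictions would replace the `min` selection with a choice informed by forecasted page invalidations, which is where the paper's prediction-augmented algorithm departs from greedy.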
Speaker bio: Tomer Lange is a Ph.D. candidate in the Computer Science Department at the Technion, Israel. His advisors are Prof. Seffi Naor and Prof. Gala Yadgar. His research interests include algorithms for memory management in storage systems.
RSSD: Defend against Ransomware with Hardware-Isolated Network-Storage Codesign and Post-Attack Analysis
Benjamin Reidys (UIUC);Peng Liu (The Pennsylvania State University);Jian Huang (UIUC);
Speaker: Benjamin Reidys, University of Illinois at Urbana-Champaign
Abstract: Encryption ransomware has become a notorious malware. It encrypts user data on storage devices like solid-state drives (SSDs) and demands a ransom to restore the data. To bypass existing defenses, ransomware keeps evolving and mounting new attacks. For instance, we identify and validate three new attacks: (1) a garbage-collection (GC) attack that exploits storage capacity and keeps writing data to trigger GC and force SSDs to release retained data; (2) a timing attack that intentionally slows down the pace of encrypting data and hides its I/O patterns to escape existing defenses; (3) a trimming attack that utilizes the trim command available in SSDs to physically erase data. To enhance the robustness of SSDs against these attacks, we propose RSSD, a ransomware-aware SSD. It redesigns the flash management of SSDs to enable hardware-assisted logging, which can conservatively retain older versions of user data and received storage operations in time order with low overhead. It also employs hardware-isolated NVMe over Ethernet to expand local storage capacity by transparently offloading the logs to remote cloud/servers in a secure manner. RSSD enables post-attack analysis by building a trusted evidence chain of storage operations to assist the investigation of ransomware attacks. We develop RSSD with a real-world SSD FPGA board. Our evaluation shows that RSSD can defend against new and future ransomware attacks, while introducing negligible performance overhead.
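The retention idea behind the hardware-assisted logging can be illustrated with a toy version log (our sketch, not RSSD's actual FTL design): overwrites and trims are appended in time order rather than destroying old data in place, so a pre-attack state can be rebuilt for post-attack analysis.

```python
class RetentionLog:
    """Append-only log of storage operations; nothing is erased in place."""

    def __init__(self):
        self.log = []                    # (lba, data) in arrival order

    def write(self, lba, data):
        self.log.append((lba, data))

    def trim(self, lba):
        self.log.append((lba, None))     # trim is logged, not destructive

    def state_at(self, position):
        """Rebuild the visible address space as of log index `position`."""
        state = {}
        for lba, data in self.log[:position]:
            if data is None:
                state.pop(lba, None)
            else:
                state[lba] = data
        return state

ssd = RetentionLog()
ssd.write(0, "report.docx v1")
checkpoint = len(ssd.log)
ssd.write(0, "<encrypted>")      # ransomware overwrites the file...
ssd.trim(0)                      # ...then tries to erase the evidence
print(ssd.state_at(checkpoint))  # {0: 'report.docx v1'}
```

RSSD's contribution is making this retention trustworthy at the hardware level and bounding its cost by offloading the log over hardware-isolated NVMe over Ethernet; the sketch only conveys why a time-ordered log defeats overwrite and trim attacks.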
Speaker bio: Benjamin Reidys is a second-year Ph.D. student at the University of Illinois, Urbana-Champaign. His research interests include architecture and operating systems with an emphasis on network/storage codesign. Contact him at breidys2@illinois.edu
Rethinking the Performance/Cost of Persistent Memory and SSDs
Kaisong Huang (Simon Fraser University);Darien Imai (Simon Fraser University);Tianzheng Wang (Simon Fraser University);Dong Xie (The Pennsylvania State University);
Speaker: Kaisong Huang, Simon Fraser University
Abstract: For decades, the storage hierarchy consisted of layers with distinct performance characteristics and costs: a higher level (in particular, memory) is assumed to be strictly faster, less capacious, volatile, and more expensive than a lower-level layer (e.g., SSDs and HDDs). This good ol' storage hierarchy, however, is becoming a jungle: On the one hand, persistent memory (PM) breaks the boundary between volatile and non-volatile storage with persistence on the memory bus. On the other hand, modern SSDs' high bandwidth directly rivals PM, breaking the strict hierarchy from the performance perspective. This naturally leads to simple, motivating questions: Could a well-tuned SSD-based data structure (e.g., index) match or outperform a well-tuned PM-tailored data structure under certain workloads? How does the real cost of a PM-based system stack up and compare to that of an SSD-based system? These advances and questions signal the need to revisit the performance/cost of persistent data structures. We take B+-trees and hash tables for an initial inquiry. Our goals are to (1) understand the relative merits of indexing on PM and SSD, (2) reason about the cost of PM- and SSD-based systems, and (3) highlight the implications of the storage jungle on future persistent indexes.
Speaker bio: Kaisong Huang is a 2nd-year PhD student in the School of Computing Science at Simon Fraser University advised by Tianzheng Wang. His research is mainly focused on database engines, transaction processing and storage management, in the context of modern storage technologies like NVMe SSDs and persistent memory.
Session 4B: Data Structures for Persistent Memories
Chair: Ethan Miller, Pure Storage
UniHeap: Managing Persistent Objects Across Managed Runtimes for Non-Volatile Memory
Daixuan Li (UIUC);Benjamin Reidys (UIUC);Jinghan Sun (UIUC);Thomas Shull (UIUC);Josep Torrellas (UIUC);Jian Huang (UIUC);
Speaker: Daixuan Li, UIUC
Abstract: Byte-addressable, non-volatile memory (NVM) is emerging as a promising technology. To facilitate its wide adoption, employing NVM in managed runtimes like the JVM has proven to be an effective approach (i.e., managed NVM). However, such an approach is runtime specific and lacks a generic abstraction across different managed languages. Similar to the well-known filesystem primitives that allow diverse programs to access the same files via the block I/O interface, managed NVM deserves the same system-wide property for persistent objects across managed runtimes with low overhead. In this paper, we present UniHeap, a new NVM framework for managing persistent objects. It proposes a unified persistent object model that supports various managed languages, and manages NVM within a shared heap that enables cross-language persistent object sharing. UniHeap reduces the object persistence overhead by managing the shared heap in a log-structured manner and coalescing object updates during garbage collection. We implement UniHeap as a generic framework and extend it to different managed runtimes, including the HotSpot JVM, cPython, and the JavaScript engine SpiderMonkey. We evaluate UniHeap with a variety of applications, such as a key-value store and a transactional database. Our evaluation shows that UniHeap significantly outperforms state-of-the-art object sharing approaches, while introducing negligible overhead to the managed runtimes.
J-NVM: Off-heap Persistent Objects in Java
Anatole Lefort (Télécom SudParis - Institut Polytechnique de Paris);Yohan Pipereau (Télécom SudParis - Institut Polytechnique de Paris);Kwabena Amponsem Boateng (Télécom SudParis - Institut Polytechnique de Paris);Pierre Sutra (Télécom SudParis - Institut Polytechnique de Paris);Gaël Thomas (Télécom SudParis - Institut Polytechnique de Paris);
Speaker: Anatole Lefort, Télécom SudParis - Institut Polytechnique de Paris
Abstract: This paper presents J-NVM, a framework for efficient access to Non-Volatile Main Memory (NVMM) in Java. J-NVM offers a fully-fledged interface to persist plain Java objects using failure-atomic blocks. This interface relies internally on proxy objects that intermediate direct off-heap access to NVMM. The framework also provides a library of highly-optimized persistent data types that resist reboots and power failures. We evaluate J-NVM by implementing a persistent backend for the Infinispan data store. Our experimental results, obtained with a TPC-B-like benchmark and YCSB, show that J-NVM is consistently faster than other approaches at accessing NVMM in Java.
Speaker bio: Anatole is a fourth and final-year Ph.D. student at Télécom SudParis - Institut Polytechnique de Paris, advised by Pierre Sutra and Prof. Gaël Thomas in the computer science department. His current research focuses on bringing persistent programming to cloud workloads and computations, exploring language support for PMEM and programming abstractions for distributed computing runtimes and frameworks.
Fast Nonblocking Persistence for Concurrent Data Structures
Wentao Cai (University of Rochester);Haosen Wen (University of Rochester);Vladimir Maksimovski (University of Rochester);Mingzhe Du (University of Rochester);Rafaello Sanna (University of Rochester);Shreif Abdallah (University of Rochester);Michael Scott (University of Rochester);
Speaker: Wentao Cai, University of Rochester
Abstract: We present a fully lock-free variant of our recent Montage system for persistent data structures. The variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature --- dramatically faster than achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM.
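Buffered durable linearizability is a weaker guarantee than per-operation durability, and a toy epoch model shows the contract (a sketch of the property only, not nbMontage's lock-free implementation): recovery returns everything up to the last advance of the persistence frontier, a consistent prefix of the pre-crash history.

```python
class BufferedLog:
    """Operations land in volatile state first; sync advances the
    persistence frontier; recovery replays only up to the frontier."""

    def __init__(self):
        self.history = []
        self.frontier = 0    # index just past the last persisted operation

    def apply(self, op):
        self.history.append(op)

    def sync(self):
        self.frontier = len(self.history)   # wait-free in nbMontage

    def recover(self):
        return self.history[:self.frontier]

log = BufferedLog()
log.apply("insert(1)")
log.apply("insert(2)")
log.sync()
log.apply("insert(3)")   # not yet persisted; may be lost in a crash
print(log.recover())     # ['insert(1)', 'insert(2)']
```

The paper's contribution is making the frontier advance wait-free, so a thread calling sync is never blocked by the progress (or failure) of other threads.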
Speaker bio: Wentao Cai is a last-year PhD student at the University of Rochester. Working with Prof. Michael L. Scott, he is interested in concurrent (nonblocking in particular) data structures, persistent memory, transactions, and memory management.
3:15 pm-4:35 pm | Qualcomm Institute Auditorium
3:15 pm-4:15 pm | Qualcomm Institute Theater
Session 5A: Architectures for Persistent Memories
Chair: Hung-Wei Tseng, UC Riverside
ASAP: A Speculative Approach to Persistence
Sujay Yadalam (University of Wisconsin-Madison);Michael Swift (University of Wisconsin-Madison);
Speaker: Sujay Yadalam, University of Wisconsin-Madison
Abstract: Persistent memory enables a new class of applications that have persistent in-memory data structures. Recoverability of these applications imposes constraints on the ordering of writes to persistent memory. But the cache hierarchy and memory controllers in modern systems may reorder writes to persistent memory. Therefore, programmers have to use expensive flush and fence instructions that stall the processor to enforce such ordering. While prior efforts circumvent stalling on long-latency flush instructions, these designs under-perform in large-scale systems with many cores and multiple memory controllers. We propose ASAP, an architectural model in which the hardware takes an optimistic approach by persisting data eagerly, thereby avoiding any ordering stalls and utilizing the total system bandwidth efficiently. ASAP avoids stalling by allowing writes to be persisted out-of-order, speculating that all writes will eventually be persisted. For correctness, ASAP saves recovery information in the memory controllers which is used to undo the effects of speculative writes to memory in the event of a crash. Over a large number of representative workloads, ASAP improves performance over current Intel systems by 2.3x on average and performs within 3.9% of an ideal system.
Speaker bio: Sujay is a third year PhD student at the University of Wisconsin-Madison working with Prof. Michael Swift. His research interests include computer architecture and systems broadly. In the past few years, he has been working on building faster and secure interfaces to upcoming memory devices including NVM and SSDs.
ReplayCache: Enabling Volatile Caches for Energy Harvesting Systems
Jianping Zeng (Purdue University);Jongouk Choi (Purdue University);Xinwei Fu (Virginia Tech);Ajay Paddayuru Shreepathi (Stony Brook University);Dongyoon Lee (Stony Brook University);Changwoo Min (Virginia Tech);Changhee Jung (Purdue University);
Speaker: Jianping Zeng, Purdue University
Abstract: In this paper, we propose ReplayCache, a software-only crash consistency scheme that enables commodity energy harvesting systems to exploit a volatile data cache. ReplayCache does not have to ensure the persistence of dirty cache lines or record their logs at run time. Instead, the ReplayCache recovery runtime re-executes the potentially unpersisted stores in the wake of power failure to restore the consistent NVM state, from which the interrupted program can safely resume. To support store replay during recovery, ReplayCache partitions the program into a series of regions in a way that store operand registers remain intact within each region, and checkpoints all registers just before power failure using the crash consistency mechanism of the commodity systems. The evaluation with 23 benchmark applications shows that, compared to the baseline with no caches, ReplayCache can achieve about 10.72x and 8.5x-8.9x speedups (geometric mean) for the scenarios without and with power outages, respectively.
Speaker bio: Jianping Zeng is a Ph.D. student in the Department of Computer Science at Purdue University. His research interests focus on computer architecture and compiler optimizations.
IceClave: A Trusted Execution Environment for In-Storage Computing
Luyi Kang (University of Maryland, College Park);Yuqi Xue (University of Illinois at Urbana-Champaign);Weiwei Jia (University of Illinois at Urbana-Champaign);Xiaohao Wang (University of Illinois at Urbana-Champaign);Jongryool Kim (SK Hynix);Changhwan Youn (SK Hynix);Myeong Joon Kang (SK Hynix);Hyung Jin Lim (SK Hynix);Bruce Jacob (University of Maryland, College Park);Jian Huang (University of Illinois at Urbana-Champaign);
Speaker: Yuqi Xue, University of Illinois at Urbana-Champaign
Abstract: In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviating the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them treat in-storage security as a first-class citizen. Specifically, since modern SSD controllers do not have a trusted execution environment, an offloaded (malicious) program could steal, modify, and even destroy the data stored in the SSD. In this paper, we first investigate the attacks that could be conducted by offloaded in-storage programs. To defend against these attacks, we build a lightweight trusted execution environment, named IceClave, for in-storage computing. IceClave enables security isolation between in-storage programs and flash management functions. IceClave also isolates in-storage programs from one another, and enforces memory encryption and integrity verification of in-storage DRAM with low overhead. To protect data loaded from flash chips, IceClave develops a lightweight data encryption/decryption mechanism in flash controllers. We develop IceClave with a full-system simulator. We evaluate IceClave with a variety of data-intensive applications such as databases. Compared to state-of-the-art in-storage computing approaches, IceClave introduces only 7.6% performance overhead, while enforcing security isolation in the SSD controller with minimal hardware cost. IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.31x better performance than the conventional host-based trusted computing approach.
Speaker bio: Yuqi Xue is a first-year PhD student studying Electrical and Computer Engineering at University of Illinois at Urbana-Champaign. He is interested in computer architecture and system research with a focus on accelerator-centric system architecture.
GPM: Leveraging Persistent Memory from a GPU
Aditya K Kamath (University of Washington);Shweta Pandey (Indian Institute of Science-Bangalore);Arkaprava Basu (Indian Institute of Science-Bangalore);
Speaker: Aditya K Kamath, University of Washington
Abstract: The GPU is a key computing platform for many application domains. While the new non-volatile memory technology has brought the promise of byte-addressable persistence (a.k.a., persistent memory or PM) to CPU applications, the same, unfortunately, is beyond the reach of GPU programs. We take three key steps toward enabling GPU programs to access PM directly. First, we enable direct access to PM from within a GPU kernel without needing to modify the hardware. Next, we demonstrate three classes of GPU-accelerated applications that benefit from PM. In the process, we create a workload suite with nine such applications. We then create a GPU library, written in CUDA, to support logging, checkpointing, and primitives for native persistence for programmers to easily leverage PM.
Speaker bio: Aditya K Kamath is a Ph.D. student at the University of Washington’s Paul G. Allen School of Computer Science and Engineering co-advised by Professor Mark Oskin and Professor Michael Taylor. He enjoys building high-performance software tailored to efficiently utilize underlying architectures and systems. Prior to this, he spent two years as a research assistant at the Indian Institute of Science, working alongside Professor Arkaprava Basu.
Session 5B: Software for Persistent Memories
Chair: Steven Swanson, UC San Diego
Yashme: Detecting Persistency Races
Hamed Gorjiara (University of California, Irvine);Guoqing Harry Xu (UCLA);Brian Demsky (University of California, Irvine);
Speaker: Hamed Gorjiara, University of California, Irvine
Abstract: Persistent memory (PM) or Non-Volatile Random-Access Memory (NVRAM) hardware such as Intel’s Optane memory product promises to transform how programs store and manipulate information. Ensuring that persistent memory programs are crash-consistent is a major challenge. We present a novel class of crash consistency bugs for persistent memory programs, which we call persistency races. Persistency races can cause non-atomic stores to be made partially persistent. Persistency races arise due to the interaction of standard compiler optimizations with persistent memory semantics. We present Yashme, the first detector for persistency races. A major challenge is that in order to detect persistency races, the execution must crash in a very narrow window between a store with a persistency race and its corresponding cache flush operation, making it challenging for naïve techniques to be effective. Yashme overcomes this challenge with a novel technique for detecting races in executions that are prefixes of the pre-crash execution. This technique enables Yashme to effectively find persistency races even if the injected crashes do not fall into that window. We have evaluated Yashme on a range of persistent memory benchmarks and have found 26 real persistency races that have never been reported before.
Speaker bio: Hamed Gorjiara is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Irvine (UCI). He currently works at the Programming Languages Research Group advised by Brian Demsky. His research interests are software design, compilers, and testing frameworks. Mainly, his research focuses on developing efficient testing frameworks for persistent memory programs to facilitate the adoption of normal programs on this new type of memory.
PSan: Checking Robustness to Weak Persistency Models
Hamed Gorjiara (University of California, Irvine);Weiyu Luo (University of California, Irvine);Alex Lee (University of California, Irvine);Guoqing Harry Xu (UCLA);Brian Demsky (University of California, Irvine);
Speaker: Weiyu Luo, University of California, Irvine
Abstract: Persistent memory (PM) technologies offer performance close to DRAM with persistence. Persistent memory enables programs to directly modify persistent data through normal load and store instructions, bypassing heavyweight OS system calls for persistency. However, these stores are not immediately made persistent; the developer must manually flush the corresponding cache lines to force the data to be written to persistent memory. While state-of-the-art testing tools can help developers find and fix persistency bugs, prior studies have shown that fixing persistency bugs takes PM developers a couple of weeks on average. The developer has to manually inspect the execution to identify the root cause of the problem. In addition, most existing state-of-the-art testing tools require heavy user annotations to detect bugs without visible symptoms such as a segmentation fault. In this paper, we present robustness as a sufficient correctness condition to ensure that program executions are free from missing-flush bugs. We develop an algorithm for checking robustness and have implemented this algorithm in the PSan tool. PSan can help developers both identify silent data corruption bugs and localize bugs in large traces to the problematic memory operations that are missing flush operations. We have evaluated PSan on a set of concurrent indexes, persistent memory libraries, and two popular real-world applications. We found 48 bugs in these benchmarks, 17 of which were not reported before.
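A toy trace checker conveys the flavor of the missing-flush bugs PSan targets. This is our own illustrative heuristic, not PSan's robustness algorithm, and it over-approximates: it flags any persistent store whose cache line has not been flushed by the next ordering point in the trace.

```python
def missing_flushes(trace):
    """Scan a trace of ("store"|"flush"|"fence", cacheline) events and
    report cache lines that were stored to but not flushed before the
    next fence (or by the end of the trace)."""
    dirty, suspects = set(), []
    for op, line in trace:
        if op == "store":
            dirty.add(line)
        elif op == "flush":
            dirty.discard(line)
        elif op == "fence":
            suspects.extend(sorted(dirty))   # unflushed at an ordering point
            dirty.clear()
    suspects.extend(sorted(dirty))           # stores never flushed at all
    return suspects

trace = [("store", "lineA"), ("flush", "lineA"), ("fence", None),
         ("store", "lineB"), ("fence", None)]   # lineB was never flushed
print(missing_flushes(trace))  # ['lineB']
```

PSan goes further than such a scan: its robustness condition localizes, within large traces, exactly which memory operations are missing flushes, without requiring heavy user annotations.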
Speaker bio: Weiyu Luo is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Irvine. His research interests include memory model, concurrency, and persistent memory.
Carbide: A Safe Persistent Memory Multilingual Programming Framework
Morteza Hoseinzadeh (Oracle);Steve Swanson (University of California - San Diego);
Speaker: Morteza Hoseinzadeh, Oracle
Abstract: Persistent main memories (PM) allow us to create complex, crash-resilient data structures that are directly accessible from the processors. PM programming is difficult, especially because it combines well-known programming issues such as locking, memory management, and pointer safety with new PM-related bug types. If a mistake occurs in any of these areas, it can corrupt data, leak resources, or fail recovery from system crashes. Many PM libraries in different languages make it easy to solve some of these problems, but the more flexible a programming language is, the easier it is to make mistakes. For example, Corundum, a Rust-based PM library, guarantees PM safety by imposing strict rules checked during compilation, while PMDK, a C++-based library, does not. This paper presents Carbide, a compiling toolchain based on Corundum, Rust, and C++ which automatically ports safe generic persistent data structures implemented in Corundum from Rust into C++. Carbide lets programmers confidently develop PM-bug-free persistent data structures in Corundum and safely use them in C++. We have implemented Carbide and found its performance to be almost as good as other PM systems, including Corundum, PMDK, Atlas, Mnemosyne, and go-pmem, while making strong safety guarantees and allowing flexible programming at the same time.
Speaker bio: I received my Ph.D. from UC San Diego in Fall 2021 and joined Oracle in November 2021. My Ph.D. program was under the supervision of Professor Steven Swanson in the Non-Volatile Systems Lab (NVSL) of the Department of Computer Science and Engineering (CSE). I had been working on building fast, smart storage systems and fast non-volatile memories which can act like DRAM and presumably will be directly attached to a processor's memory bus. We observed that the implementation of crash-consistent data structures needs extra care to debug. This led to my Ph.D. thesis research, which aims to enforce PMEM safety statically at compile time. We developed a new persistent memory library, Corundum, which is written in Rust and statically enforces PMEM safety rules. Using it, programmers can make sure that their programs are provably correct in terms of PMEM safety, without spending extra time learning the PMEM programming model and debugging their implementations.