SUNDAY, MARCH 10
2:00 pm – 5:30 pm | Price Center East Ballroom
Tutorial
Chair: Jishen Zhao
Persistent Memory Hackathon and Workshop
Jim Fister (SNIA), Andy Rudoff (Intel), and Andrew Maier (Eideticom)
6:00 pm – 9:00 pm | Sheraton La Jolla Ball Room (lower level)
Reception
MONDAY, MARCH 11
7:45 am – 8:45 am | Price Center East Ballroom
Continental Breakfast
8:45 am – 9:00 am | Price Center East Ballroom
Opening Remarks
9:00 am – 10:00 am | Price Center East Ballroom
Keynote I
Designing an Efficient and Robust Neuromorphic Computing System Using Emerging NVM
Yiran Chen (Duke University)
Speaker: Yiran Chen, Duke University
Abstract: The fast growth of the computation cost associated with training and testing deep neural networks (DNNs) has inspired various acceleration techniques. Reducing topological complexity and simplifying the data representation of neural networks are two approaches that are popularly adopted in the deep learning community: many connections in DNNs can be pruned and the precision of synaptic weights can be reduced, respectively, with little or no impact on inference accuracy at the algorithm level. However, these algorithm-level techniques often ignore practical hardware effects, such as the increase in random accesses to the memory hierarchy, the constraints of memory capacity, and the variability encountered in circuit designs. In addition, a limited understanding of the computational needs at the algorithm level may lead to unrealistic assumptions during hardware design. In this talk, we will discuss this mismatch and show how we can solve it through an interactive design practice across both software and hardware levels when designing a neuromorphic computing system using emerging nonvolatile memories.
Speaker bio: Yiran Chen received his B.S. and M.S. from Tsinghua University and his Ph.D. from Purdue University in 2005. After five years in industry, he joined the University of Pittsburgh in 2010 as Assistant Professor and was promoted to Associate Professor with tenure in 2014, where he was a Bicentennial Alumni Faculty Fellow. He is now a tenured Associate Professor in the Department of Electrical and Computer Engineering at Duke University and serves as the director of the NSF Industry–University Cooperative Research Center (IUCRC) for Alternative Sustainable and Intelligent Computing (ASIC) and co-director of the Duke Center for Evolutionary Intelligence (CEI), focusing on research in new memory and storage systems, machine learning and neuromorphic computing, and mobile computing systems. Dr. Chen has published one book and more than 350 technical publications and has been granted 93 US patents. He serves or has served as associate editor of several IEEE and ACM transactions/journals and has served on the technical and organization committees of more than 50 international conferences. He has received 6 best paper awards and 12 best paper nominations from international conferences. He is the recipient of the NSF CAREER award and the ACM SIGDA outstanding new faculty award. He is a Fellow of the IEEE and a Distinguished Member of the ACM, a distinguished lecturer of IEEE CEDA, and the recipient of the Humboldt Research Fellowship for Experienced Researchers.
10:00 am – 10:20 am
Break
10:20 am – 12:00 pm | Price Center East Ballroom
Session 1: Memorable Paper Award Finalists I
Chair: Anxiao (Andrew) Jiang
Linear-Time Encoding/Decoding of Irreducible Words for Codes Correcting Tandem Duplications
Yeow Meng Chee (Nanyang Technological University); Johan Chrisnata (Nanyang Technological University); Han Mao Kiah (Nanyang Technological University); Tuan Thanh Nguyen (Nanyang Technological University)
Speaker: Nguyen Tuan Thanh, Nanyang Technological University
Abstract: Tandem duplication is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2017) proposed the study of codes that correct tandem duplications. All code constructions are based on irreducible words. We provide efficient encoders/decoders for codes correcting tandem duplications whose codewords are irreducible. First, we describe an $(\ell, m)$-finite state encoder and show that when $m = \Theta(1/\epsilon)$ and $\ell = \Theta(1/\epsilon)$, the encoder has a rate that is $\epsilon$ away from the optimal. We then use a combinatorial method to reduce the space requirements for the finite state encoder.
Speaker bio: Nguyen Tuan Thanh received his Ph.D. degree in mathematics from Nanyang Technological University (NTU), Singapore, in November 2018. Since then, he has been a research officer at the School of Physical and Mathematical Sciences (SPMS), NTU, Singapore. His research interests include enumerative combinatorics and coding theory.
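The tandem-duplication model described in the abstract above can be illustrated with a small Python sketch. The function names and the `max_len` parameter are illustrative assumptions. "Irreducible" is taken here to mean that the word contains no adjacent repeated block of length at most `max_len`, which is an assumption about the definition; the paper's encoder and decoder are not reproduced.

```python
def tandem_duplicate(word: str, start: int, length: int) -> str:
    """Insert a copy of word[start:start+length] right after the original segment."""
    if length <= 0 or start < 0 or start + length > len(word):
        raise ValueError("segment out of range")
    end = start + length
    return word[:end] + word[start:end] + word[end:]


def is_irreducible(word: str, max_len: int) -> bool:
    """Return True if word has no adjacent repeat xx with 1 <= len(x) <= max_len.

    Assumed definition: a word is irreducible when it cannot be the result
    of a tandem duplication of length at most max_len.
    """
    for k in range(1, max_len + 1):
        for i in range(len(word) - 2 * k + 1):
            if word[i:i + k] == word[i + k:i + 2 * k]:
                return False
    return True
```

For instance, duplicating the segment `CG` in `ACGT` gives `ACGCGT`, which this check reports as reducible, while `ACGT` itself passes.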
Coding Assisted Adaptive Thresholding for Sneak-Path Mitigation in Resistive Memories
Zehui Chen (UCLA); Clayton Schoeny (UCLA); Lara Dolecek (UCLA)
Speaker: Zehui Chen, University of California, Los Angeles
Abstract: In crossbar resistive memory, the sneak-path problem is one of the main challenges for reliable readout. The sneak-path event can be described combinatorially and its adverse effect can be modeled as a parallel interference. In this work, based on a high-rate coding scheme, we characterize the inter-cell dependency of sneak-path events probabilistically. Utilizing this characterization, we propose adaptive thresholding schemes for resistive memory readout using side information provided by precoded bits. This estimation-theoretic approach effectively reduces the bit-error rate.
Speaker bio: Zehui Chen is a Ph.D. student in the Electrical and Computer Engineering Department at the University of California, Los Angeles (UCLA). He received his M.S. degree in Electrical Engineering from UCLA in 2018 and his B.S. degree in Electrical Engineering from Purdue University, West Lafayette, in 2016. He is currently a graduate student researcher at the Laboratory for Robust Information Systems (LORIS) at UCLA. His research interests include coding theory, information theory, and their applications in new memory media.
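To make the combinatorial description of a sneak-path event concrete, here is a minimal Python sketch. It assumes one common model that the abstract does not spell out: reading a high-resistance cell (i, j) is affected when some other row r and other column c exist such that cells (i, c), (r, c), and (r, j) are all in the low-resistance state. The encoding (1 for low resistance, 0 for high resistance) and the function name are also assumptions.

```python
from typing import List


def has_sneak_path(array: List[List[int]], i: int, j: int) -> bool:
    """Check the assumed three-cell sneak-path condition for reading cell (i, j)."""
    if array[i][j] == 1:
        # In this model only reads of high-resistance cells are corrupted.
        return False
    rows, cols = len(array), len(array[0])
    for r in range(rows):
        if r == i:
            continue
        for c in range(cols):
            if c == j:
                continue
            # Alternative current path: (i, c) -> (r, c) -> (r, j)
            if array[i][c] == 1 and array[r][c] == 1 and array[r][j] == 1:
                return True
    return False
```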
Multi-Dimensional Spatially-Coupled Code Design with Improved Cycle Properties
Homa Esfahanizadeh (University of California, Los Angeles); Ahmed Hareedy (University of California, Los Angeles); Lara Dolecek (University of California, Los Angeles)
Speaker: Homa Esfahanizadeh, UCLA
Abstract: A spatially-coupled (SC) code is constructed by coupling disjoint block codes into a single coupled chain. By connecting (coupling) several SC codes, multi-dimensional SC (MD-SC) codes are constructed. In this work, we present a systematic framework for constructing MD-SC codes with notably better cycle properties than their 1D-SC counterparts. Compared to the 1D-SC codes, our MD-SC codes are demonstrated to have up to an 85% reduction in the population of the smallest cycle, and up to 3.8 orders of magnitude BER improvement in the early error floor region. The results of this work can be particularly beneficial in data storage systems, e.g., 2D magnetic recording and 3D Flash systems, as high-performance MD-SC codes are robust against various channel impairments and non-uniformity.
Speaker bio: Homa Esfahanizadeh is a Ph.D. student in the Electrical and Computer Engineering Department at the University of California, Los Angeles (UCLA). She received her M.S. and B.S. degrees in Communications Engineering and Electrical Engineering from the University of Tehran in 2015 and 2012, respectively. Currently, she works at the Laboratory for Robust Information Systems (LORIS), and her focus is on coding schemes for modern storage systems. Her research interests include coding and information theory, signal processing, graph theory, and machine learning.
Random-Access LDPC Codes
Eshed Ram (Technion, Israel); Yuval Cassuto (Technion, Israel)
Speaker: Eshed Ram, Technion
Abstract: New types of LDPC codes motivated by practical storage applications are presented. LDPCL codes (suffix ’L’ stands for locality) can be decoded locally at the level of subblocks that are much smaller than the full code block, thus providing fast random access to the coded information. The same code can also be decoded globally using the entire code block (as usual), for increased data reliability. We present constructions of LDPCL and spatially-coupled LDPCL codes that enable random access, and we exemplify their benefits over ordinary LDPC codes.
Speaker bio: Eshed Ram received his B.Sc. (summa cum laude) and M.Sc. degrees in electrical engineering from the Technion in 2015 and 2017, respectively. From 2012 to 2015 he worked at the IBM Research Labs in Haifa. He is currently a Ph.D. candidate in the department of electrical engineering at the Technion under the supervision of Associate Professor Yuval Cassuto.
Coding over Sets for DNA Storage
Andreas Lenz (Institute for Communications Engineering, Technical University of Munich, Germany); Paul H. Siegel (Department of Electrical and Computer Engineering, CMRR, University of California, San Diego, California); Antonia Wachter-Zeh (Institute for Communications Engineering, Technical University of Munich, Germany); Eitan Yaakobi (Computer Science Department, Technion -- Israel Institute of Technology, Haifa, Israel)
Speaker: Andreas Lenz, Technical University of Munich
Abstract: In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where data is stored in an unordered set of $M$ sequences, each of length $L$. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.
Speaker bio: Andreas Lenz received his B.Sc. and M.Sc. degrees in Electrical Engineering and Information Technology from TUM in 2013 and 2016, respectively (both with high distinction). During his studies he worked on analog filter optimization for sub-Nyquist channel estimation systems. As part of his Master's studies, he was an exchange student at the University of Alberta, Canada. From 2014 until 2016 he worked on mobile network analysis systems at Rohde & Schwarz. For his master's thesis, he visited Prof. Swindlehurst at the University of California, Irvine. In 2016, he joined the coding for communications and data storage group at TUM (Prof. Wachter-Zeh), where he is involved in research on error-correcting codes for insertion and deletion errors. Since 2016, Andreas has been a teaching assistant for the lecture "Channel Coding".
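As a rough illustration of the unordered-set storage model in the abstract above, the following Python sketch counts how many bits an error-free store of $M$ distinct length-$L$ sequences could represent. The four-letter alphabet, the requirement that the sequences be distinct, and the function name are assumptions made for this example; the bounds derived in the paper are not reproduced.

```python
import math


def set_capacity_bits(M: int, L: int, alphabet_size: int = 4) -> float:
    """log2 of the number of unordered sets of M distinct length-L sequences."""
    total_sequences = alphabet_size ** L
    if M < 0 or M > total_sequences:
        raise ValueError("M must be between 0 and alphabet_size ** L")
    # Order carries no information, so count subsets rather than ordered lists.
    return math.log2(math.comb(total_sequences, M))
```

Because the set is unordered, the result is smaller than the $M \cdot L \cdot \log_2 4$ bits that an ordered list of the same sequences could carry.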
12:00 pm – 1:00 pm | Price Center West Ballroom
Lunch and Poster Session
1:00 pm – 2:00 pm | Price Center East Ballroom
Keynote II
Spatially-coupled codes: design and applications in modern memories and data storage
Lara Dolecek (UCLA)
Speaker: Lara Dolecek, UCLA
Abstract: Error correction methods (ECC) are an integral component of all modern data storage and memory devices. Spatially coupled codes are a promising class of ECC methods, due to their excellent asymptotic performance. In this talk, we will review exciting recent results on finite-length spatially coupled (SC) codes, with the primary focus on code design for data storage and memory applications. We will present a combinatorial framework for constructing binary and non-binary SC codes and multidimensional SC codes, and demonstrate the effectiveness of these codes for a variety of practical channels. We will also discuss future directions.
Speaker bio: Lara Dolecek is a professor with the Electrical and Computer Engineering Department at UCLA. She received her BS, MS, and PhD degrees in EECS as well as an MA degree in Statistics, all from UC Berkeley. She is a recipient of several research and teaching awards including NSF CAREER, IBM Faculty Award, Intel Early Career Faculty Award, Okawa Research Grant, and UCLA Northrop Grumman Excellence in Teaching Award. With her research group and collaborators, she has received numerous best paper awards. Her current research interests are in coding and machine learning for modern data storage and large-scale computing systems. Prof. Dolecek has served as a consultant for a number of companies specializing in data communications and storage.
2:00 pm – 2:20 pm
Break
2:20 pm – 4:00 pm | Price Center East Ballroom
Session 2: Memorable Paper Award Finalists II
Chair: Anirudh Badam
In-Memory Data Parallel Processor
Daichi Fujiki (University of Michigan); Scott Mahlke (University of Michigan); Reetuparna Das (University of Michigan)
Speaker: Daichi Fujiki, University of Michigan
Abstract: Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5$\times$ speedup over a multi-core CPU server for a set of applications from Parsec and 763$\times$ speedup over a server-class GPU for a set of Rodinia benchmarks.
Speaker bio: Daichi Fujiki is a Ph.D. Candidate in Computer Science and Engineering at the University of Michigan, Ann Arbor. He is advised by Professor Reetuparna Das. His research interests include in-memory / in-cache computing for general and application-specific workloads, and domain-specific architectures. He received M.S.Eng in 2017 from the University of Michigan, Ann Arbor, and B.E. in 2016 from Keio University, Japan.
Easy Lock-Free Programming in Non-Volatile Memory
Tianzheng Wang (Simon Fraser University); Justin Levandoski (Amazon Web Services); Paul Larson (University of Waterloo)
Speaker: Tianzheng Wang, Simon Fraser University
Abstract: Many systems use lock-free data structures (e.g., queues, B+-trees) to achieve high performance. Byte-addressable, non-volatile memory (NVRAM) such as Intel 3D XPoint further adds persistence to these data structures on the memory bus, potentially enabling desired features like instant recovery and lower cost while maintaining high performance. Lock-free programming in NVRAM, however, is even harder than the already-hard volatile case. Lock-free data structures usually need to atomically modify multiple 8-byte words (e.g., B+-tree splits), but the hardware only provides atomic instructions such as compare-and-swap (CAS) that work on single memory words. In NVRAM, the same instructions can be used, but since the CPU cache is volatile, there has to be a persistence protocol in place so that the data structure recovers correctly after a crash. Such persistence protocols tend to be data-structure specific, complex, and error-prone to implement. Our solution is Persistent Multi-Word Compare-and-Swap (PMwCAS), a new software primitive that allows applications to atomically modify multiple arbitrary 8-byte words in NVRAM, in a lock-free manner with persistence guarantees. Moreover, PMwCAS allows the application to completely avoid customized recovery code (which is necessary in prior approaches), greatly reducing the complexity of lock-free programming. We have used PMwCAS to adapt SQL Server's Bw-Tree (a lock-free B+-tree) and a doubly-linked skip list for NVRAM, and evaluation results show that PMwCAS adds only very low (4-6%) runtime overhead, while allowing implementations almost as mechanical as a lock-based one, without the need to orchestrate complex data races and recovery as in a lock-free implementation using CAS. PMwCAS has enabled several projects, including the BzTree, a new index structure for NVRAM by Microsoft Research. PMwCAS is also open source at: https://github.com/Microsoft/pmwcas.
Speaker bio: Tianzheng Wang is an assistant professor in the School of Computing Science at Simon Fraser University in Canada (since Fall 2018). He works on the boundary between software and hardware to build better systems by fully utilizing the underlying hardware. His current research focuses on database systems and related systems areas that impact the design of database systems, such as operating systems, distributed systems, and synchronization. He is also interested in storage, mobile and embedded systems. Tianzheng Wang received his Ph.D. in computer science from the University of Toronto in 2017, advised by Ryan Johnson and Angela Demke Brown. Prior to joining Simon Fraser University, he spent one year (2017-2018) at Huawei Canada Research Centre in Toronto as a research engineer.
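The all-or-nothing behavior that a multi-word compare-and-swap provides can be sketched in a few lines of Python. This is only a sequential model of the semantics: it is not lock-free, it has no persistence, and it is not the API of the PMwCAS library. The function name, the dictionary used as "memory", and the `(address, expected, new)` tuple layout are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def multi_word_cas(memory: Dict[int, int],
                   entries: List[Tuple[int, int, int]]) -> bool:
    """Update every (address, expected, new) entry, or none of them.

    Sequential model only: a real primitive must make both phases
    appear atomic to concurrent threads and survive crashes.
    """
    # Phase 1: every target word must still hold its expected value.
    for addr, expected, _new in entries:
        if memory.get(addr) != expected:
            return False
    # Phase 2: all comparisons succeeded, so install all new values.
    for addr, _expected, new in entries:
        memory[addr] = new
    return True
```

A failed comparison on any one word leaves every word untouched, which is the property that lets a caller treat a multi-word update, such as a B+-tree split, as a single step.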
Addressing Fast-Detrapping for Reliable 3D NAND Flash Design
Mustafa Munawar Shihab (The University of Texas at Dallas); Jie Zhang (Yonsei University); Myoungsoo Jung (KAIST); Mahmut Kandemir (Pennsylvania State University)
Speaker: Mustafa M. Shihab, The University of Texas at Dallas
Abstract: The paradigm shift from planar (two-dimensional (2D)) to vertical (three-dimensional (3D)) models has placed the NAND flash technology on the verge of a design evolution that can handle the demands of next-generation storage applications. However, it also introduces challenges that may obstruct the realization of such 3D NAND flash. Specifically, we observed that the fast threshold drift (fast-drift) in a charge-trap flash-based 3D NAND cell can make it lose a critical fraction of the stored charge relatively soon after programming and generate errors. In this work, we first present an elastic read reference ($V_{Ref}$) scheme (ERR) for reducing such errors in ReveNAND, our fast-drift-aware 3D NAND design. To address the inherent limitation of the adaptive $V_{Ref}$, we introduce a new intra-block page organization (hitch-hike) that can enable stronger error correction for the error-prone pages. In addition, we propose a novel reinforcement-learning-based smart data refill scheme (iRefill) to counter the impact of fast-drift with minimum performance and hardware overhead. Finally, we present the first analytic model to characterize fast-drift and evaluate its system-level impact. Our results show that, compared to conventional 3D NAND design, our ReveNAND can reduce fast-drift errors by 87%, on average, and can lower the ECC latency and energy overheads by 13X and 10X, respectively.
Speaker bio: Mustafa is currently a fourth-year Ph.D. student of Electrical and Computer Engineering at the University of Texas at Dallas. Under the supervision of Dr. Yiorgos Makris, his research pivots around Computer Architecture, Reconfigurable Computing and their implications in Hardware Security. Prior to joining the PhD program, Mustafa received his MS in Electrical Engineering from Auburn University (Alabama, USA), where he worked on low-power VLSI design and testing. He has also worked as a Software Architect Intern in the SSG group at Intel. Mustafa is looking forward to a career where his circuit-to-system, cross-platform experience can be applied and appreciated.
Kingsguard: Write-Rationing Garbage Collection for Hybrid Memories
Shoaib Akram (Ghent University); Jennifer B. Sartor (Ghent University and VUB); Kathryn S. McKinley (Google); Lieven Eeckhout (Ghent University)
Speaker: Shoaib Akram, Ghent University
Abstract: Phase Change Memory (PCM) offers higher capacity and energy efficiency than DRAM. It has two disadvantages: (1) write endurance is low, and (2) latency is high. Hybrid memory combines DRAM and PCM to promise low latency, higher capacity, energy efficiency and durability. Prior hardware and OS approaches spread writes out using wear leveling, and place frequently-written pages in DRAM. Unfortunately, prior coarse-grained approaches lead to impractical PCM lifetimes of 4 years or less for popular Java applications. This work exploits garbage collection in managed language runtimes to make PCM a practical DRAM replacement, leaving the programming model unchanged. We find that for 16 Java applications on average (1) 70% of writes occur to newly-allocated nursery objects, and (2) 2% of objects capture 81% of writes to mature objects. We introduce two write-rationing garbage collectors that improve PCM lifetime: (1) Kingsguard-nursery places nursery objects in DRAM and survivors in PCM, reducing PCM writes by 5× compared to PCM-only systems with wear-leveling. (2) Kingsguard-writers (KG-W) places nursery objects in DRAM and nursery survivors in a DRAM monitoring space. It then monitors writes to all mature objects and moves unwritten mature objects from DRAM to PCM. Because most mature objects are unwritten, KG-W exploits PCM capacity while increasing PCM lifetimes by 11×. This work demonstrates garbage collection as a promising avenue to manage hybrid memories.
Speaker bio: Shoaib Akram is a Ph.D. candidate at Ghent University in Belgium. He has an M.S. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign and a B.S. in Electrical Engineering from the University of Engineering and Technology in Pakistan. His research focuses on the intersection of programming languages, system software, and computer architecture. His current research investigates software approaches to ease the adoption of emerging memory technologies. His recent research also explores the potential of language runtimes in abstracting the complexity of heterogeneous hardware.
Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
Pengfei Zuo (Huazhong University of Science and Technology); Yu Hua (Huazhong University of Science and Technology); Jie Wu (Huazhong University of Science and Technology)
Speaker: Pengfei Zuo, Huazhong University of Science and Technology & University of California, Santa Barbara
Abstract: Non-volatile memory (NVM) as persistent memory is expected to substitute or complement DRAM in the memory hierarchy, due to the strengths of non-volatility, high density, and near-zero standby power. However, due to the requirement of data consistency and hardware limitations of NVM, traditional indexing techniques originally designed for DRAM become inefficient in persistent memory. To efficiently index the data in persistent memory, this paper proposes a write-optimized and high-performance hashing index scheme, called level hashing, with a low-overhead consistency guarantee and cost-efficient resizing. Level hashing provides a sharing-based two-level hash table, which achieves a constant-scale search/insertion/deletion/update time complexity in the worst case and rarely incurs extra NVM writes. To guarantee consistency with low overhead, level hashing leverages log-free consistency schemes for insertion, deletion, and resizing operations, and an opportunistic log-free scheme for the update operation. To cost-efficiently resize this hash table, level hashing leverages an in-place resizing scheme that only needs to rehash 1/3 of the buckets instead of the entire table, thus significantly reducing the number of rehashed buckets and improving the resizing performance. Experimental results demonstrate that level hashing achieves 1.4-3.0 times speedup for insertions, 1.2-2.1 times speedup for updates, and over 4.3 times speedup for resizing, while maintaining high search and deletion performance, compared with state-of-the-art hashing schemes.
Speaker bio: Pengfei Zuo is a Ph.D. student at Huazhong University of Science and Technology (HUST) advised by Prof. Yu Hua and currently a visiting Ph.D. student at University of California, Santa Barbara (UCSB) advised by Prof. Yuan Xie. He obtained the B.E. degree in computer science and technology from HUST in 2014. His research interests include non-volatile memory system and architecture, storage system, and security. He has published multiple papers in major conferences including OSDI, MICRO, USENIX ATC, SoCC, IPDPS, ICDCS, MSST, DATE, and HotStorage.
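The 1/3 figure quoted in the abstract above follows from simple counting under one assumption that the abstract does not state explicitly: the top level of the table holds $N$ buckets, the bottom level holds $N/2$, and a resize rehashes only the bottom-level buckets. Under that assumption:

```latex
\[
\frac{\text{rehashed buckets}}{\text{total buckets}}
  = \frac{N/2}{N + N/2}
  = \frac{N/2}{3N/2}
  = \frac{1}{3}.
\]
```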
4:00 pm – 4:20 pm
Break
4:20 pm – 5:20 pm | Price Center East Ballroom
Session 3: Emerging Memory Technologies
Chair: Amit Berman
Codes Correcting Under- and Over-Shift Errors in Racetrack Memories
Yeow Meng Chee (Nanyang Technological University); Han Mao Kiah (Nanyang Technological University); Alexander Vardy (University of California San Diego); Van Khu Vu (Nanyang Technological University); Eitan Yaakobi (Technion - Israel Institute of Technology)
Speaker: Van Khu Vu, Nanyang Technological University
Abstract: Racetrack memory is a new technology which utilizes magnetic domains along a nanoscopic wire in order to obtain extremely high storage density. In racetrack memory, each magnetic domain can store a single bit of information, which can be sensed by a head. The memory has a tape-like structure which supports a shift operation that moves the domains to be read sequentially by the head. In order to increase the memory’s speed, prior work studied how to minimize the latency of the shift operation, while the no less important reliability of this operation has received only a little attention. There are two dominant kinds of errors in a shift operation, namely under-shift and limited-over-shift errors. In this work, we propose a coding scheme to combat these shift errors. We first show that under-shift and limited-over-shift errors can be modeled as sticky insertions and bursts of deletions of limited length, respectively. Therefore, in the model of multiple heads, a non-binary code correcting multiple bursts of deletions and any number of sticky insertions can be used to tackle this problem. The goal of this work is to design such optimal codes.
Speaker bio: Van Khu Vu received the B.Sc. degree in mathematics from Vietnam National University (VNU), Hanoi, ranking first in GPA in the Class of 2010, and the Ph.D. degree in mathematics from the School of Physical and Mathematical Sciences at Nanyang Technological University (NTU), Singapore, in 2018. Between 2010 and 2012, he was a lecturer at the Faculty of Mathematics, Mechanics and Informatics at VNU University of Sciences. Currently, he is a researcher at NTU, Singapore. His ultimate goal is to study and develop mathematical tools in combinatorics, algebra, knot theory, probability and statistics to invent original coding techniques and new algorithms to solve real-world problems. In particular, his research currently concentrates on coding techniques for various data storage systems, such as Flash memories, Racetrack memories, Resistive memories and DNA-based data storage.
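The correspondence between shift errors and insertion/deletion errors described in the abstract above can be seen in a toy single-head simulation. This Python sketch is an assumption-laden illustration, not the paper's model: the function name and parameters are hypothetical, an under-shift is taken to mean the track does not move (so the same domain is read twice), and an over-shift is limited to skipping exactly one domain.

```python
from typing import Iterable


def read_track(bits: str,
               under_shifts: Iterable[int] = (),
               over_shifts: Iterable[int] = ()) -> str:
    """Read a track with one head, injecting shift errors at given read steps."""
    under, over = set(under_shifts), set(over_shifts)
    out = []
    pos = 0   # index of the domain currently under the head
    step = 0  # number of reads performed so far
    while pos < len(bits):
        out.append(bits[pos])
        if step in under:
            pass          # track failed to move: next read repeats this bit
        elif step in over:
            pos += 2      # track moved one domain too far: next bit is skipped
        else:
            pos += 1      # normal shift
        step += 1
    return "".join(out)
```

With the stored word `10110`, an under-shift at read step 1 yields `100110` (a repeated bit, i.e. a sticky insertion), and an over-shift at the same step yields `1010` (one bit deleted).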
Polar Coding for Selector-less Resistive Memories
Marwen Zorgui (UC Irvine); Mohammed Fouda (UC Irvine); Zhiying Wang (UC Irvine); Ahmed Eltawil (UC Irvine); Fadi Kurdahi (UC Irvine)
Speaker: Zhiying Wang, University of California, Irvine
Abstract: Transistor-based memories are rapidly approaching their maximum density per unit area. Resistive crossbar arrays enable denser memory due to the small size of switching devices. However, due to the resistive nature of these memories, they suffer from current sneak paths that complicate the readout procedure. In this work, we propose an error-correcting scheme based on polar codes that mitigates the sneak-path effect. We describe the proposed code construction and show numerically the performance improvement in terms of bit error rate.
Speaker bio: Zhiying Wang received the B.Sc. degree in Information Electronics and Engineering from Tsinghua University in 2007, and the M.Sc. and Ph.D. degrees in Electrical Engineering from California Institute of Technology in 2009 and 2013, respectively. She was a postdoctoral fellow in the Department of Electrical Engineering, Stanford University. She is currently Assistant Professor at the Center for Pervasive Communications and Computing, University of California, Irvine. Dr. Wang was an NSF Center for Science of Information (CSoI) Postdoctoral Research Fellow in 2013. She received the IEEE Communications Society Data Storage Best Paper Award. Her research focuses on information theory and coding theory, with an emphasis on coding for data storage.
Resistive Memory Fully Compatible with Advanced CMOS Nodes
Jeremy Guy (Crossbar Inc.); Amit Prakash (Crossbar Inc.); Sung Hyun Jo (Crossbar Inc.)
Speaker: Jérémy Guy, Crossbar Inc.
Abstract: We report the feasibility of high-density aluminum-based resistive memory (ReRAM) and its extremely good performance and behavior when coupled with an optimized material stack. Sub-50 nm devices integrated in a 1T1R 2 Mb array offer exceptional advantages, including forming-free switching, 400°C baking stability, outstanding 225°C retention, and endurance beyond 100k cycles.
Speaker bio: Jérémy Guy was born in Paris in 1989. He received the Engineering degree in materials science from the Institut National des Sciences Appliquées de Lyon (INSA), as well as the M.S. degree in microelectronics and embedded systems from INSA and the University of Lyon, Lyon, in 2012. He completed his Ph.D. degree in physics and nano-electronics with both the Institut National Polytechnique de Grenoble and CEA – Leti, Grenoble, France, in 2015. He joined Crossbar, Santa Clara, CA, in 2016 in order to continue his career in nonvolatile memory technologies. He is the author or co-author of multiple publications, journal articles, and international patents related to RRAM.
4:20 pm – 5:20 pm | Price Center West Ballroom
Session 4: Security and Integrity
Chair: Brian Kurkoski
Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes
Pengfei Zuo (Huazhong University of Science and Technology); Yu Hua (Huazhong University of Science and Technology); Ming Zhao (Arizona State University); Wen Zhou (Huazhong University of Science and Technology); Yuncheng Guo (Huazhong University of Science and Technology)
Speaker: Yu Hua, Huazhong University of Science and Technology
Abstract: Non-volatile memory (NVM) technologies are considered promising candidates for next-generation main memory. However, the non-volatility of NVMs leads to new security vulnerabilities. For example, it is not difficult to access sensitive data stored on stolen NVMs. Memory encryption can be employed to mitigate the security vulnerabilities, but it increases the number of bits written to NVMs due to the diffusion property and thereby aggravates the NVM wear-out induced by writes. To address these security and endurance challenges, this paper proposes DeWrite, a secure and deduplication-aware scheme to enhance the performance and endurance of encrypted NVMs based on a new in-line deduplication technique and the synergistic integration of deduplication and memory encryption. Specifically, it performs low-latency in-line deduplication to exploit the abundant cache-line-level duplications, leveraging the intrinsic read/write asymmetry of NVMs and light-weight hashing. It also opportunistically parallelizes the operations of deduplication and encryption and allows them to co-locate the metadata for high time and space efficiency. DeWrite was implemented in gem5 with NVMain and evaluated using 20 applications from SPEC CPU2006 and PARSEC. Extensive experimental results demonstrate that DeWrite reduces writes to encrypted NVMs by 54% on average, and speeds up memory writes and reads of encrypted NVMs by 4.2x and 3.1x, respectively. Meanwhile, DeWrite improves the system IPC by 82% and reduces energy consumption by 40% on average.
Speaker bio: Yu Hua is a professor at Huazhong University of Science and Technology. He was a Postdoctoral Research Associate at McGill University and a Postdoctoral Research Fellow at the University of Nebraska-Lincoln. His research interests include cloud storage systems, non-volatile memory, big data analytics, etc. His papers have been published in major conferences, including OSDI, FAST, MICRO, USENIX ATC, ACM SoCC, SC, and HPDC. He serves on the committees of multiple international conferences, including ASPLOS (ERC), SOSP (SRC & Poster), USENIX ATC, SC, SoCC, RTSS, ICDCS, INFOCOM, and IPDPS. He is a Distinguished Member of CCF, a Senior Member of ACM and IEEE, and a member of USENIX. He has been appointed a Distinguished Speaker of ACM and CCF.
Multi-level Access and Information Leakage in Scalable Cloud Storage
Siyi Yang (UCLA); Clayton Schoeny (UCLA); Laura Conde-Canencia (Université de Bretagne Sud, Lorient, France); Lara Dolecek (UCLA)
Speaker: Siyi Yang, UCLA
Abstract: Codes providing multi-level access have received substantial research attention because of their capabilities to combat server failures in cloud storage. We provide a general construction that is both sufficient and necessary for reaching the well-known Singleton bound for multi-level accessible codes. We present a general decoding protocol as well. Based on this result, we derive the lower bound on information leakage, which is viewed as the amount of information conveyed from unintended local clouds to the central cloud about their own local messages. We introduce a code construction based on Cauchy Reed-Solomon (CRS) codes, and prove that this construction achieves the lower bound on the information leakage.
Speaker bio: Siyi Yang is a Ph.D. student in the Electrical and Computer Engineering department at the University of California, Los Angeles (UCLA). She received her B.S. degree in Electrical Engineering from Tsinghua University in 2016 and her M.S. degree in Electrical and Computer Engineering from UCLA in 2018. Her research interests include the design of error-correction codes for memory and distributed storage.
ENTT: A Family of Emerging NVM-based Trojan Triggers
Karthikeyan Nagarajan (Pennsylvania State University); Mohammad Nasim Imtiaz Khan (Pennsylvania State University); Swaroop Ghosh (Pennsylvania State University)
Speaker: Mohammad Nasim Imtiaz Khan, Penn State
Abstract: Hardware Trojans in the form of malicious modifications during the design and/or the fabrication process are a security concern due to the globalization of the semiconductor production process. A Trojan is designed to evade structural and functional testing, trigger under certain conditions (e.g., after a number of clock ticks or assertion of a rare net), and deliver the payload (e.g., denial-of-service, information leakage). A wide variety of logic Trojans (both triggers and payloads) have been identified; however, very limited literature exists on memory Trojans in spite of their high likelihood. Emerging Non-Volatile Memories (NVMs), e.g., Resistive RAM (RRAM), possess unique characteristics, e.g., non-volatility and gradual drift in resistance with pulsing voltage, that make them a prime target to deploy a Hardware Trojan. Here, we present two flavors of delay-based and two flavors of voltage-based Trojan triggers by exploiting the RRAM resistance drift under pulsing current. Simulation results indicate that these triggers can be activated by accessing a pre-selected address 2500-3500 times (varies with trigger designs), since the proposed trigger requires a large number of hammering accesses to evade the test phase. Due to non-volatility, the hammering need not be consecutive and therefore can evade system-level techniques that can classify hammering as a potential security threat. We also propose two mechanisms to reset the triggers.
Speaker bio: Nasim is a fourth-year doctoral candidate in the School of Electrical Engineering and Computer Science of The Pennsylvania State University (Penn State), under the advisement of Dr. Swaroop Ghosh. Nasim received his Bachelor's degree from the Department of Electrical Engineering of Bangladesh University of Engineering and Technology (BUET) in 2014. Before starting his Ph.D., Nasim was a lecturer in the Department of Electrical and Electronic Engineering of Daffodil International University, Bangladesh, and before that he worked as an Associate Maintenance Professional at Halliburton, Bangladesh. His research interests include hardware security and low-power circuit design. Currently, he is exploring the security and privacy aspects of emerging non-volatile memories like STTRAM, MRAM and RRAM for cryptographic applications. He is a student member of IEEE.
6:00 pm – 10:00 pm
TUESDAY, MARCH 12
8:00 am – 9:00 am | Price Center East Ballroom
Continental Breakfast
9:00 am – 10:00 am | Price Center East Ballroom
Keynote III
Memory-Driven Computing
Kimberly Keeton (Hewlett Packard Labs)
Speaker: Kimberly Keeton, Hewlett Packard Labs
Abstract: Data growth and data analytics requirements are outpacing the compute and storage technologies that have provided the foundation of processor-driven architectures for the last five decades. This divergence requires a deep rethinking of how we build systems, and points towards a memory-driven architecture where memory is the key resource and everything else, including processing, revolves around it. Memory-driven computing brings together byte-addressable persistent memory, a fast memory fabric, task-specific processing, and a new software stack to address these data growth and analysis challenges. This architecture fundamentally changes assumptions that most software developers make: memory is precious and needs to be conserved; memory is volatile and persistent state needs to be preserved elsewhere; and in scale-out applications, processes must communicate over I/O networks using message passing. In this architecture, memory is large, so space efficiency within programs isn't an issue; it's persistent, so much of the software used to retain data in the event of power failures and other faults is no longer needed; and it's shared across a fabric, implying that data partitioning and message passing may no longer be required. In this talk we will share initial experiences in exploring memory-driven architectures, including illustrating how memory-driven computing benefits applications, highlighting work in data management and programming models for memory-driven architectures, and outlining challenges that must be addressed to realize the memory-driven computing vision.
Speaker bio: Dr. Kimberly Keeton is a Distinguished Technologist at Hewlett Packard Labs. She holds a Ph.D. and an M.S. in Computer Science from the University of California, Berkeley, and a B.S. in Computer Engineering and Engineering and Public Policy from Carnegie Mellon University. Her recent research is in the areas of memory-driven computing and NVM-aware data stores and programming models. She has also worked in the areas of storage and information management, storage dependability, NoSQL databases, intelligent storage, and workload characterization. She is a Fellow of the ACM and a Senior Member of the IEEE, and has served as Technical Program Committee Chair for multiple USENIX, ACM, IEEE and IFIP sponsored conferences.
10:00 am – 10:10 am
Awards Announcements
10:10 am – 10:20 am
Break
10:20 am – 11:00 am | Price Center East Ballroom
10:20 am – 11:00 am | Price Center West Ballroom
Session 5: Coding & Learning
Chair: Eyal En Gad
File Type Recognition and Error Correction for NVMs with Deep Learning
Pulakesh Upadhyaya (Texas A&M University); Anxiao (Andrew) Jiang (Texas A&M University)
Speaker: Pulakesh Upadhyaya, Texas A&M University
Abstract: Storage systems have a strong need for substantially improving their error correction capabilities, especially for long-term storage where the accumulating errors can exceed the decoding threshold of error-correcting codes (ECCs). For non-volatile memories (NVMs), this is especially important because noise mechanisms such as charge leakage, read/write disturbs, and cell-quality degradation due to P/E cycling cause errors to accumulate over time. In this work, a new scheme is presented that uses deep learning to perform soft decoding for noisy files based on their natural redundancy (which refers to the redundancy in uncompressed or imperfectly compressed data). The soft decoding result is then combined with ECCs for substantially better error correction performance.
Speaker bio: Pulakesh Upadhyaya is a Ph.D. candidate in the Department of Computer Science at Texas A&M University. His research interests include error correction codes, coding for natural redundancy (an intersection of coding theory and machine learning) and data storage in flash memories.
Error Correction for Hardware-Implemented Deep Neural Networks
Pulakesh Upadhyaya (Texas A&M University); Xiaojing Yu (Texas A&M University); Jacob Mink (Texas A&M University); Jeffrey Cordero (Texas A&M University); Palash Parmar (Texas A&M University); Anxiao (Andrew) Jiang (Texas A&M University)
Speaker: Pulakesh Upadhyaya, Texas A&M University
Abstract: In this work, we study how the performance of DNNs degrades when noise is present. We focus on two analog error correcting codes (ECCs) which are suitable for protecting analog weights in DNNs. In the first code, we have designed a systematic linear analog code, which allows the weights of the DNN to be stored in their original form. In the second code, which is a systematic non-linear analog code, we design a new maximum a posteriori (MAP) decoder for enhanced error correction. We experimentally show that these codes can significantly improve the performance of DNNs. We then extend the study to binarized DNNs, and show how noise in different layers affects the DNN performance in different ways. This observation is useful for optimizing the code rates of ECCs protecting different layers of the DNN.
Speaker bio: Pulakesh Upadhyaya is a Ph.D. candidate in the Department of Computer Science at Texas A&M University. His research interests include error correction codes, coding for natural redundancy (an intersection of coding theory and machine learning) and data storage in flash memories.
Session 6: Objects I
Chair: Joe Izraelevitz
Log-Structured Non-Volatile Main Memory
Qingda Hu (Tsinghua University); Jinglei Ren (Microsoft Research); Anirudh Badam (Microsoft Research); Jiwu Shu (Tsinghua University); Thomas Moscibroda (Microsoft Research)
Speaker: Youmin Chen, Tsinghua University
Abstract: Emerging non-volatile main memory (NVMM) unlocks the performance potential of applications by storing persistent data in the main memory. Such applications require a lightweight persistent transactional memory (PTM) system, instead of a heavyweight filesystem or database, to have fast access to data. In a PTM system, the memory usage, both capacity and bandwidth, plays a key role in dictating performance and efficiency. Existing memory management mechanisms for PTMs generate high memory fragmentation, high write traffic and a large number of persist barriers, since data is first written to a log and then to the main data store. In this paper, we present a log-structured NVMM system that not only maintains NVMM in a compact manner but also reduces the write traffic and the number of persist barriers needed for executing transactions. All data allocations and modifications are appended to the log which becomes the location of the data. Further, we address a unique challenge of log-structured memory management by designing a tree-based address translation mechanism where access granularities are flexible and different from allocation granularities. Our results show that the new system enjoys up to 89.9% higher transaction throughput and up to 82.8% lower write traffic than a traditional PTM system.
Speaker bio: Youmin Chen is a third-year Ph.D. student at the Department of Computer Science and Technology, Tsinghua University, advised by Prof. Jiwu Shu and Prof. Youyou Lu. Before that, he received his Bachelor's degree from Honors College (School of Advanced Engineering), Beihang University. His research area mainly covers distributed systems and storage systems. He is particularly interested in building storage systems leveraging emerging hardware, including non-volatile memories and RDMA. His work has been published in or accepted by USENIX ATC'17, EuroSys'19, IEEE TC, ACM TOS, etc.
Object-Oriented Recovery for Non-volatile Memory
Nachshon Cohen (Amazon); David T. Aksun (EPFL); James Larus (EPFL)
Speaker: James Larus, EPFL
Abstract: NVM requires new programming models to ensure that the persistent storage is left in a recoverable state after an unexpected or abrupt program failure. Capturing a running application's consistent state is, however, only part of recovery. The persistent heap will be used after a restart, which means that it must be put into a state that is consistent in the new environment that exists when the application resumes execution. Fixing these problems after a crash is a significant burden on a programmer. To clear and reinitialize transient fields and update pointers, the programmer must iterate over all live durable objects. For these reasons, existing NVM systems generally do not distinguish or reinitialize transient fields and some use self-relative offsets instead of direct pointers. In this paper, we present a C++ language extension for NVM reconstruction, the process of reestablishing the consistency of data structures stored in NVM in the environment in which an application is being restarted. Reconstruction runs concurrently and lazily with an application, which allows an application to restart and respond quickly, without the long latency from updating the entire persistent heap.
Speaker bio: James Larus is Professor and Dean of the School of Computer and Communication Sciences (IC) at EPFL (École Polytechnique Fédérale de Lausanne). Prior to joining IC in October 2013, Larus was a researcher, manager, and director in Microsoft Research for over 16 years and an assistant and associate professor in the Computer Sciences Department at the University of Wisconsin, Madison. Larus has been an active contributor to numerous communities. He has published over 100 papers (with 9 best and most influential paper awards) and received over 40 US patents. Larus received a National Science Foundation Young Investigator award in 1993 and became an ACM Fellow in 2006. Larus received his MS and PhD in Computer Science from the University of California, Berkeley in 1989, and an AB in Applied Mathematics from Harvard in 1980.
11:00 am – 11:20 am
Break
11:20 am – 12:20 pm | Price Center East Ballroom
11:20 am – 12:20 pm | Price Center West Ballroom
Session 7: Distributed Systems
Chair: Zhiying Wang
HyperLoop: Group-Based NIC-Offloading to Accelerate Replicated Transactions in Multi-Tenant Storage Systems
Daehyeok Kim (Carnegie Mellon University); Amirsaman Memaripour (UC San Diego); Anirudh Badam (Microsoft); Yibo Zhu (Bytedance); Hongqiang Harry Liu (Alibaba); Jitu Padhye (Microsoft); Shachar Raindel (Microsoft); Steven Swanson (UC San Diego); Vyas Sekar (Carnegie Mellon University); Srinivasan Seshan (Carnegie Mellon University)
Speaker: Daehyeok Kim, Carnegie Mellon University
Abstract: Storage systems in data centers are an important component of large-scale online services. They typically perform replicated transactional operations for high data availability and integrity. Today, however, such operations suffer from high tail latency even with recent kernel bypass and storage optimizations, and thus affect the predictability of end-to-end performance of these services. We observe that the root cause of the problem is the involvement of the CPU, a precious commodity in multi-tenant settings, in the critical path of replicated transactions. In this paper, we present HyperLoop, a new framework that removes the CPU from the critical path of replicated transactions in storage systems by offloading them to commodity RDMA NICs, with non-volatile memory as the storage medium. To achieve this, we develop new and general NIC offloading primitives that can perform memory operations on all nodes in a replication group while guaranteeing ACID properties without CPU involvement. We demonstrate that popular storage applications can be easily optimized using our primitives. Our evaluation results with microbenchmarks and application benchmarks show that HyperLoop can reduce 99th-percentile latency by approximately 800× with close to 0% CPU consumption on replicas.
Speaker bio: Daehyeok Kim is a third-year PhD student in the Computer Science Department at Carnegie Mellon University, where he is advised by Professor Srinivasan Seshan and Professor Vyas Sekar. His research interests lie in the intersection of systems and networking with a current focus on making data centers faster and more efficient by designing novel network primitives with advanced networking hardware such as programmable switches and RDMA NICs. He received his BS and MS degrees in Computer Science and Engineering from Pohang University of Science and Technology, South Korea.
Building Atomic, Crash-Consistent Data Stores with Disaggregated Persistent Memory
Shin-Yeh Tsai (Purdue University); Yiying Zhang (Purdue University)
Speaker: Yiying Zhang, Purdue University
Abstract: Byte-addressable persistent memory (NVM) has finally made its way into production. An important and pressing problem that follows is how to deploy it in existing datacenters. One viable approach is to attach NVM as self-contained devices to the network as disaggregated persistent memory, or DPM. DPM requires no changes to existing servers in datacenters; without the need to include a processor, DPM devices are cheap to build; and by sharing DPM across compute servers, they offer great elasticity and efficient resource packing. This paper explores different ways to organize DPM and to build data stores with DPM. Specifically, we propose three architectures of DPM: 1) compute nodes directly access DPM (DPM-Direct); 2) compute nodes send requests to a coordinator server, which then accesses DPM to complete a request (DPM-Central); and 3) compute nodes directly access DPM for data operations and communicate with a global metadata server for the control plane (DPM-Sep). Based on these architectures, we built three atomic, crash-consistent data stores. We evaluated their performance, scalability, and CPU cost with micro-benchmarks and YCSB. Our evaluation results show that DPM-Direct has great small-size read performance but poor write performance; DPM-Central has the best write performance when the scale of the cluster is small but performs poorly when the scale increases; and DPM-Sep performs well overall.
Speaker bio: Yiying Zhang is an assistant professor in the School of Electrical and Computer Engineering at Purdue University. Her research interests span operating systems, distributed systems, datacenter networking, and computer architecture, with a focus on building software, hardware, and networking systems for next-generation datacenters. Her recent research interest is in datacenter resource disaggregation. Yiying received her Ph.D. from the Department of Computer Sciences at the University of Wisconsin-Madison under the supervision of Andrea and Remzi Arpaci-Dusseau and worked as a postdoctoral scholar at the University of California, San Diego before joining Purdue.
Octopus: an RDMA-enabled Distributed Persistent Memory File System
Youyou Lu (Tsinghua University); Jiwu Shu (Tsinghua University); Youmin Chen (Tsinghua University); Tao Li (University of Florida)
Speaker: Youmin Chen, Tsinghua University
Abstract: Non-volatile memory (NVM) and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate file system and network layers, and the heavy layered software designs leave high-speed hardware under-exploited. In this paper, we propose an RDMA-enabled distributed persistent memory file system, Octopus, to redesign file system internal mechanisms by closely coupling NVM and RDMA features. For data operations, Octopus directly accesses a shared persistent memory pool to reduce memory copying overhead, and actively fetches and pushes data all in clients to re-balance the load between the server and network. For metadata operations, Octopus introduces self-identified RPC for immediate notification between file systems and networking, and an efficient distributed transaction mechanism for consistency. Evaluations show that Octopus achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.
Speaker bio: Youmin Chen is a third-year Ph.D. student at the Department of Computer Science and Technology, Tsinghua University, advised by Prof. Jiwu Shu and Prof. Youyou Lu. Before that, he received his Bachelor's degree from Honors College (School of Advanced Engineering), Beihang University. His research area mainly covers distributed systems and storage systems. He is particularly interested in building storage systems leveraging emerging hardware, including non-volatile memories and RDMA. His work has been published in or accepted by USENIX ATC'17, EuroSys'19, IEEE TC, ACM TOS, etc.
Session 8: ECC
Chair: Xinmiao Zhang
LDPC Error Floor Prediction using Trapping Set aware Code Shortening
David Declercq (Codelucida, Inc.); Bane Vasic (Codelucida, Inc. & Department of ECE, University of Arizona); Shiva Planjery (Codelucida, Inc.); Ben Reynwar (Codelucida, Inc.); Vamsi Yella (Codelucida, Inc.)
Speaker: David Declercq, Codelucida Inc.
Abstract: In this presentation, we address the problem of the error floor prediction of Low Density Parity Check (LDPC) decoders. Our approach is based on Monte-Carlo simulations on a shortened version of the original code. The shortening is performed such that the resulting code contains the most harmful trapping sets, giving rise to an error floor with the same slope as the original code. Our results show an accurate prediction of the error floor with a computation speed-up of 10^4.
Speaker bio: David Declercq received his PhD in Statistical Signal Processing in 1998 from the University of Cergy-Pontoise, France. He is co-founder and CTO of Codelucida Inc., whose main activity is to develop custom-designed low-density parity-check (LDPC) error-correction solutions to support the latest flash memories and other emerging memories for the highest performance and endurance. He was previously a full professor at the ENSEA, Cergy-Pontoise, France, a graduate school of engineering, from 1999 to 2017. He has been the general secretary of the National GRETSI association, and is a Senior Member of the IEEE. He held a Junior position at the Institut Universitaire de France from 2009 to 2014. His research and development interests lie in digital communications and error-correction coding theory. He worked for several years on LDPC codes, covering both code and decoder design. He especially focused on the development of new algorithms for ultra-low complexity and very-high throughput hardware architectures of binary and non-binary LDPC decoders, with quasi-error-free performance without error floors. He has published more than 50 papers in scientific journals, and more than 150 papers in major conferences in Information Theory, Error Correction Coding and Signal Processing.
Leveraging RAID for Soft BCH Decoding
Eran Sharon (Western Digital); Ishai Ilani (Western Digital); Idan Alrod (Western Digital)
Speaker: Ishai Ilani, Western Digital
Abstract: Storage Class Memory (SCM) has been gaining increasing traction in recent years as a means to close the access time gap between memory and storage and enable next generation high performance computing and big data applications. One of the SCM challenges is enabling reliable, cost efficient, high performance storage in the presence of random errors and memory defects. Erasure codes, such as XORing across a Redundant Array of Independent Dies (RAID), are used for handling memory defects. Error Correction Codes (ECC) are used for handling random errors. Due to the high performance SCM requirements, low latency algebraic ECC solutions, such as BCH codes, are the preferred choice in SCM. Both the erasure codes (i.e., XOR over RAID) and ECC require overprovisioning, which affects the SCM cost efficiency. In this paper, we propose a scheme for significantly boosting the ECC correction capability by leveraging the existing XOR overprovisioning for enabling Soft BCH Decoding.
Speaker bio: Ishai Ilani was born in Jerusalem, Israel, in 1957. He received the B.Sc., M.Sc. and Ph.D. degrees in mathematics from the Hebrew University, Jerusalem. From 1997 to 2000 he was a Researcher at ECI Telecom. From 2000 to 2007 he was Company Scientist at Actelis Networks. In both positions he was active in DSL standards committees (ITU-T, ANSI, ETSI). From 2012 he held the position of Principal Research Engineer at SanDisk Corporation, in Kfar Saba, Israel. Following the acquisition of SanDisk by Western Digital, he is now a Senior Technologist at Western Digital Corporation.
Generalized Low Density Parity Check Codes
Eran Sharon (WDC); Ran Zamir (WDC); Dudy Avraham (WDC)
Speaker: Eran Sharon, Western Digital
Abstract: Error Correction Coding (ECC) technology is a key enabler for high-density storage, allowing reliable storage over increasingly unreliable media due to memory process scaling. The race for improved memory cost efficiency and performance fuels the search for efficient ECC solutions, maximizing the correction capability for a given overprovisioning, while having low-complexity and low-power encoding and decoding methods. Today's state-of-the-art ECC solutions used in storage applications are based on Low Density Parity Check (LDPC) codes. In this comprehensive study, we explore Generalized LDPC (GLDPC) codes, including development of code design and generation tools and efficient low-complexity encoding and decoding solutions. We show that the designed GLDPC codes outperform existing state-of-the-art LDPC codes.
Speaker bio: Eran Sharon is an Engineering Fellow at WD, heading an R&D team developing a broad range of coding, DSP and memory management solutions for NVM. Eran has numerous publications in leading venues and holds over 175 issued patents in the fields of storage and communications. He received his PhD in EE (2009) from Tel-Aviv University. He is the recipient of several awards, including Weinstein excellence prize, ACC Feder Prize for best graduate student research and several SanDisk Innovation awards.
12:20 pm – 1:20 pm | Price Center West Ballroom
Lunch / Poster Session
1:20 pm – 2:00 pm | Price Center East Ballroom
Session 9: Hierarchies
Chair: Sudarsun Kannan
Current-Sensing Efficient Adder for Processing-in-Memory Design
Joonseop Sim (University of California, San Diego); Mohsen Imani (University of California, San Diego)
Speaker: Mohsen Imani, University of California, San Diego
Abstract: The Internet of Things (IoT) involves processing massive amounts of data. This poses a huge challenge for current computing systems due to limited memory bandwidth. Processing-in-memory (PIM) is a promising candidate to minimize this bottleneck and reduce the performance gap between processor and memory. We propose LUPIS (Latch-Up based Processing In-memory System) for nonvolatile memory (NVM). Unlike existing PIM techniques, which mainly focus on computations based on bitwise operations and incur considerable latency and area penalties, our design supports computations such as addition and multiplication with very low latency. This makes the system faster and more efficient than state-of-the-art technologies. We evaluate LUPIS at both the circuit level and the application level. Our evaluations show that LUPIS improves performance and energy efficiency by 62× and 484×, respectively, compared to a recent GPGPU architecture. Compared to the state-of-the-art PIM accelerator, our design achieves 12.7× and 20.9× improvements in latency and energy consumption, with an insignificant overhead of a 21% area increase and one cycle of added latency.
Speaker bio: Mohsen Imani received his M.S. and B.Sc. degrees from the School of Electrical and Computer Engineering at the University of Tehran in March 2014 and September 2011, respectively. Since September 2014, he has been a Ph.D. student in the Department of Computer Science and Engineering at the University of California San Diego, CA, USA. He is a project leader at the System Energy Efficient Laboratory (SeeLab), where he mentors several graduate and undergraduate students on computer engineering projects ranging from the circuit level to the system level. Mr. Imani's research focuses on computer architecture, machine learning, and brain-inspired computing.
To Cache Or To Bypass? A Fine Balance in The Emerging Memory Technology Era
Kunal Korgaonkar (UC San Diego); Ishwar Bhati (Intel); Huichu Liu (Intel); Jayesh Gaur (Intel); Sasikanth Manipatruni (Intel); Sreenivas Subramoney (Intel); Tanay Karnik (Intel); Steven Swanson (UC San Diego); Ian Young (Intel); Hong Wang (Intel)
Speaker: Kunal Korgaonkar, UC San Diego
Abstract: With the availability of new memory technologies like MRAM and ReRAM, the days of SRAM-only on-chip caches are likely coming to an end. In our recent work presented at ISCA 2018, we showed the benefits of replacing the SRAMs of Last Level Caches (LLCs) with STT-MRAM in high-performance processors. Our work unearthed key findings regarding optimal caching/bypassing policies, which are unlike those used in current state-of-the-art caching hierarchies. Relative to SRAM, the newer memory technologies can provide 2x to 4x capacity. However, utilizing this capacity to the fullest requires maintaining a fine balance between caching and bypassing. We found that in a hierarchy using emerging technology, both caching and bypassing policies become levers for controlling the effective bandwidth (both read and write bandwidth) and the effective latency (affecting the hit rate and hence latency). As new memory technologies are being introduced, we believe maintaining a balance between caching and bypassing is likely to become even more relevant, not just for on-chip caches, but across the entire memory hierarchy.
Speaker bio: Kunal Korgaonkar is a PhD candidate in the CSE department at UC San Diego. As part of his thesis, he is designing scalable micro-architectures for systems using emerging memory technologies. He is advised by Prof. Steven Swanson. Kunal holds a master's degree from IIT Madras, India. Contact: kkorgaon@ucsd.edu
1:20 pm – 2:00 pm | Price Center West Ballroom
Session 10: Latency
Chair: Yuval Cassuto
Hiding the Microsecond-Scale Latency of Storage-Class Memories with Duplexity
Amirhossein Mirhosseini (University of Michigan); Akshitha Sriraman (University of Michigan); Thomas Wenisch (University of Michigan)
Speaker: Amirhossein Mirhosseini, University of Michigan
Abstract: We are entering the "killer microsecond" era in data center applications. Due to advances in processor, memory, storage, and networking technologies, events that stall execution increasingly fall in a microsecond-scale latency range. Accesses to storage-class memories, such as 3D XPoint, are examples of such events, stalling execution for single-digit microseconds. Whereas contemporary computing systems are effectively equipped with mechanisms to hide nanosecond- and millisecond-scale stalls, they lack efficient support for microsecond-scale stalls. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of microsecond-scale stalls (e.g., caused by accessing storage-class memories), without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, which each primarily execute a single latency-critical master thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master thread stalls, the master-core borrows filler threads from the lender-core, filling microsecond-scale utilization holes of the microservice. Our evaluation demonstrates that Duplexity is able to achieve 1.9x higher core utilization and 2.7x lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average, in the face of microsecond-scale stalls.
Speaker bio: Amirhossein Mirhosseini is a third-year PhD student in the Computer Science and Engineering department at the University of Michigan and works with Prof. Thomas Wenisch. He received his bachelor's degree from Sharif University of Technology in Iran. His current research interests include Datacenter Architectures and Architectural Support for Microservices.
Large-Scale Adaptive Mesh Simulations Through Non-Volatile Byte-Addressable Memory
Bao Nguyen (Washington State University Vancouver); Hua Tan (Washington State University Vancouver); Xuechen Zhang (Washington State University Vancouver); Kei Davis (Los Alamos National Laboratory)
Speaker: Xuechen Zhang, Washington State University Vancouver
Abstract: This paper presents a novel data structure, the Persistent Merged octree (PM-octree), for both meshing and in-memory storage of persistent octrees using non-volatile byte-addressable memory (NVBM). It is a multi-version data structure and can recover from failures using its earlier persistent version stored in NVBM. In addition, we have designed a feature-directed sampling approach to help dynamically transform the PM-octree layout to reduce NVBM-induced memory write latency. Our results show that simulations implemented using PM-octree have good scalability and provide consistency upon failure.
Speaker bio: Xuechen Zhang received the M.S. and the Ph.D. in Computer Engineering from Wayne State University. He is currently an assistant professor in the School of Engineering and Computer Science at Washington State University Vancouver. His research interests include the areas of file and storage systems and high-performance computing. He is a member of the IEEE and ACM.
2:00 pm – 2:20 pm
Break
2:20 pm – 3:20 pm | Price Center East Ballroom
Session 11: New Hardware
Chair: Jae Young Do
Challenges in Building and Deploying Disaggregated Persistent Memory
Yizhou Shan (Purdue University); Yutong Huang (Purdue University); Yiying Zhang (Purdue University)
Speaker: Yiying Zhang, Purdue University
Abstract: Byte-addressable non-volatile memory (NVM) such as 3D XPoint and the memristor provides persistence, close-to-DRAM performance, and high density. Apart from packaging NVMs in SSDs and using them as storage devices, NVMs can be used as memory. These usage models are often called non-volatile main memory or persistent memory (PM). We believe that a more cost-efficient and flexible way to deploy PM is to build PMs as stand-alone devices and to attach them directly to the datacenter network, a model we call disaggregated persistent memory, or DPM. Each DPM device has a network interface, a PM controller, and bulk PM. DPM shares many benefits with a more general disaggregated datacenter architecture. This work addresses the hardware and networking challenges in building and deploying DPMs.
Speaker bio: Yiying Zhang is an assistant professor in the School of Electrical and Computer Engineering at Purdue University. Her research interests span operating systems, distributed systems, datacenter networking, and computer architecture, with a focus on building software, hardware, and networking systems for next-generation datacenters. Her recent research interest is in datacenter resource disaggregation. Yiying received her Ph.D. from the Department of Computer Sciences at the University of Wisconsin-Madison under the supervision of Andrea and Remzi Arpaci-Dusseau and worked as a postdoctoral scholar at the University of California, San Diego before joining Purdue.
Hardware support for ACID Transactions in Persistent Memory
Arpit Joshi (Intel); Vijay Nagarajan (University of Edinburgh); Marcelo Cintra (Intel); Stratis Viglas (Google)
Speaker: Arpit Joshi, Intel
Abstract: The emergence of byte-addressable persistent (non-volatile) memory provides a low latency and high bandwidth path to durability. However, programmers need guarantees on what will remain in persistent memory in the event of a system crash. A widely accepted model for crash-consistent programming is ACID transactions, in which updates within a transaction are made visible as well as durable in an atomic manner. However, existing software-based proposals suffer from significant performance overheads. In this proposal, we support both atomic visibility and durability in hardware. We propose DHTM (durable hardware transactional memory) that leverages a commercial HTM to provide atomic visibility and extends it with hardware support for redo logging to provide atomic durability. Furthermore, we leverage the same logging infrastructure to extend the supported transaction size (from being L1-limited to LLC-limited) with only minor changes to the coherence protocol. Our evaluation shows that DHTM outperforms the state-of-the-art by an average of 21% to 25% on TATP, TPC-C and a set of microbenchmarks. We believe DHTM is the first complete and practical hardware-based solution for ACID transactions that has the potential to significantly ease the burden of crash-consistent programming.
Programming Storage Controllers with OX
Ivan Luiz Picoli (IT University of Copenhagen); Pinar Tözün (IT University of Copenhagen); Andrzej Wasowski (IT University of Copenhagen); Philippe Bonnet (IT University of Copenhagen)
Speaker: Ivan Luiz Picoli, IT University of Copenhagen
Abstract: Offloading processing to storage is a means to minimize data movement and efficiently scale up processing capabilities for increasing data volumes. Existing approaches, e.g., from ScaleFlux or NGX, push functions to an SSD on top of a generic FTL. In previous work, we have argued that generic FTLs are not desirable because of inefficient redundancies and missed opportunities for optimization across layers. In this paper, we argue that offloading processing to storage is a great opportunity for cross-layer optimization, if we have an appropriate framework for programming the storage controller. We discuss the lessons we learnt programming the storage controller with the OX template.
Speaker bio: Ivan Luiz Picoli is finishing his PhD at the IT University of Copenhagen. Ivan worked at Tsinghua University and Microsoft Research. He contributed to the DFC open source initiative. His PhD scholarship is funded by Brazilian CAPES.
2:20 pm – 3:20 pm | Price Center West Ballroom
Session 12: Data & Indexing
Chair: Yu Cai
Designing a True Direct-Access File System with DevFS
Sudarsun Kannan (Rutgers University); Andrea Arpaci-Dusseau (University of Wisconsin-Madison); Remzi D. Arpaci-Dusseau (University of Wisconsin-Madison); Yuangang Wang (Huawei Technologies); Jun Xu (Huawei Technologies); Gopinath Palani (Huawei Technologies)
Speaker: Sudarsun Kannan, Rutgers University
Abstract: We present DevFS, a direct-access file system embedded completely within a storage device. DevFS provides direct, concurrent access without compromising file system integrity, crash consistency, and security. A novel reverse-caching mechanism enables the usage of host memory for inactive objects, thus reducing memory load upon the device. Evaluation of an emulated DevFS prototype shows more than 2x higher I/O throughput with direct access and up to a 5x reduction in device RAM utilization.
Speaker bio: Sudarsun Kannan is an Assistant Professor at Rutgers University's Computer Science Department with a research focus on Operating Systems. More specifically, he works on problems relating to heterogeneous resource (memory, storage, and compute) management challenges and understanding their impact on large-scale applications. Before joining Rutgers, Sudarsun was a postdoc at the University of Wisconsin-Madison's Computer Science Department and graduated from the College of Computing, Georgia Tech. His thesis explored methods to support hardware heterogeneity in Operating Systems.
Reducing DRAM footprint with NVM in Facebook
Assaf Eisenman (Stanford University); Darryl Gardner (Facebook); Islam AbdelRahman (Facebook); Jens Axboe (Facebook); Siying Dong (Facebook); Kim Hazelwood (Facebook); Chris Petersen (Facebook); Asaf Cidon (Stanford University); Sachin Katti (Stanford University)
Speaker: Assaf Eisenman, Stanford University
Abstract: Popular SSD-based key-value stores consume a large amount of DRAM in order to provide high-performance database operations. However, DRAM can be expensive for data center providers, especially given recent global supply shortages that have resulted in increasing DRAM costs. In this work, we design a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage and the total cost of ownership, while providing latency and queries-per-second (QPS) comparable to MyRocks on a server with a much larger amount of DRAM. Replacing DRAM with NVM introduces several challenges. In particular, NVM has limited read bandwidth, and it wears out quickly under a high write bandwidth. We design novel solutions to these challenges, including using small block sizes with a partitioned index, aligning blocks post-compression to reduce read bandwidth, utilizing dictionary compression, implementing an admission control policy for which objects get cached in NVM to control its durability, as well as replacing interrupts with a hybrid polling mechanism. We implemented MyNVM and measured its performance in Facebook's production environment. Our implementation reduces the size of the DRAM cache from 96 GB to 16 GB, and incurs a negligible impact on latency and queries-per-second compared to MyRocks. Finally, to the best of our knowledge, this is the first study on the usage of NVM devices in a commercial data center environment.
Speaker bio: Assaf is a PhD student at Stanford University, focusing on cloud computing and storage systems. He conducted multiple research projects at Facebook, as well as at Hewlett-Packard Labs, and was previously a performance architect at Intel. Assaf obtained his MS in Electrical Engineering from Stanford University and his BSc in Computer Engineering Cum Laude from the Technion - Israel Institute of Technology.
Managing Non-Volatile Memory in Database Systems
Alexander van Renen (TUM); Viktor Leis (TUM)
Speaker: Alexander van Renen, Technical University of Munich
Abstract: Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate NVM into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient than main-memory database systems. For this reason, the second approach uses a page-based DRAM cache in front of NVM. This approach, however, does not utilize the byte addressability of NVM and, as a result, accessing an uncached tuple on NVM requires retrieving an entire page. In this work, we evaluate these two approaches and compare them with in-memory databases as well as more traditional buffer managers that use main memory as a cache in front of SSDs. This allows us to determine how much performance gain can be expected from NVM. We also propose a lightweight storage manager that simultaneously supports DRAM, NVM, and flash. Our design utilizes the byte addressability of NVM and uses it as an additional caching layer that improves performance without losing the benefits from the even faster DRAM and the large capacities of SSDs.
Speaker bio: Alexander van Renen is a doctoral candidate in the database group of Alfons Kemper and Thomas Neumann at the Technical University of Munich. His research interests include database design, storage, algorithms, transaction processing, and data structures. Currently he is working on efficient database architectures for NVM.
3:20 pm – 3:40 pm
Break
3:40 pm – 4:40 pm | Price Center East Ballroom
Session 13: Objects II
Chair: Ishai Ilani
Designing a User-Friendly Java NVM Framework
Thomas Shull (University of Illinois at Urbana-Champaign); Jian Huang (University of Illinois at Urbana-Champaign); Josep Torrellas (University of Illinois at Urbana-Champaign)
Speaker: Thomas Shull, University of Illinois at Urbana-Champaign
Abstract: Byte addressable, non-volatile memory (NVM) is emerging as a revolutionary technology that provides near-DRAM performance and scalable memory capacity. To facilitate its usability, many NVM programming models have been proposed. However, most of them require programmers to explicitly specify the data structures or objects that should reside in NVM. Such a limitation inevitably increases the burden on programmers, complicates development, and further introduces correctness and performance bugs. To rectify this situation, we propose a new user-friendly Java NVM framework. Because our model is defined at a high level, it is intuitive, not prone to user bugs, and is flexible enough to allow language implementers to perform many optimizations while still adhering to its requirements. We implement a persistent version of memcached in our framework and find that its performance exceeds existing Java offerings and requires minimal program modifications.
Speaker bio: Thomas Shull is a PhD student at the University of Illinois at Urbana-Champaign, working under the guidance of Josep Torrellas. His research interests include the implementation of managed languages, architecture, and emerging memory technologies. Currently, he is working to develop a new NVM programming model and implementation for managed languages which achieves the trinity of programmability, safety, and performance.
Redesigning LSMs for Nonvolatile Memory with NoveLSM
Sudarsun Kannan (Rutgers University); Nitish Bhat (Georgia Tech); Ada Gavrilovska (Georgia Tech); Andrea Arpaci-Dusseau (University of Wisconsin-Madison); Remzi Arpaci-Dusseau (University of Wisconsin-Madison)
Speaker: Sudarsun Kannan, Rutgers University
Abstract: We present NoveLSM, a persistent LSM-based key-value storage system designed to exploit non-volatile memories and deliver low latency and high throughput to applications. We utilize three key techniques (a byte-addressable skip list, direct mutability of persistent state, and opportunistic read parallelism) to deliver high performance across a range of workload scenarios. Our analysis with popular benchmarks and real-world workloads reveals up to a 3.8x and 2x reduction in write and read access latency, respectively, compared to LevelDB. Storing all the data in a persistent skip list and avoiding block I/O provides more than 5x and 1.9x higher write throughput over LevelDB and RocksDB, respectively. Recovery time improves substantially with NoveLSM's persistent skip list.
Speaker bio: Sudarsun Kannan is an Assistant Professor at Rutgers University's Computer Science Department with a research focus on Operating Systems. More specifically, he works on problems relating to heterogeneous resource (memory, storage, and compute) management challenges and understanding their impact on large-scale applications. Before joining Rutgers, Sudarsun was a postdoc at the University of Wisconsin-Madison's Computer Science Department and graduated from the College of Computing, Georgia Tech. His thesis explored methods to support hardware heterogeneity in Operating Systems.
FAST and FAIR B+-Tree for Byte-Addressable Persistent Memory
Wook-Hee Kim (UNIST); Deukyeon Hwang (UNIST); Jonghyeon Yoo (Sungkyunkwan University); Youjip Won (KAIST); Beomseok Nam (Sungkyunkwan University)
Speaker: Wook-Hee Kim, UNIST (Ulsan National Institute of Science and Technology)
Abstract: This extended abstract presents our FAST and FAIR B+-tree, which redesigns insertion, deletion, rebalancing, and search algorithms such that tree structures can be modified in a failure-atomic fashion via a series of store and clflush instructions. We also present the performance of a legacy binary T-tree that we modified for byte-addressable persistent memory. Our experimental results show that the performance of the T-tree is comparable to other state-of-the-art persistent indexes, although its implementation is much simpler.
Speaker bio: Wook-Hee Kim is a postdoctoral researcher in the College of Software at Sungkyunkwan University. He received his Ph.D. and B.S. degrees from Ulsan National Institute of Science and Technology (UNIST) in 2019 and 2013, respectively. His research interests include system software for non-volatile memories.
3:40 pm – 4:40 pm | Price Center West Ballroom
Session 14: Transactions
Chair: Andy Rudoff
A Persistent Lock-Free Queue for Non-Volatile Memory
Michal Friedman (Technion, Israel); Maurice Herlihy (Brown University, USA); Virendra Marathe (Oracle Labs, USA); Erez Petrank (Technion, Israel)
Speaker: Michal Friedman, Technion, Israel
Abstract: This paper proposes three novel implementations of a concurrent lock-free queue for non-volatile byte addressable memory. These implementations illustrate algorithmic challenges in building persistent lock-free data structures with different levels of durability guarantees. In presenting these challenges, the proposed algorithmic designs, and the different durability guarantees, we hope to shed light on ways to build a wide variety of durable data structures.
Speaker bio: Michal Friedman is a PhD student at the Technion, Israel, and an Azrieli scholar. Her dissertation spans the theory and practice of concurrent computation and software for non-volatile memories. Her advisor is Prof. Erez Petrank. She received a B.Sc. from the Technion summa cum laude and has received several Technion teaching awards for her teaching of undergraduate courses.
iDO: Compiler-Directed Failure Atomicity for Nonvolatile Memory
Qingrui Liu (Virginia Tech); Joseph Izraelevitz (UC San Diego); Se Kwon Lee (UNIST); Michael L. Scott (University of Rochester); Sam H. Noh (UNIST); Changhee Jung (Virginia Tech)
Speaker: Michael L. Scott, University of Rochester
Abstract: This paper presents iDO, a compiler-directed approach to failure atomicity with nonvolatile memory. Unlike most prior work, which instruments each store of persistent data for redo or undo logging, the iDO compiler identifies idempotent instruction sequences, whose re-execution is guaranteed to be side-effect-free, thereby eliminating the need to log every persistent store. Using an extension of our prior work on JUSTDO logging, the compiler then arranges, during recovery from failure, to back up each thread to the beginning of the current idempotent region and re-execute to the end of the current failure-atomic section. This extension transforms JUSTDO logging from a technique of value only on hypothetical future machines with nonvolatile caches into a technique that also significantly outperforms state-of-the-art lock-based persistence mechanisms on current hardware during normal execution, while preserving very fast recovery times.
Speaker bio: Michael L. Scott is the Arthur Gould Yates Professor of Engineering and past Chair of the Department of Computer Science at the University of Rochester. During the 2014-2015 academic year he was a Visiting Scientist at Google. He received his Ph.D. from the University of Wisconsin-Madison in 1985. His research interests span operating systems, languages, architecture, and tools, with a particular emphasis on parallel and distributed systems. He is best known for work in synchronization algorithms and concurrent data structures, in recognition of which he shared the 2006 SIGACT/SIGOPS Edsger W. Dijkstra Prize. His textbook on programming language design and implementation (Programming Language Pragmatics, fourth edition, Morgan Kaufmann, Nov. 2015) and his monograph on Shared Memory Synchronization (Morgan & Claypool, 2013) are standard references in the field. He has served as General Chair of SOSP and as Program Chair of ASPLOS, PPoPP, and TRANSACT. He was named a Fellow of the ACM in 2006 and of the IEEE in 2010. In 2001 he received the University of Rochester's Robert and Pamela Goergen Award for Distinguished Achievement and Artistry in Undergraduate Teaching; in 2018 he received the Hajim School of Engineering and Applied Sciences Lifetime Achievement Award.
Strand Persistency
Vaibhav Gogte (University of Michigan); William Wang (ARM); Stephan Diestelhorst (ARM); Peter M. Chen (University of Michigan); Satish Narayanasamy (University of Michigan); Thomas F. Wenisch (University of Michigan)
Speaker: Vaibhav Gogte, Graduate Student Research Assistant, University of Michigan
Abstract: Nascent persistent memory (PM) technologies promise the performance of DRAM with the durability of disk. Several language-level persistency models have emerged recently to aid programming recoverable data structures in PM. Unfortunately, these persistency models are built upon hardware primitives that impose stricter ordering constraints on PM operations than these persistency models require. Alternative solutions use inflexible hardware logging techniques to relax ordering constraints on PM operations, but do not readily apply to general synchronization primitives employed by language-level persistency models. We propose employing strand persistency to minimally constrain orderings on PM operations. Using strand persistency semantics, we construct the runtime logging mechanisms required by state-of-the-art language-level persistency models. We demonstrate how strand persistency can enable greater concurrency of PM operations than existing ISA-level ordering mechanisms, improving performance by up to 34.5% (21.4% avg.).
Speaker bio: Vaibhav is a fifth-year PhD candidate in the Computer Science Department at the University of Michigan. He is advised by Prof. Thomas Wenisch. He is currently working on architecture support for integrating non-volatile memories into future computing systems. Prior to that, he worked on custom hardware accelerators for processing unstructured data at the memory bandwidth available in modern systems.