Deep Network Acceleration with Memristor Crossbars

Abstract: Deep learning accelerators are in vogue, and many attempts have been made to surpass the performance of GPUs by leveraging near-data processing in the digital domain. These algorithms are dominated by matrix-vector multiplications. ISAAC explores the analog option by storing the massive weight matrices in memristor crossbars and performing the dot-product operations in-situ. These crossbars are dense and highly parallel, and help alleviate the cost of accessing memory by performing the multiply-accumulate (MAC) operation in the crossbars themselves. However, performing operations in analog comes with its own set of challenges, such as area- and power-hungry Analog to Digital Converters (ADCs) and Digital to Analog Converters (DACs), and long memristor write latencies. ISAAC overcomes these challenges with a pipelined and tiled architecture that spreads a neuron computation across multiple cycles and multiple crossbars, and uses a data encoding scheme to reduce ADC complexity and handle signed arithmetic. The ISAAC architecture outperforms the digital state-of-the-art accelerator DaDianNao[1] by 15x on recent deep networks. While ISAAC does not address sparsity in the weight matrix, a storage-optimized ISAAC chip is able to accommodate large networks with higher area efficiency than accelerators like EIE[4] that are tuned to handle sparsity.
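The core idea, a crossbar dot-product with bit-serial input encoding, can be sketched in a few lines. The snippet below is an illustrative functional model, not ISAAC's actual configuration: the 4-bit input width and the `crossbar_mac` helper are assumptions for the example. Each cycle, one bit of every input element drives its crossbar row, the column currents perform the analog summation, and a digital shift-and-add across cycles reconstructs the full-precision result, which is what allows a much simpler ADC per cycle.

```python
import numpy as np

# Hypothetical parameter for illustration (not ISAAC's published configuration)
INPUT_BITS = 4  # inputs are fed to the crossbar one bit per cycle (1-bit DAC)

def crossbar_mac(weights, inputs, input_bits=INPUT_BITS):
    """Functional model of an analog crossbar dot-product with bit-serial inputs.

    weights: conductance matrix, one weight per crossbar cell (rows x columns).
    inputs:  unsigned integer input vector, one element per crossbar row.
    """
    weights = np.asarray(weights, dtype=np.int64)
    inputs = np.asarray(inputs, dtype=np.int64)
    acc = np.zeros(weights.shape[1], dtype=np.int64)
    for b in range(input_bits):
        bit = (inputs >> b) & 1        # one bit of each input per cycle
        col_currents = bit @ weights   # analog summation along each column
        acc += col_currents << b       # digital shift-and-add by bit position
    return acc

W = np.array([[1, 2],
              [3, 0],
              [0, 1]])
x = np.array([5, 3, 7])               # 4-bit unsigned inputs
print(crossbar_mac(W, x))             # matches x @ W -> [14 17]
```

Spreading one multiplication over `input_bits` cycles trades latency for converter cost; ISAAC's pipeline hides that latency by keeping many crossbars busy on different layers at once.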

Bio: Ali Shafiee is a fifth-year Ph.D. student in the Utah Arch group at the University of Utah. His main research interests are machine learning accelerators, memory system architectures, and hardware security. Ali is graduating soon and is on the job market.