SubMac: Exploiting the subword-based computation in RRAM-based CNN accelerator for energy saving and speedup

Integration - Volume 69 - Pages 356-368 - 2019
Xizi Chen1, Jingbo Jiang1, Jingyang Zhu1, Chi-Ying Tsui1
1Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Clear Water Bay, Hong Kong

References

A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Proc. 25th Int. Conf. Neural Inf. Process. Syst., pp. 1097-1105.
Simonyan, 2015
Ren, 2017, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell., 39, 1137, 10.1109/TPAMI.2016.2577031
Karpathy, 2014, Large-scale video classification with convolutional neural networks, 1725
Wang, 2012, End-to-end text recognition with convolutional neural networks, 3304
Russakovsky, 2015, Imagenet large scale visual recognition challenge, Int. J. Comput. Vis., 115, 211, 10.1007/s11263-015-0816-y
He, 2016, Deep residual learning for image recognition, 770
Lecun, 1998, Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278, 10.1109/5.726791
Pfeiffer, 2017, From perception to decision: a data-driven approach to end-to-end motion planning for autonomous ground robots, 1527
Hsu, 2015, Face recognition on drones: issues and limitations, 39
Chen, 2014, DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, SIGARCH Comput. Archit. News, 42, 269, 10.1145/2654822.2541967
Chen, 2014, DaDianNao: a machine-learning supercomputer, 609
Chen, 2016, Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks, 367
Nazemi, 2019, Energy-efficient, low-latency realization of neural networks through boolean logic minimization, 274
Chi, 2016, PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory, 27
Shafiee, 2016, ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars, 14
Wang, 2017, Classification accuracy improvement for neuromorphic computing systems with one-level precision synapses, 776
Tang, 2017, Binary convolutional neural network on RRAM, 782
Wong, 2012, Metal-oxide RRAM, Proc. IEEE, 100, 1951, 10.1109/JPROC.2012.2190369
Courbariaux, 2016
Rastegari, 2016
Chen, 2018, A high-throughput and energy-efficient RRAM-based convolutional neural network using data encoding and dynamic quantization, 123
Alibart, 2012, High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm, Nanotechnology, 23, 075201, 10.1088/0957-4484/23/7/075201
Feinberg, 2018, Making memristive neural network accelerators reliable, 52
Song, 2017, PipeLayer: a pipelined ReRAM-based accelerator for deep learning, 541
Albericio, 2016, Cnvlutin: ineffectual-neuron-free deep neural network computing, 1
Han, 2016, EIE: efficient inference engine on compressed deep neural network, 243
Jia, 2014, Caffe: convolutional architecture for fast feature embedding, 675
Krizhevsky, 2009
Moons, 2016, Energy-efficient ConvNets through approximate computing, 1
Gysel, 2016
Tann, 2017
Qiu, 2016, Going deeper with embedded FPGA platform for convolutional neural network, 26
Courbariaux, 2015
Muralimanohar, 2007, Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0, 3
Jiang, 2014, Verilog-A compact model for oxide-based resistive random access memory (RRAM), 41
L. Kull, T. Toifl, M. Schmatz, P. A. Francese, C. Menolfi, M. Braendli, M. Kossel, T. Morf, T. M. Andersen, Y. Leblebici, A 3.1mW 8b 1.2GS/s single-channel asynchronous SAR ADC with alternate comparators for enhanced speed in 32nm digital SOI CMOS, in: IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), pp. 468-469.
Abadi, 2016
Hasanpour, 2016