Computation and memory optimized spectral domain convolutional neural network for throughput and energy-efficient inference

Springer Science and Business Media LLC - Volume 53 - Pages 4499-4523 - 2022
Shahriyar Masud Rizvi1, Ab Al-Hadi Ab Rahman1, Usman Ullah Sheikh1, Kazi Ahmed Asif Fuad2, Hafiz Muhammad Faisal Shehzad3
1VeCAD Research Laboratory, School of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
2School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
3Department of Computer Science and IT, University of Sargodha, Sargodha, Pakistan

Abstract

Conventional convolutional neural networks (CNNs) incur a high computational workload and memory access cost (CMC). Spectral domain CNNs (SpCNNs) offer a computationally efficient way to perform CNN training and inference. This paper analytically investigates the CMC of SpCNNs and its contributing components, then proposes a methodology that optimizes CMC under three strategies to enhance inference performance. In this methodology, the output feature map (OFM) size, the OFM depth, or both are progressively reduced under an accuracy constraint to obtain performance-optimized CNN inference. Before any training or testing is conducted, the methodology can give designers guidelines and preliminary insights into which techniques yield optimum performance, the least degradation in accuracy, or a balanced performance–accuracy trade-off. The methodology was evaluated on the MNIST and Fashion MNIST datasets using the LeNet-5 and AlexNet architectures. Compared to state-of-the-art SpCNN models, LeNet-5 achieves up to 4.2× (batch inference) and 4.1× (single-image inference) higher throughput and 10.5× (batch inference) and 4.2× (single-image inference) greater energy efficiency, at a maximum loss of 3% in test accuracy. Compared to the baseline model used in this study, AlexNet delivers 11.6× (batch inference) and 5× (single-image inference) higher throughput and 25× (batch inference) and 8.8× (single-image inference) more energy-efficient inference, with only a 4.4% reduction in accuracy.
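As background for why spectral domain computation can be cheaper, the short sketch below illustrates the convolution theorem that SpCNNs build on: a circular convolution in the spatial domain equals an element-wise product in the spectral domain. It is only an illustration under simplifying assumptions, not the paper's implementation. It uses a 1-D signal, a naive O(n²) DFT instead of an FFT, and a kernel zero-padded to the signal length; the function names and sample values are hypothetical. CNN layers operate on 2-D feature maps and typically compute cross-correlation rather than true convolution.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (for illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        # The inverse transform carries the 1/n normalization.
        out.append(s / n if inverse else s)
    return out


def circular_conv_spatial(x, w):
    """Direct circular convolution: n multiply-accumulates per output element."""
    n = len(x)
    return [sum(x[(i - j) % n] * w[j] for j in range(n)) for i in range(n)]


def circular_conv_spectral(x, w):
    """Same result via the spectral domain: one complex product per element."""
    X, W = dft(x), dft(w)
    Y = [a * b for a, b in zip(X, W)]
    return [v.real for v in dft(Y, inverse=True)]


# Hypothetical sample data; the kernel is zero-padded to the signal length.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

spatial = circular_conv_spatial(signal, kernel)
spectral = circular_conv_spectral(signal, kernel)
max_diff = max(abs(a - b) for a, b in zip(spatial, spectral))
print(spatial)
print(max_diff < 1e-9)
```

Once the transforms are available, the multiplication step is a single pass of element-wise products over the spectral feature map, so its cost scales with the number of spectral elements rather than with the kernel size.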
