SECDA-TFLite: A toolkit for efficient development of FPGA-based DNN accelerators for edge inference

Journal of Parallel and Distributed Computing, Volume 173, Pages 140-151, 2023
Jude Haris1, Perry Gibson1, José Cano1, Nicolas Bohm Agostini2, David Kaeli2
1 University of Glasgow, UK
2 Northeastern University, USA
