SECDA-TFLite: A toolkit for efficient development of FPGA-based DNN accelerators for edge inference
References
Abadi, 2016, TensorFlow: a system for large-scale machine learning, 265
Agostini, 2020, Design space exploration of accelerators and end-to-end DNN evaluation with TFLITE-SOC, 10
Alwani, 2016, Fused-layer CNN accelerators, 1
Chen, 2018, TVM: an automated end-to-end optimizing compiler for deep learning, 579
Chen, 2018, Learning to optimize tensor programs, 3393
Chen, 2019, Eyeriss v2: a flexible accelerator for emerging deep neural networks on mobile devices, 292
COOWOO USB Digital Power Meter
Corporation
developers
Devlin, 2019, BERT: pre-training of deep bidirectional transformers for language understanding, 4171
Gemmlowp
Guan, 2017, An automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates, 152
Hadidi, 2019, Characterizing the deployment of deep neural networks on commercial edge devices, 35
Haris, 2021, SECDA: efficient hardware/software co-design of FPGA-based DNN accelerators for edge inference, 33
He, 2016, Deep residual learning for image recognition, 770
Howard, 2017, MobileNets: efficient convolutional neural networks for mobile vision applications, 1
Huang, 2019, Accelerating sparse deep neural networks on FPGAs, 1
IEEE, 2012
Jouppi, 2017, In-datacenter performance analysis of a tensor processing unit, 1
Krizhevsky, 2012, ImageNet classification with deep convolutional neural networks, 1097
Kwon, 2018, MAERI: enabling flexible dataflow mapping over DNN accelerators via reconfigurable interconnects, 461
Kwon, 2019, Understanding reuse, performance, and hardware cost of DNN dataflow: a data-centric approach, 754
Lan, 2020, ALBERT: a lite BERT for self-supervised learning of language representations, 1
Liu, 2017, Throughput-optimized FPGA accelerator for deep convolutional neural networks, 1
Lu, 2017, A high-performance FPGA accelerator for sparse neural networks: work-in-progress, 1
Markidis, 2018, NVIDIA tensor core programmability, performance & precision, 522
Moreau, 2019, A hardware-software blueprint for flexible deep learning specialization, 1
Muñoz-Martínez, 2021, STONNE: enabling cycle-level microarchitectural simulation for DNN inference accelerators, 122
Ottavi, 2020, A mixed-precision RISC-V processor for extreme-edge DNN inference, 512
Paszke, 2017, Automatic differentiation in PyTorch, 1
Qin, 2020, SIGMA: a sparse and irregular GEMM accelerator with flexible interconnects for DNN training, 58
Rajpurkar, 2016, SQuAD: 100,000+ questions for machine comprehension of text, 2383
Russakovsky, 2015, ImageNet large scale visual recognition challenge, 211
Sandler, 2018, MobileNetV2: inverted residuals and linear bottlenecks, 4510
Shao, 2016, Co-designing accelerators and SoC interfaces using gem5-Aladdin, 1
Sun, 2020, MobileBERT: a compact task-agnostic BERT for resource-limited devices, 1
Sze, 2017, Efficient processing of deep neural networks: a tutorial and survey, 2295
Szegedy, 2015, Going deeper with convolutions, 1
Szegedy, 2016, Rethinking the inception architecture for computer vision, 2818
Tan, 2019, EfficientNet: rethinking model scaling for convolutional neural networks, 6105
Turner, 2018, Characterising across-stack optimisations for deep convolutional neural networks, 101
Umuroglu, 2017, FINN: a framework for fast, scalable binarized neural network inference, 65
Vaswani, 2017, Attention is all you need, 6000
Wang, 2021, Exploiting parallelism opportunities with deep learning frameworks, 1
Wei, 2017, Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs, 1
Xi, 2020, SMAUG: end-to-end full-stack simulation infrastructure for deep learning workloads, 1
Zhang, 2018, DNNBuilder: an automated tool for building high-performance DNN hardware accelerators for FPGAs, 1
Zhang, 2016, Towards end-to-end speech recognition with deep convolutional neural networks, 410
Zhou, 2017, Incremental network quantization: towards lossless CNNs with low-precision weights, 1