Efficient federated learning on resource-constrained edge devices based on model pruning

Complex & Intelligent Systems - Volume 9 - Pages 6999-7013 - 2023
Tingting Wu1,2,3,4, Chunhe Song1,2,3, Peng Zeng1,2,3
1State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
2Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang, China
3Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China
4University of Chinese Academy of Sciences, Beijing, China

Abstract

Federated learning is an effective solution for edge training, but the limited bandwidth and insufficient computing resources of edge devices restrict its deployment. Unlike existing methods that consider only communication efficiency, such as quantization and sparsification, this paper proposes an efficient federated training framework based on model pruning that simultaneously addresses the shortage of computing and communication resources. First, before each global model release, the framework dynamically selects neurons or convolution kernels to prune out the current optimal subnet and then issues the compressed model to each client for training. Then, we develop a new parameter aggregation and update scheme that provides training opportunities for the global model parameters and maintains the complete model structure through model reconstruction and parameter reuse, reducing the error caused by pruning. Finally, extensive experiments show that the proposed framework achieves superior performance on both IID and non-IID datasets: it reduces upstream and downstream communication while maintaining the accuracy of the global model and lowering client computing costs. For example, with accuracy exceeding the baseline, computation is reduced by 72.27% and memory usage by 72.17% for MNIST/FC, and computation is reduced by 63.39% and memory usage by 59.78% for CIFAR10/VGG16.
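The training round described above can be summarized in a minimal sketch. Everything below is illustrative rather than the authors' implementation: a single weight matrix stands in for the model, row-wise L1 norms stand in for the paper's dynamic neuron/kernel selection criterion, and select_subnet, client_update, reconstruct, and prune_ratio are hypothetical names chosen for this example.

```python
# Minimal sketch of one server-side federated pruning round, under the
# assumptions stated above (toy model = one weight matrix, magnitude scoring).
import numpy as np

rng = np.random.default_rng(0)

def select_subnet(w, prune_ratio):
    """Pick the currently 'optimal' subnet: keep the output neurons (rows)
    with the largest L1 norm, drop the rest."""
    scores = np.abs(w).sum(axis=1)                 # one score per neuron/kernel
    n_keep = max(1, int(round(len(scores) * (1 - prune_ratio))))
    keep = np.sort(np.argsort(scores)[-n_keep:])   # indices of retained rows
    return keep

def client_update(w_sub, lr=0.1):
    """Stand-in for local training on the compressed model:
    one noisy 'gradient' step applied to the pruned weights only."""
    grad = rng.normal(scale=0.01, size=w_sub.shape)
    return w_sub - lr * grad

def reconstruct(w_full, w_sub, keep):
    """Model reconstruction + parameter reuse: trained rows overwrite the
    corresponding rows of the full model, pruned rows keep their old values."""
    w_new = w_full.copy()
    w_new[keep] = w_sub
    return w_new

# --- one federated round over a toy 8x4 'layer' and 3 clients ---
w_global = rng.normal(size=(8, 4))
keep = select_subnet(w_global, prune_ratio=0.5)    # dynamic subnet selection
w_pruned = w_global[keep]                          # compressed model sent downstream

client_models = [client_update(w_pruned) for _ in range(3)]   # local training
w_agg_sub = np.mean(client_models, axis=0)                    # averaging on the subnet

w_global = reconstruct(w_global, w_agg_sub, keep)  # full structure restored
print(w_global.shape)                              # (8, 4): complete model maintained
```

The point of the reconstruction step is that pruned rows retain their previous global values, so the complete model structure survives every round and parameters pruned in one round can still be selected and trained in later rounds, which is how parameter reuse limits the error introduced by pruning.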
