Efficient federated learning on resource-constrained edge devices based on model pruning
Abstract
Federated learning is an effective solution for training at the edge, but the limited bandwidth and insufficient computing resources of edge devices restrict its deployment. Unlike existing methods that consider only communication efficiency, such as quantization and sparsification, this paper proposes an efficient federated training framework based on model pruning that simultaneously addresses the shortage of computing and communication resources. First, before each release of the global model, the framework dynamically selects neurons or convolution kernels to prune a currently optimal subnetwork, and then distributes the compressed model to the clients for training. Second, we develop a new parameter aggregation and update scheme that gives all global model parameters an opportunity to be trained and maintains the complete model structure through model reconstruction and parameter reuse, reducing the error introduced by pruning. Finally, extensive experiments show that the proposed framework achieves superior performance on both IID and non-IID data, reducing upstream and downstream communication and client computing costs while maintaining the accuracy of the global model. For example, while exceeding the baseline accuracy, computation is reduced by 72.27% and memory usage by 72.17% on MNIST/FC, and computation is reduced by 63.39% and memory usage by 59.78% on CIFAR10/VGG16.
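To make the round structure described above concrete, the following is a minimal NumPy sketch of one communication round under stated assumptions: a single fully connected layer, magnitude-based neuron selection as a stand-in for the paper's dynamic selection criterion, a simulated local update in place of real client training, and FedAvg-style aggregation in which pruned positions retain their previous global values (parameter reuse during model reconstruction). The names `select_subnet`, `local_update`, and `reconstruct_and_aggregate` are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy global model: one fully connected layer (out_features x in_features).
global_W = rng.normal(size=(8, 16))

def select_subnet(W, keep_ratio=0.5):
    """Keep the output neurons with the largest L2 norm (a simple stand-in
    for the paper's dynamic neuron/kernel selection) and return their indices
    together with the compressed weight matrix sent to the clients."""
    n_keep = max(1, int(round(keep_ratio * W.shape[0])))
    norms = np.linalg.norm(W, axis=1)
    kept = np.sort(np.argsort(norms)[-n_keep:])
    return kept, W[kept, :].copy()

def local_update(sub_W, lr=0.1):
    """Placeholder for a client's local training: one gradient-like step
    on synthetic noise so the sketch stays self-contained."""
    grad = rng.normal(scale=0.01, size=sub_W.shape)
    return sub_W - lr * grad

def reconstruct_and_aggregate(global_W, kept, client_sub_Ws):
    """Rebuild the full model: average the clients' sub-weights into the
    selected rows and reuse the previous global values for the pruned rows,
    so the complete model structure is maintained."""
    new_W = global_W.copy()                          # parameter reuse for pruned rows
    new_W[kept, :] = np.mean(client_sub_Ws, axis=0)  # FedAvg on the subnetwork
    return new_W

# One communication round with 3 clients.
kept, sub_W = select_subnet(global_W, keep_ratio=0.5)
client_updates = [local_update(sub_W) for _ in range(3)]
global_W = reconstruct_and_aggregate(global_W, kept, client_updates)
print("kept neurons:", kept)
```

Across rounds, the selected subnetwork can change, so parameters pruned in one round may be trained in a later one; this is what gives every global parameter a training opportunity while each individual round only transmits and computes on the compressed model.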