Fast-Convergent Federated Learning

IEEE Journal on Selected Areas in Communications - Volume 39, Issue 1, Pages 201-218 - 2021
Hung T. Nguyen1, Vikash Sehwag1, Seyyedali Hosseinalipour2, Christopher G. Brinton2, Mung Chiang2, H. Vincent Poor1
1Department of Electrical Engineering, Princeton University, Princeton, NJ, USA
2School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

Abstract

Federated learning has emerged recently as a promising solution for distributing machine learning tasks through modern networks of mobile devices. Recent studies have obtained lower bounds on the expected decrease in model loss that is achieved through each round of federated learning. However, convergence generally requires a large number of communication rounds, which induces delay in model training and is costly in terms of network resources. In this paper, we propose a fast-convergent federated learning algorithm, called $\mathsf{FOLB}$, which performs intelligent sampling of devices in each round of model training to optimize the expected convergence speed. We first theoretically characterize a lower bound on the improvement that can be obtained in each round if devices are selected according to the expected improvement their local models will provide to the current global model. Then, we show that $\mathsf{FOLB}$ obtains this bound through uniform sampling by weighting device updates according to their gradient information. $\mathsf{FOLB}$ is able to handle both communication and computation heterogeneity across devices by adapting its aggregations according to estimates of each device's capability to contribute to the updates. We evaluate $\mathsf{FOLB}$ against existing federated learning algorithms and experimentally show its improvement in trained model accuracy, convergence speed, and/or model stability across a variety of machine learning tasks and datasets.
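To make the aggregation idea in the abstract concrete, the following is a minimal, hypothetical sketch of gradient-weighted aggregation over a uniformly sampled set of devices. It is not the paper's exact update rule: the function name `folb_style_aggregate`, the use of the inner product between local and global gradients as the "gradient information", and the normalization are all illustrative assumptions made here for clarity.

```python
import numpy as np

def folb_style_aggregate(global_model, local_updates, local_grads, global_grad):
    """Illustrative gradient-weighted aggregation (assumed, not the paper's exact rule).

    Each uniformly sampled device supplies a local model update and its local
    gradient; updates whose gradients align better with the global gradient
    receive larger aggregation weights.
    """
    # Alignment score: inner product of each local gradient with the global
    # gradient (a hypothetical stand-in for the "gradient information"
    # mentioned in the abstract).
    scores = np.array([float(np.dot(g, global_grad)) for g in local_grads])
    scores = np.clip(scores, 0.0, None)  # discard updates that conflict with the global direction

    if scores.sum() == 0.0:
        # Fall back to plain averaging when no update aligns with the global gradient.
        weights = np.full(len(local_updates), 1.0 / len(local_updates))
    else:
        weights = scores / scores.sum()

    # Apply the weighted combination of sampled device updates to the global model.
    aggregated_update = sum(w * u for w, u in zip(weights, local_updates))
    return global_model + aggregated_update
```

Under these assumptions, the server would call this once per communication round with the current global model, the sampled devices' local updates and gradients, and an estimate of the global gradient; the weighting is what distinguishes the sketch from plain FedAvg-style uniform averaging.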

Keywords

Federated learning; distributed optimization; fast convergence rate
