Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems