Investigating Automatic Parameter Tuning for SQL-on-Hadoop Systems

Big Data Research - Volume 25 - Page 100204 - 2021
Edson Ramiro Lucas Filho1, Eduardo Cunha de Almeida2, Stefanie Scherzinger1, Herodotos Herodotou3
1University of Passau, Germany
2Federal University of Paraná, Brazil
3Cyprus University of Technology, Cyprus

References

Afrati, 2016, Assignment problems of different-sized inputs in MapReduce, ACM Trans. Knowl. Discov. Data, 11
Armbrust, 2015, Spark SQL: relational data processing in Spark, 1383
Babu, 2010, Towards automatic optimization of MapReduce programs, 137
Bao, 2018, Learning-based automatic parameter tuning for big data analytics frameworks, 181
Bei, 2017, MEST: a model-driven efficient searching approach for MapReduce self-tuning, IEEE Access, 5, 3580, 10.1109/ACCESS.2017.2672675
Bei, 2016, RFHOC: a random-forest approach to auto-tuning Hadoop's configuration, IEEE Trans. Parallel Distrib. Syst., 27, 1470, 10.1109/TPDS.2015.2449299
Cai, 2017, A recommendation-based parameter tuning approach for Hadoop, 223
Chen, 2015, Machine learning-based configuration parameter tuning on Hadoop system, 386
Chen, 2014, A study of SQL-on-Hadoop systems, 154
Cherkasova, 2011, Performance modeling in MapReduce environments: challenges and opportunities, 5
Chiba, 2018, Towards selecting best combination of SQL-on-Hadoop systems and JVMs, 245
Deshpande, 2018, Automatic tuning of SQL-on-Hadoop engines on cloud platforms, 508
Ding, 2015, JellyFish: online performance tuning with adaptive configuration and elastic container in Hadoop Yarn, 831
Ead, 2014, PStorM: profile storage and matching for feedback-based tuning of MapReduce jobs, 1
Filho, 2019, Don't tune twice: reusing tuning setups for SQL-on-Hadoop queries, 93
Floratou, 2014, SQL-on-Hadoop: full circle back to shared-nothing database architectures, Proc. VLDB Endow., 7, 1295, 10.14778/2732977.2733002
Glushkova, 2019, MapReduce performance model for Hadoop 2.x, Inf. Syst., 79, 32, 10.1016/j.is.2017.11.006
Herodotou, 2011, Profiling, what-if analysis, and cost-based optimization of MapReduce programs, Proc. VLDB Endow., 4, 1111, 10.14778/3402707.3402746
Herodotou, 2013, A what-if engine for cost-based MapReduce optimization, IEEE Data Eng. Bull., 36, 5
Herodotou, 2020, A survey on automatic parameter tuning for big data processing systems, ACM Comput. Surv., 53, 1, 10.1145/3381027
Herodotou, 2011, Starfish: a self-tuning system for big data analytics, 261
Heudecker, 2015
Huai, 2014, Major technical advancements in Apache Hive, 1235
Jain, 2017, Analyzing & optimizing Hadoop performance, 116
Jiang, 2010, The performance of MapReduce: an in-depth study, Proc. VLDB Endow., 3, 472, 10.14778/1920841.1920903
Khaleel, 2018, Optimization of computing and networking resources of a Hadoop cluster based on software defined network, IEEE Access, 10.1109/ACCESS.2018.2876385
Khan
Kornacker, 2015, Impala: a modern, open-source SQL engine for Hadoop, 9
Kumar, 2017, Scalable performance tuning of Hadoop MapReduce: a noisy gradient approach, 375
Lee, 2016, Hadoop performance self-tuning using a fuzzy-prediction approach, 55
Lee, 2011, YSmart: yet another SQL-to-MapReduce translator, 25
Li, 2014, An adaptive auto-configuration tool for Hadoop, 69
Li, 2014, MRONLINE: MapReduce online performance tuning, 165
Liao, 2013, Gunther: search-based auto-tuning of MapReduce, 406
Lim, 2012, Stubby: a transformation-based optimizer for MapReduce workflows, Proc. VLDB Endow., 5, 1196, 10.14778/2350229.2350239
Liu, 2015, MR-COF: a genetic MapReduce configuration optimization framework, 344
Liu, 2012, Panacea: towards holistic optimization of MapReduce applications, 33
Mahgoub, 2020, OPTIMUSCLOUD: heterogeneous configuration optimization for distributed databases in the cloud, 189
Miner, 2012
Nykiel, 2010, MRShare: sharing across multiple queries in MapReduce, Proc. VLDB Endow., 3, 494, 10.14778/1920841.1920906
Poggi, 2016, The state of SQL-on-Hadoop in the cloud, 1432
Rajaraman, 2011
Sarma, 2013, Upper and lower bounds on the cost of a Map-Reduce computation, Proc. VLDB Endow., 6, 10.14778/2535570.2488334
Shi, 2014, MRTuner: a toolkit to enable holistic optimization for MapReduce jobs, Proc. VLDB Endow., 7, 1319, 10.14778/2733004.2733005
Shvachko, 2010, The Hadoop distributed file system, 1
Singhal, 2017, Performance assurance model for applications on SPARK platform, 131
Song, 2013, A Hadoop MapReduce performance prediction method, 820
The Apache Software Foundation
Thusoo, 2009, Hive: a warehousing solution over a Map-Reduce framework, Proc. VLDB Endow., 2, 1626, 10.14778/1687553.1687609
Van Aken, 2017, Automatic database management system tuning through large-scale machine learning, 1009
Wang, 2016, A novel method for tuning configuration parameters of Spark based on machine learning, 586
Wang, 2012, Predator – an experience guided configuration optimizer for Hadoop MapReduce, 419
Wu, 2013, A self-tuning system based on application profiling and performance analysis for optimizing Hadoop MapReduce cluster configuration, 89
Xin, 2013, Shark: SQL and rich analytics at scale, 13
Yigitbasi, 2013, Towards machine learning-based auto-tuning of MapReduce, 11
Zhang, 2016, Self-balancing job parallelism and throughput in Hadoop, 129
Zhang, 2013, AutoTune: optimizing execution concurrency and resource usage in MapReduce workflows, 175
Zhang, 2013, Benchmarking approach for designing a MapReduce performance model, 253