Analytical Performance Models for MapReduce Workloads

Emanuel Vianna1, Giovanni Comarela1, Tatiana Barcelos Pontes1, Jussara M. Almeida1, Virgílio A. F. Almeida1, Kevin Wilkinson2, Harumi Kuno2, Umeshwar Dayal2
1Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
2Information Analytics Lab, HP Laboratories (HP-Labs), Palo Alto, USA

Tóm tắt

Từ khóa


Tài liệu tham khảo

Apache Software Foundation, Powered by Hadoop. URL http://wiki.apache.org/hadoop/PoweredBy . Access date: 1 July 2012 (2012)

Apache Software Foundation, Official Apache Hadoop Website. URL http://hadoop.apache.org/ . Accessed date: 1 July 2012 (2012)

Berlińska J., Drozdowski M.: Scheduling divisible MapReduce computations. J. Parallel Distrib. Comput. 71(3), 450–459 (2011)

Chen, Y., Ganapathi, A., Griffith R., Katz, R.: The case for evaluating MapReduce performance using workload suites. In: Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Singapore, pp. 390–399 (2011)

Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Elmeleegy, K., Sears R.: MapReduce Online Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI), San Jose, California, pp. 21–35 (2010)

Dean, J., Ghemawat, S., MapReduce : Simplified data processing on large clusters. In: Proceedings of Operating Systems Design and Implementation (OSDI), San Francisco, California, pp. 137–150 (2004)

Dean J., Ghemawat S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

Ganapathi, A.: Predicting and Optimizing System Utilization and Performance via Statistical Machine Learning. Technical Report UCB/EECS-2009-181. EECS Department, University of California, Berkeley (2009)

Ganapathi, A., Kuno, H., Dayal, U., Wiener, J., Fox, A., Jordan, M., Patterson, D. : Predicting multiple metrics for queries: better decisions enabled by machine learning. In: Proceedings of the 2009 IEEE International Conference on Data Engineering (ICDE), Shanghai, China, pp. 592–603 (2009)

Herodotou, H.: Hadoop Performance Models. Technical Report CS-2011-05. Computer Science Department, Duke University. URL http://arxiv.org/abs/1106.0940 (2011)

Jain R.: The Art of Computer Systems Performance Analysis—Techniques for Experimental Design, Measurement, Simulation, and Modeling. Wiley, London (1991)

Jiang D.R., Ooi B.C., Shi L., Wu S.: The performance of MapReduce: an in-depth study. Proc. VLDB Endow 3(1–2), 472–483 (2010)

Jonkers H.: Queueing Models of Parallel Applications: The Glamis Methodology, Computer Performance Evaluation: Modeling Techniques & Tools (LNCS 794), pp. 123–138. Springer, Berlin (1994)

Kim S., Won J., Han H., Eom H., Yeom H.Y.: Improving hadoop performance in intercloud environments ACM SIGMETRICS. Perform. Eval. Rev. 39(3), 107–109 (2011)

Krevat, E., Shiran, T., Anderson, E., Tucek, J., Wylie, J.J. , Ganger, G.R.: Applying Performance Models to Understand Data-intensive Computing Efficiency. Technical Report CMU-PDL-10-108. Carnegie Mellon University, Pittsburgh (2010)

Kruskal C.P., Weiss A.: Allocating independent subtasks on parallel processors. IEEE Trans. Softw. Eng. 11(10), 1001–1016 (1985)

Lavenberg S., Reiser M.: Stationary state probabilities at arrival instants for closed queueing networks with multiple types of customers. J. Appl. Probab. 17(4), 1048–1061 (1980)

Lee K.H., Lee Y.J., Choi H., Chung Y.D., Moon B.: Parallel data processing with MapReduce: a survey. ACM SIGMOD Record J. 40(4), 11–20 (2011)

Liang D.R., Tripathi S. K.: On performance prediction of parallel computations with precedent constraints. IEEE Trans. Parallel Distrib. Syst. 11(5), 491–508 (2000)

Little J.: A proof for the queuing formula: L = λ W. Oper. Res. 9(3), 383–387 (1961)

Mak V.W., Lundstrom S.F.: Predicting performance of parallel computations. IEEE Trans. Parallel Distrib. Syst. 1(3), 257–260 (1990)

Menasce, D., Dowdy, L., Almeida, V.: Performance by Design: Computer Capacity Planning By Example. Prentice Hall PTR (2004)

Morton, K., Balazinska, M., Grossman, D.: ParaTimer: a progress indicator for MapReduce DAGs. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD), Indianapolis, Indiana, pp. 507–518 (2010)

Pavlo, A., Paulson, E., Rasin, A., Abadi, D., DeWitt, D., Madden, S., Stonebraker, M.: A Comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), Providence, Rhode Island, pp. 165–178 (2009)

Reiser M., Lavenberg S.S.: Mean-value analysis of closed multichain queuing networks. J. ACM 27(2), 313–322 (1980)

Salza, S., Lavenberg, S.S.: Approximating response time distributions in closed queueing network models of computer performance. In: Proceedings Performance, North Holland, Amsterdam, pp. 133–145 (1981)

Thomasian A., Bay P.F.: Analytic queueing network models for parallel processing of task systems. IEEE Trans. Comput. 35(12), 1045–1054 (1986)

Trivedi K.S.: Probability and Statistics with Reliability, Queuing and Computer Science Applications. Prentice Hall PTR, Upper Saddle River (1882)

Varki, E.: Mean value technique for closed fork-join networks. In: Proceedings of the 1999 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Atlanta, Georgia, pp. 103–112 (1999)

Vianna, E.,Comarela, G., Pontes, T., Almeida, J., Almeida, V., Wilkinson, K., Kuno, H., Dayal, U.: Modeling the performance of the Hadoop online prototype. In: Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Vitória, Brazil, pp. 152–159 (2011)

Wang, G., Butt, A.R., Pandey, P., Gupta, K.: A simulation approach to evaluating design decisions in MapReduce setups. In: Proceedings of the IEEE International Symposium on Modeling, Analysis Simulation of Computer and Telecommunication Systems (MASCOTS), Imperial College London, UK, pp. 1–11 (2009)

Wang, G., Butt, A.R., Pandey, P., Gupta, K.: Using realistic simulation for performance analysis of MapReduce setups. In: Proceedings of the 1st ACM Workshop on Large-Scale System and Application Performance (LSAP), Munich, Germany, pp. 19–26 (2009)

Weng N., Wolf T.: Analytic modeling of network processors for parallel workload mapping. ACM Trans. Embed. Comput. Syst. 8(3), 18:1–18:29 (2009)

White T.: Hadoop—The Definitive Guide: Storage and Analysis at Internet Scale. 2nd edn. O’Reilly Media, Sebastopol (2011)

Yang, H.C., Dasdan, A., Hsiao, R.L., Parker, D.S.: Map-Reduce-Merge: simplified relational data processing on LargeClusters. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD), Beijing, China, pp. 1029–1040 (2007)

Yang, X., Sun, J.: An Analytical performance model of MapReduce. In: Proceedings of the 2011 IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), Beijing, China, pp. 306–310 (2011)

Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving MapReduce performance in heterogeneous environments. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI), San Diego, California, pp. 29–42 (2008)

Zahorjan, J.: The Approximate Solution of Large Queueing Network Models, PhD. Thesis, University of Toronto, Canada (1980)