MapReduce Workload Modeling with Statistical Approach

Springer Science and Business Media LLC - Volume 10, Issue 2 - Pages 279-310 - 2012
Hailong Yang, Zhongzhi Luan, Wenjun Li, Depei Qian
Sino-German Joint Software Institute, The State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China
