SparkBench: a spark benchmarking suite characterizing large-scale in-memory data analytics

Springer Science and Business Media LLC - 2017

Min Li¹, Jian Tan², Yandong Wang¹, Li Zhang¹, Valentina Salapura¹

¹IBM Almaden Research Center, San Jose, USA

²Ohio State University, Columbus, USA

References

Agrawal, D., Butt, A., Kshitij, D., Larriba-Pey, J.-L., Li, M., Reiss, F.R., Raab, F., Schiefer, B., Xia, Y.: Sparkbench: a spark performance testing suite. In: Proceedings of TPCTC (2015)

Amazon Movie Review. http://snap.stanford.edu/data/web-Movies.html

AMPLab Big Data Benchmark. https://amplab.cs.berkeley.edu/benchmark/

Apache GridMix. http://hadoop.apache.org/docs/r1.2.1/gridmix.html

Apache Spark. http://spark.apache.org/

Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database benchmark based on the facebook social graph. In: Proceedings of the 2013 ACM SIGMOD, pp. 1185–1196 (2013)

Avery, C.: Giraph: large-scale graph processing infrastructure on hadoop. In: Proceedings of the Hadoop Summit, Santa Clara (2011)

Batarfi, O., El Shawi, R., Fayoumi, A.G., Nouri, R., Barnawi, A., Sakr, S., et al.: Large scale graph processing systems: survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)

Chaimov, N., Malony, A., Canon, S., Iancu, C., Ibrahim, K.Z., Srinivasan, J.: Scaling spark on HPC systems. In: HPDC ’16, pp. 97–110. ACM, New York (2016)

Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM SOCC, pp. 143–154 (2010)

Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

Ferdman, M., Adileh, A., Kocberber, O., Volos, S., Alisafaee, M., Jevdjic, D., Kaynak, C., Popescu, A.D., Ailamaki, A., Falsafi, B.: Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In: Proceedings of the 17th ACM ASPLOS, pp. 37–48 (2012)

Ghazal, A., Rabl, T., Hu, M., Raab, F., Poess, M., Crolotte, A., Jacobsen, H.-A.: Bigbench: towards an industry standard benchmark for big data analytics. In: Proceedings of ACM SIGMOD (2013)

Google Web Graph. http://snap.stanford.edu/data/web-Google.html

Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Proceedings of the 8th IEEE ICDM (2008)

Huang, S., Huang, J., Dai, J., Xie, T., Huang, B.: The hibench benchmark suite: characterization of the mapreduce-based data analysis. In: 26th IEEE ICDEW, pp. 41–51 (2010)

IBM. Big Data and Analytics Hub. http://www.ibmbigdatahub.com/infographic/four-vs-big-data

IBM SoftLayer. http://www.softlayer.com/

James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning. Springer, New York (2013)

Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8(1–2), 161–185 (2012)

Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of ACM SIGKDD (2008)

Kryo: a fast and efficient Object Graph Serialization Framework for Java. https://github.com/EsotericSoftware/kryo

Li, M., Tan, J., Wang, Y., Zhang, L., Salapura, V.: Sparkbench: a comprehensive benchmarking suite for in memory data analytic platform spark. In: Proceedings of Workshop on Analytics Platforms for the Cloud (2015)

Ming, Z., Luo, C., Gao, W., Han, R., Yang, Q., Wang, L., Zhan, J.: Bdgs: a scalable big data generator suite in big data benchmarking. In: Advancing Big Data Benchmarks, pp. 138–154. Springer, New York (2014)

Nyberg, C., Shah, M., Govindaraju, N.: Sort Benchmark. http://sortbenchmark.org/

Ousterhout, K., Rasti, R., Ratnasamy, S., Shenker, S., Chun, B.-G.: Making sense of performance in data analytics frameworks. In: Proceedings of USENIX NSDI (2015)

Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Technical Report 1999-66, Stanford InfoLab (1999)

Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: Proceedings of ACM SIGMOD (2009)

Peng, J., Choo, K.-K.R., Ashman, H.: Bit-level n-gram based forensic authorship analysis on social media: identifying individuals from linguistic profiles. J. Netw. Comput. Appl. 70, 171–182 (2016)

Apache PigMix. https://cwiki.apache.org/confluence/display/PIG/PigMix

Quick, D., Choo, K.-K.R.: Big forensic data reduction: digital forensic images and electronic evidence. Clust. Comput. 19(2), 723–740 (2016)

Shi, J., Qiu, Y., Minhas, U.F., Jiao, L., Wang, C., Reinwald, B., Ozcan, F.: Clash of the titans: mapreduce vs. spark for large scale data analytics. In: Proceedings of the VLDB Endowment (2015)

Spark Technology Center. https://github.com/SparkTC

SparkBench: A Comprehensive Spark Benchmarking Suite, Anonymized for double blind review. https://goo.gl/woHxxK

Spark-perf: Spark performance tests. https://github.com/databricks/spark-perf

TPC-DS. http://www.tpc.org/tpcds/

TPC-H. http://www.tpc.org/tpch/

Twitter4j: a Java Library for the Twitter API. http://twitter4j.org

Wang, L., Zhan, J., Luo, C., Zhu, Y., Yang, Q., He, Y., Gao, W., Jia, Z., Shi, Y., Zhang, S., Zheng, C., Lu, G., Zhan, K., Li, X., Qiu, B.: BigDataBench. http://prof.ict.ac.cn/BigDataBench/

Wang, L., Zhan, J., Luo, C., Zhu, Y., Yang, Q., He, Y., Gao, W., Jia, Z., Shi, Y., Zhang, S., Zheng, C., Lu, G., Zhan, K., Li, X., Qiu, B.: Bigdatabench: a big data benchmark suite from internet services. In: IEEE 20th HPCA, pp. 488–499 (2014)

Wikipedia Data Dumps. http://dumps.wikimedia.org/enwiki/

WikiXMLJ. https://code.google.com/p/wikixmlj/

Xiong, W., Yu, Z., Bei, Z., Zhao, J., Zhang, F., Zou, Y., Bai, X., Li, Y., Xu, C.: A characterization of big data benchmarks. In: IEEE International Conference on Big Data, pp. 118–125 (2013)

Xu, Z., Luo, X., Liu, Y., Choo, K.-K.R., Sugumaran, V., Yen, N., Mei, L., Hu, C.: From latency, through outbreak, to decline: detecting different states of emergency events using web resources. IEEE Trans. Big Data PP(99), 1–1 (2016)

Xu, Z., Xuan, J., Liu, Y., Choo, K.-K.R., Mei, L., Hu, C.: Building spatial temporal relation graph of concepts pair using web repository. In: Information Systems Frontiers, pp. 1–10 (2016)

Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX NSDI, Berkeley, CA (2012)

Zhang, F., Liu, M., Gui, F., Shen, W., Shami, A., Ma, Y.: A distributed frequent itemset mining algorithm using spark for big data analytics. Clust. Comput. 18(4), 1493–1501 (2015)

Zhu, J., Xu, C., Li, Z., Fung, G., Lin, X., Huang, J., Huang, C.: An examination of on-line machine learning approaches for pseudo-random generated data. Clust. Comput. 19(3), 1309–1321 (2016)
