Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

Eric Costa¹, Carlos Costa¹, Maribel Yasmina Santos¹
¹ALGORITMI Research Centre, University of Minho, 4800-058 Guimarães, Portugal

Abstract

Keywords

