Automating ETL processes using the domain-specific modeling approach

Springer Science and Business Media LLC - Volume 15 - Pages 425-460 - 2016
Marko Petrović, Milica Vučković, Nina Turajlić, Slađan Babarogić, Nenad Aničić, Zoran Marjanović
Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia

Abstract

The development of Extract–Transform–Load (ETL) processes is the most complex, time-consuming and expensive phase of data warehouse development. Yet the dynamics of modern business systems demand a more agile and flexible approach to their development. As a result, current research in this area focuses on ETL process conceptualization and on the automation of ETL process development. This paper proposes a novel solution for automating ETL processes using the domain-specific modeling (DSM) approach. The proposed solution is based on the formal specification of ETL processes and on the implementation of such formal specifications. In accordance with the DSM approach, several new domain-specific languages (DSLs) are introduced, each defining concepts relevant to a specific aspect of an ETL process. The focus of this paper is the actual implementation of the formal specification of an ETL process. To this end, a specific ETL platform (ETL-PL) is introduced to technologically support both the modeling of ETL processes (i.e., the creation of models in accordance with the introduced DSLs) and the automated transformation of the created models into the executable code of a specific application framework (representing ETL-PL’s execution environment). It should be emphasized that ETL-PL presumes the dynamic execution of ETL models: the executable code is generated at runtime. Thus, the execution environment consists of code generator components and the components implementing the application framework. ETL-PL has been implemented as an extension of the .NET platform.
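
The idea of generating executable code from a model at runtime can be illustrated with a minimal sketch. It is not ETL-PL itself, which targets the .NET platform and uses its own DSLs. The model format, the operation names (`filter`, `derive`, `project`) and the functions `generate_source` and `compile_model` are all hypothetical, chosen only to show a declarative model being turned into code and then executed in the same run.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Hypothetical declarative ETL model: an ordered list of steps.
# Expressions are plain Python over the current `row` and are assumed trusted.
MODEL: List[dict] = [
    {"op": "filter", "expr": "row['amount'] > 0"},
    {"op": "derive", "target": "amount_half", "expr": "row['amount'] * 0.5"},
    {"op": "project", "columns": ["id", "amount_half"]},
]


def generate_source(model: List[dict]) -> str:
    """Code generator: translate the model into the source of a generator function."""
    lines = ["def run(rows):", "    for row in rows:"]
    for step in model:
        if step["op"] == "filter":
            # Skip rows that do not satisfy the predicate.
            lines.append(f"        if not ({step['expr']}):")
            lines.append("            continue")
        elif step["op"] == "derive":
            # Add a computed column without mutating the input row.
            lines.append(
                f"        row = {{**row, {step['target']!r}: {step['expr']}}}"
            )
        elif step["op"] == "project":
            # Keep only the listed columns.
            lines.append(
                f"        row = {{c: row[c] for c in {step['columns']!r}}}"
            )
        else:
            raise ValueError(f"unknown operation: {step['op']!r}")
    lines.append("        yield row")
    return "\n".join(lines)


def compile_model(model: List[dict]) -> Callable[[Iterable[Row]], Iterator[Row]]:
    """Compile the generated source at runtime and return the executable function."""
    namespace: dict = {}
    exec(compile(generate_source(model), "<etl-model>", "exec"), namespace)
    return namespace["run"]


if __name__ == "__main__":
    source_rows = [
        {"id": 1, "amount": 10.0},
        {"id": 2, "amount": -5.0},
        {"id": 3, "amount": 4.0},
    ]
    run = compile_model(MODEL)
    for out_row in run(source_rows):
        print(out_row)
```

Because the function is generated when the model is loaded, changing the model changes the executed process without a separate build step. This is the property the abstract attributes to dynamic execution. The sketch compiles source text with Python's built-in `compile` and `exec`, so it is only appropriate for models from a trusted source.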
