Multi-Dimensional Event Data in Graph Databases

Journal on Data Semantics - Volume 10, Issue 1-2 - Pages 109-141 - 2021
Stefan Esser (1), Dirk Fahland (2)
(1) INFORM GmbH, Aachen, Germany
(2) Eindhoven University of Technology, Eindhoven, The Netherlands

Abstract

Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations such as “directly/eventually-follows,” it does not support querying multi-dimensional event data of multiple related entities. Relational databases allow storing multi-dimensional event data, but existing query languages do not support querying for sequences or paths of events in terms of temporal relations. In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. We provide semantics for all concepts of our data model, and generic queries for modeling event data over multiple entities that interact synchronously and asynchronously. The queries allow for efficiently converting large real-life event data sets into our data model, and we provide 5 converted data sets for further research. We show that typical and advanced queries for retrieving and aggregating such multi-dimensional event data can be formulated and executed efficiently in the existing query language Cypher, giving rise to several new research questions. Specifically, aggregation queries on our data model enable process mining over multiple inter-related entities using off-the-shelf technology.
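To make the abstract's central idea more concrete, the following is a minimal in-memory Python sketch (not the authors' Cypher queries) of a graph that keeps both kinds of relations side by side: structural correlation edges from each event to every entity it refers to, and temporal directly-follows edges computed separately for each entity. It assumes that every event carries a timestamp and a mapping from entity type to entity ids; the names `Event`, `build_event_graph`, `aggregate_df`, and the `Order`/`Item` toy data are hypothetical. The last step lifts event-level edges to activity-level edges per entity type, a simplified analogue of the aggregation queries the abstract mentions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical event schema: an event refers to one or more entities,
# grouped by entity type (e.g. "Order", "Item").
@dataclass
class Event:
    event_id: str
    activity: str
    timestamp: int
    entities: Dict[str, List[str]]

EntityKey = Tuple[str, str]  # (entity type, entity id)

def build_event_graph(events: List[Event]):
    """Return correlation edges (event -> entity) and per-entity directly-follows edges."""
    corr: List[Tuple[str, EntityKey]] = []
    by_entity: Dict[EntityKey, List[Event]] = defaultdict(list)
    for ev in events:
        for etype, eids in ev.entities.items():
            for eid in eids:
                corr.append((ev.event_id, (etype, eid)))
                by_entity[(etype, eid)].append(ev)

    # Temporal relation: order each entity's events by time (ties broken by
    # event id, an arbitrary assumption) and link consecutive events.
    df: List[Tuple[str, str, str, str]] = []
    for (etype, eid), evs in by_entity.items():
        ordered = sorted(evs, key=lambda e: (e.timestamp, e.event_id))
        for a, b in zip(ordered, ordered[1:]):
            df.append((a.event_id, b.event_id, etype, eid))
    return corr, df

def aggregate_df(events: List[Event], df) -> Dict[Tuple[str, str, str], int]:
    """Lift event-level DF edges to (activity, activity, entity type) counts."""
    activity = {e.event_id: e.activity for e in events}
    counts: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for src, dst, etype, _eid in df:
        counts[(activity[src], activity[dst], etype)] += 1
    return dict(counts)

# Toy data: one order with two items; "Ship" refers to the order and both items.
events = [
    Event("e1", "Create Order", 1, {"Order": ["o1"]}),
    Event("e2", "Pick Item", 2, {"Item": ["i1"], "Order": ["o1"]}),
    Event("e3", "Pick Item", 3, {"Item": ["i2"], "Order": ["o1"]}),
    Event("e4", "Ship", 4, {"Order": ["o1"], "Item": ["i1", "i2"]}),
]

corr, df = build_event_graph(events)
for edge, count in aggregate_df(events, df).items():
    print(edge, count)
```

Keeping the entity type on every directly-follows edge is what keeps the order's sequence and each item's sequence apart: for the toy data, `Pick Item` is followed by `Ship` twice for entity type `Item` (once per item) but only once for `Order`. Flattening all events into a single case notion would mix these sequences. In a graph database such as Neo4j, events and entities would become nodes and these edges relationships, with the derivation and aggregation expressed as Cypher queries.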
