Expert-driven trace clustering with instance-level constraints

Knowledge and Information Systems, Vol. 63, pp. 1197–1220, 2021
Pieter De Koninck¹, Klaas Nelissen¹, Seppe vanden Broucke¹, Bart Baesens¹,², Monique Snoeck¹, Jochen De Weerdt¹
¹ KU Leuven, Research Center for Management Informatics (LIRIS), Leuven, Belgium
² Southampton Business School, University of Southampton, Southampton, UK

Abstract

In the field of process mining, many trace clustering techniques exist that aim to partition traces or process instances into groups of similar ones. Typically, this partitioning is based on certain patterns in, or the similarity between, the traces, or it is driven by the discovery of a process model for each cluster. The main drawback of these techniques, however, is that domain experts often find their solutions hard to evaluate or justify. In this paper, we present two constrained trace clustering techniques that can leverage expert knowledge in the form of instance-level constraints. Through an extensive experimental evaluation on two real-life datasets, we show that our new techniques are indeed able to produce clustering solutions that are more justifiable, without a substantial negative impact on their quality.
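The abstract does not describe the two proposed techniques in detail, so the sketch below does not reproduce them. It only illustrates the general idea of clustering traces under instance-level constraints, using the classic COP-KMeans scheme (Wagstaff et al., 2001) as a stand-in. Everything here is an assumption for illustration: traces are lists of activity labels, each trace is profiled as a bag-of-activities count vector, expert knowledge arrives as must-link and cannot-link pairs of trace indices, and centroids are seeded farthest-first. The helper names (`bag_of_activities`, `farthest_first`, `cop_kmeans`) are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def bag_of_activities(traces: Sequence[Sequence[str]]) -> Tuple[List[str], List[Vector]]:
    """Hypothetical trace profile: activity counts over a sorted alphabet."""
    alphabet = sorted({act for trace in traces for act in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        vectors.append([float(counts[act]) for act in alphabet])
    return alphabet, vectors


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_first(vectors: List[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start at trace 0, then add the trace farthest from the chosen seeds."""
    chosen = [0]
    while len(chosen) < k:
        nxt = max(range(len(vectors)),
                  key=lambda i: min(sq_dist(vectors[i], vectors[c]) for c in chosen))
        chosen.append(nxt)
    return [list(vectors[i]) for i in chosen]


def cop_kmeans(vectors: List[Vector], k: int,
               must_link: List[Tuple[int, int]],
               cannot_link: List[Tuple[int, int]],
               max_iter: int = 20) -> List[int]:
    """COP-KMeans-style sketch: each trace joins the nearest centroid whose cluster
    breaks no must-link / cannot-link constraint with already-assigned traces."""
    n = len(vectors)
    parent = list(range(n))  # union-find for the transitive closure of must-links

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)
    groups: Dict[int, Set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    ml = {i: groups[find(i)] - {i} for i in range(n)}
    # A cannot-link between a and b applies to their whole must-link groups.
    cl: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for a, b in cannot_link:
        ga, gb = groups[find(a)], groups[find(b)]
        for x in ga:
            cl[x] |= gb
        for y in gb:
            cl[y] |= ga

    def violates(i: int, c: int, labels: list) -> bool:
        # Blocked if an assigned must-link partner is elsewhere,
        # or an assigned cannot-link partner is already in cluster c.
        if any(labels[j] is not None and labels[j] != c for j in ml[i]):
            return True
        return any(labels[j] == c for j in cl[i])

    centroids = farthest_first(vectors, k)
    prev = None
    for _ in range(max_iter):
        labels = [None] * n
        for i in range(n):
            order = sorted(range(k), key=lambda c: sq_dist(vectors[i], centroids[c]))
            for c in order:
                if not violates(i, c, labels):
                    labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for trace {i}")
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if labels == prev:
            break
        prev = labels
    return labels
```

On five toy traces (`a b c`, `a b c`, `a b c c`, `a x y`, `a x y y`) with `k=2`, the unconstrained run gives `[0, 0, 0, 1, 1]`; adding an expert cannot-link between traces 0 and 2 moves trace 2 to the other cluster, giving `[0, 0, 1, 1, 1]`. Because COP-KMeans treats constraints as hard and assigns traces greedily, it raises a `ValueError` when no feasible cluster exists for a trace in that order.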

Keywords

#process mining #trace clustering #instance-level constraints #expert knowledge
