SRIHASS - a similarity measure for discovery of hidden time profiled temporal associations
Abstract
Mining and visualizing time-profiled temporal associations is an important but understudied research problem that has not been addressed from a broad perspective. Visual analysis of time-profiled temporal associations helps reveal hidden seasonal, emerging, and diminishing temporal trends. The pioneering work by Yoo and Shekhar, known as SPAMINE, applied the Euclidean distance measure, and subsequent studies were likewise restricted to Euclidean distance. However, as the number of time slots increases, the dimensionality of the prevalence time sequence of a temporal association also increases, and the Euclidean distance becomes unsuitable at such high dimensionality. Some of our previous studies proposed Gaussian-based dissimilarity measures and prevalence estimation approaches to discover time-profiled temporal associations. To the best of our knowledge, no prior research has addressed a similarity measure that is based on the standard score and normal probability, finds the similarity between temporal patterns in z-space, and retains monotonicity. Our research is pioneering work in this direction. This research makes three contributions. First, we introduce a novel similarity (or dissimilarity) measure, SRIHASS, to find the similarity between temporal associations. The basic idea behind the design of the dissimilarity measure is to transform the support values of temporal associations into z-space and then obtain probability sequences of the temporal associations using a normal distribution chart. The dissimilarity measure uses these probability sequences to estimate the similarity between patterns in z-space. The second contribution is a prevalence bound estimation approach. Finally, we present an algorithm for time-profiled association mining, called Z-SPAMINE, that is primarily inspired by SPAMINE. Experimental results show that our approach, Z-SPAMINE, is computationally more efficient and scalable than existing approaches such as Naïve, Sequential, and SPAMINE, which applies the Euclidean distance.
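The z-space idea described above can be illustrated with a minimal sketch. This is not the SRIHASS formula, which the abstract does not state. It assumes that both support sequences are standardized against the mean and population standard deviation of the reference sequence, that the normal distribution chart is replaced by the standard normal CDF, and that the two probability sequences are compared by their mean absolute difference. The function names `normal_cdf`, `to_probability_sequence`, and `z_space_dissimilarity` are hypothetical.

```python
import math
from statistics import mean, pstdev


def normal_cdf(z):
    """Standard normal cumulative probability (stands in for a z-table lookup)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def to_probability_sequence(supports, mu, sigma):
    """Map each support value to a z-score, then to a normal probability."""
    if sigma <= 0:
        raise ValueError("sigma must be positive to compute standard scores")
    return [normal_cdf((s - mu) / sigma) for s in supports]


def z_space_dissimilarity(reference, candidate):
    """Illustrative dissimilarity in [0, 1] between two support sequences.

    Assumption: both sequences are standardized with the reference
    sequence's mean and population standard deviation, and the probability
    sequences are compared by mean absolute difference. The actual SRIHASS
    definition may differ.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must cover the same time slots")
    mu = mean(reference)
    sigma = pstdev(reference)
    p_ref = to_probability_sequence(reference, mu, sigma)
    p_cand = to_probability_sequence(candidate, mu, sigma)
    return sum(abs(a - b) for a, b in zip(p_ref, p_cand)) / len(reference)


if __name__ == "__main__":
    # Hypothetical per-time-slot support values, for illustration only.
    reference = [0.20, 0.40, 0.60, 0.40]
    candidate = [0.25, 0.35, 0.55, 0.45]
    print(z_space_dissimilarity(reference, candidate))
```

Because every probability lies in [0, 1], the per-slot differences stay bounded regardless of the number of time slots, which is the property that motivates moving away from raw Euclidean distance in this sketch.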
References
Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8(6):962–969 https://doi.org/10.1109/69.553164
Agrawal R, Srikant R (1994) Fast Algorithms for Mining Association Rules in Large Databases. In: Bocca JB, Jarke M, Zaniolo C (eds) Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ‘94). Morgan Kaufmann Publishers Inc., San Francisco, pp 487–499
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. SIGMOD Rec 22(2):207–216 https://doi.org/10.1145/170036.170072
Ale JM, Rossi GH (2000) An approach to discovering temporal association rules. In: Carroll J, Damiani E, Haddad H, Oppenheim D (eds) Proceedings of the 2000 ACM symposium on Applied computing - Volume 1 (SAC ‘00), vol 1. ACM, New York, pp 294–300 https://doi.org/10.1145/335603.335770
Aljawarneh S, Radhakrishna V, Kumar PV, Janaki V (2016) A similarity measure for temporal pattern discovery in time series data generated by IoT. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–4 https://doi.org/10.1109/ICEMIS.2016.7745355
Aljawarneh SA, Elkobaisi MR, Maatuk AM (2016) A new agent approach for recognizing research trends in wearable systems. Computers & Electrical Engineering, Available online 16 December 2016, https://doi.org/10.1016/j.compeleceng.2016.12.003
Aljawarneh SA, Moftah RA, Maatuk AM (2016) Investigations of automatic methods for detecting the polymorphic worms signatures. Futur Gener Comput Syst 60:67–77 ISSN 0167-739X, https://doi.org/10.1016/j.future.2016.01.020
Aljawarneh SA, Radhakrishna V, Kumar PV, Janaki V (2017) G-SPAMINE: An approach to discover temporal association patterns and trends in internet of things. Futur Gener Comput Syst 74:430–443 ISSN 0167-739X, https://doi.org/10.1016/j.future.2017.01.013
Aljawarneh S, Aldwairi M, Yassein MB (2017) Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J Comput Sci ISSN 1877-7503, https://doi.org/10.1016/j.jocs.2017.03.006
Bettini C, Wang XS, Jajodia S, Lin JL (1998) Discovering frequent event patterns with multiple granularities in time sequences. IEEE Trans Knowl Data Eng 10(2):222–237 https://doi.org/10.1109/69.683754
Borgelt C (2005) Keeping things simple: finding frequent item sets by recursive elimination. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations (OSDM ‘05). ACM, New York, pp 66–70. https://doi.org/10.1145/1133905.1133914
Chen X, Petrounias I (2000) Discovering temporal association rules: algorithms, language and system. In: Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), pp 306–306. https://doi.org/10.1109/ICDE.2000.839423
Chen X, Petrounias I (1999) Mining temporal features in association rules. In: Żytkow JM, Rauch J (eds) Principles of data mining and knowledge discovery. PKDD 1999. Lecture Notes in Computer Science, vol 1704. Springer, Berlin, Heidelberg
Chen YC, Peng WC, Lee SY (2015) Mining Temporal Patterns in Time Interval-Based Data. IEEE Trans Knowl Data Eng 27(12):3318–3331 https://doi.org/10.1109/TKDE.2015.2454515
Cheruvu A, Radhakrishna V (2016) Estimating temporal pattern bounds using negative support computations. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–4 https://doi.org/10.1109/ICEMIS.2016.7745352
Cheung D, Han J, Ng V, Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. Proc. 1996 Int’l Conf. Data Eng, pp 106–114. https://doi.org/10.1109/ICDE.1996.492094
Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman JD, Cheng Y (2001) Finding Interesting Associations without Support Pruning. IEEE Trans on Knowl and Data Eng 13(1):64–78 https://doi.org/10.1109/69.908981
Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD ‘99). ACM, New York, pp 43–52. https://doi.org/10.1145/312129.312191
Han J, Fu Y (1995) Discovery of multiple-level association rules from large databases. In: Dayal U, PMD G, Nishio S (eds) Proceedings of the 21th International Conference on Very Large Data Bases (VLDB ‘95). Morgan Kaufmann Publishers Inc., San Francisco, pp 420–431
Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337), Sydney, pp 106–115. https://doi.org/10.1109/ICDE.1999.754913
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87. Kluwer Academic Publishers. https://doi.org/10.1023/B:DAMI.0000005258.31418.83
Imran A, Aljawarneh SA, Sakib K (2016) Web Data Amalgamation for Security Engineering: Digital Forensic Investigation of Open Source Cloud. J Univers Comput Sci 22(4):494–520 https://doi.org/10.3217/jucs-022-04-0494
Jiang JY, Liou RJ, Lee SJ (2011) A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification. IEEE Trans Knowl Data Eng 23(3):335–349 https://doi.org/10.1109/TKDE.2010.122
Kumar GR, Mangathayaru N, Narasimha G (2015) An improved k-means clustering algorithm for intrusion detection using gaussian function. In: Proceedings of the International Conference on Engineering & MIS 2015 (ICEMIS ‘15). ACM, New York, pp 69:1–69:7. https://doi.org/10.1145/2832987.2833082
Kumar GR, Mangathayaru N, Narasimha G (2016a) An approach for intrusion detection using novel Gaussian based Kernel function. J Univers Comput Sci 22(4):589–604. https://doi.org/10.3217/jucs-022-04-0589
Kumar GR, Mangathayaru N, Narsimha G (2016b) Design of novel fuzzy distribution function for dimensionality reduction and intrusion detection. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–6 https://doi.org/10.1109/ICEMIS.2016.7745346
Kumar GR, Mangathayaru N, Gugulothu N, Suresh Reddy G (2016) CLAPP: A self constructing feature clustering approach for anomaly detection. Futur Gener Comput Syst, available online 4 January 2017, ISSN 0167-739X, https://doi.org/10.1016/j.future.2016.12.040
Last M, Klein Y, Kandel A (2001) Knowledge discovery in time series databases. IEEE Trans Syst Man Cybern Part B Cybern 31(1):160–169 https://doi.org/10.1109/3477.907576
Lee W-J, Lee S-J (2004) Discovery of fuzzy temporal association rules. IEEE Trans Syst Man Cybern Part B Cybern 34(6):2330–2342 https://doi.org/10.1109/TSMCB.2004.835352
Lee C-H, Lin C-R, Chen M-S (2001) Sliding-window filtering: an efficient algorithm for incremental mining. In: Paques H, Liu L, Grossman D (eds) Proceedings of the tenth international conference on Information and knowledge management (CIKM ‘01). ACM, New York, pp 263–270 https://doi.org/10.1145/502585.502630
Lee C-H, Chen M-S, Lin C-R (2003) Progressive partition miner: an efficient algorithm for mining general temporal association rules. IEEE Trans Knowl Data Eng 15(4):1004–1017 https://doi.org/10.1109/TKDE.2003.1209015
Li Y, Ning P, Wang XS, Jajodia S (2001) Discovering calendar-based temporal association rules. In: Proceedings Eighth International Symposium on Temporal Representation and Reasoning. TIME 2001, Cividale del Friuli, pp 111–118. https://doi.org/10.1109/TIME.2001.930706
Li Y, Ning P, Wang XS, Jajodia S (2003) Discovering calendar-based temporal association rules. Data Knowl Eng 44(2):193–218 ISSN 0169-023X, https://doi.org/10.1016/S0169-023X(02)00135-0
Lin YS, Jiang JY, Lee SJ (2014) A Similarity Measure for Text Classification and Clustering. IEEE Trans Knowl Data Eng 26(7):1575–1590 https://doi.org/10.1109/TKDE.2013.19
Lind DA, Marchal WG, Wathen SA (2004) Statistical techniques in business and economics, 12e: Chapter 7: Continuous Probability Distributions. The McGraw-Hill Companies, New York
Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD ‘99). ACM, New York, pp 337–341. https://doi.org/10.1145/312129.312274
Ozden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: Proceedings 14th International Conference on Data Engineering, pp 412–421. https://doi.org/10.1109/ICDE.1998.655804
Park JS, Yu PS, Chen M-S (1997) Mining association rules with adjustable accuracy. In: Proceedings of the sixth international conference on Information and knowledge management (CIKM ‘97). ACM, New York, pp 151–160. https://doi.org/10.1145/266714.266886
Radhakrishna V, Kumar PV, Janaki V (2016) A computationally optimal approach for extracting similar temporal patterns. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–6 https://doi.org/10.1109/ICEMIS.2016.7745344
Radhakrishna V, Kumar PV, Janaki V (2016) Mining of outlier temporal patterns. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–6 https://doi.org/10.1109/ICEMIS.2016.7745343
Radhakrishna V, Kumar PV, Janaki V, Aljawarneh S (2016) A similarity measure for outlier detection in timestamped temporal databases. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–5 https://doi.org/10.1109/ICEMIS.2016.7745347
Radhakrishna V, Kumar PV, Janaki V (2016) Looking into the possibility of novel dissimilarity measure to discover similarity profiled temporal association patterns in IoT. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–5 https://doi.org/10.1109/ICEMIS.2016.7745353
Radhakrishna V, Kumar PV, Janaki V, Aljawarneh S (2016) A computationally efficient approach for temporal pattern mining in IoT. 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, pp 1–4 https://doi.org/10.1109/ICEMIS.2016.7745354
Radhakrishna V, Aljawarneh SA, Kumar PV, Janaki V (2017) A novel fuzzy similarity measure and prevalence estimation approach for similarity profiled temporal association pattern mining. Futur Gener Comput Syst, available online 14 March 2017, ISSN 0167-739X, https://doi.org/10.1016/j.future.2017.03.016
Radhakrishna V, Kumar PV, Janaki V (2017) Design and analysis of similarity measure for discovering similarity profiled temporal association patterns. IADIS International Journal on Computer Science and Information Systems 12(1):45–60 http://www.iadisportal.org/ijcsis/papers/2017200104.pdf
Radhakrishna V, Kumar PV, Janaki V, Cheruvu A (2017) A dissimilarity measure for mining similar temporal association patterns. IADIS International Journal on Computer Science and Information Systems 12(1):126–142 http://www.iadisportal.org/ijcsis/papers/2017200109.pdf
Radhakrishna V, Kumar PV, Janaki V (2017) Normal distribution based similarity profiled temporal association pattern mining (N-SPAMINE). Database Systems Journal 7(3):22–33
Radhakrishna V, Kumar PV, Janaki V (2016) A Novel Similar Temporal System Call Pattern Mining for Efficient Intrusion Detection. J Univers Comput Sci 22(4):475–493 https://doi.org/10.3217/jucs-022-04-0475
Radhakrishna V, Kumar PV, Janaki V (2017) A computationally efficient approach for mining similar temporal patterns. In: Matoušek R (ed) Recent advances in soft computing. ICSC-MENDEL 2016. Advances in intelligent systems and computing, vol 576. Springer, Cham
Radhakrishna V, Kumar PV, Janaki V, Rajasekhar N (2017) Estimating prevalence bounds of temporal association patterns to discover temporally similar patterns. In: Matoušek R (ed) Recent advances in soft computing. ICSC-MENDEL 2016. Advances in intelligent systems and computing, vol 576. Springer, Cham. https://doi.org/10.1007/978-3-319-58088-3_20
Ramaswamy S, Mahajan S, Silberschatz A (1998) On the discovery of interesting patterns in association rules. In: Gupta A, Shmueli O, Widom J (eds) Proceedings of the 24rd International Conference on Very Large Data Bases (VLDB ‘98). Morgan Kaufmann Publishers Inc., San Francisco, pp 368–379
Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21th international conference on very large data bases (VLDB ‘95). Morgan Kaufmann Publishers Inc., San Francisco, pp 407–419
Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of the 1996 ACM SIGMOD international conference on management of data (SIGMOD ‘96). ACM, New York, pp 1–12. https://doi.org/10.1145/233269.233311
Srikant R, Agrawal R (1997) Mining generalized association rules. Futur Gener Comput Syst 13(2):161–180. https://doi.org/10.1016/S0167-739X(97)00019-8
Tung AKH, Ng RT, Lakshmanan LVS, Han J (2001) Constraint-based clustering in large databases. In: Proceedings of the 8th international conference on database theory (ICDT ‘01). Springer-Verlag, pp 405–419
Radhakrishna V, Aljawarneh SA, Kumar PV, Choo KKR (2016) A novel fuzzy gaussian-based dissimilarity measure for discovering similarity temporal association patterns. Soft Comput: 1–17. https://doi.org/10.1007/s00500-016-2445-y
Villafane R, Hua KA, Tran D, Maulik B (1999) Mining interval time series. In: Mohania M, Tjoa AM (eds) DataWarehousing and Knowledge Discovery. DaWaK 1999. Lecture Notes in Computer Science, vol 1676. Springer, Berlin https://doi.org/10.1007/3-540-48298-9_34
Yang C, Fayyad U, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD ‘01). ACM, New York, pp 194–203. https://doi.org/10.1145/502512.502539
Yoo JS (2012) Temporal data mining: similarity-profiled association pattern. In: Holmes DE, Jain LC (eds) Data mining: foundations and intelligent paradigms. Intelligent systems reference library, vol 23. Springer, Berlin https://doi.org/10.1007/978-3-642-23166-7_3
Yoo JS, Shekhar S (2008) Mining Temporal Association Patterns under a Similarity Constraint. In: Ludäscher B, Mamoulis N (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin https://doi.org/10.1007/978-3-540-69497-7_26
Yoo JS, Shekhar S (2009) Similarity-Profiled Temporal Association Mining. IEEE Trans Knowl Data Eng 21(8):1147–1161 https://doi.org/10.1109/TKDE.2008.185
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390 https://doi.org/10.1109/69.846291
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD ‘03). ACM, New York, pp 326–335. https://doi.org/10.1145/956750.956788
Zhuang DEH, Li GCL, Wong AKC (2014) Discovery of temporal associations in multivariate time series. IEEE Trans Knowl Data Eng 26(12):2969–2982 https://doi.org/10.1109/TKDE.2014.2310219