Krimp: mining itemsets that compress

Data Mining and Knowledge Discovery - Volume 23, Issue 1 - Pages 169-214 - 2011
Jilles Vreeken (1), Matthijs van Leeuwen (1), Arno Siebes (1)
(1) Algorithmic Data Analysis, Department of Information and Computing Sciences, Faculty of Science, Universiteit Utrecht, Utrecht, The Netherlands

References

Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Advances in knowledge discovery and data mining, AAAI, pp 307–328

Bathoorn R, Koopman A, Siebes A (2006) Reducing the frequent pattern set. In: Proceedings of the ICDM-workshops’06, pp 55–59

Bayardo R (1998) Efficiently mining long patterns from databases. In: Proceedings of SIGMOD’98, pp 85–93

Bringmann B, Zimmermann A (2007) The chosen few: on identifying valuable patterns. In: Proceedings of the ICDM’07, pp 63–72

Calders T, Goethals B (2002) Mining all non-derivable frequent itemsets. In: Proceedings of the ECML PKDD’02, pp 74–85

Chakrabarti D, Papadimitriou S, Modha DS, Faloutsos C (2004) Fully automatic cross-associations. In: Proceedings of KDD’04, pp 79–88

Chakrabarti S, Sarawagi S, Dom B (1998) Mining surprising patterns using temporal description length. In: Proceedings of VLDB’98, Morgan Kaufmann, San Francisco, pp 606–617

Chandola V, Kumar V (2007) Summarization—compressing data into an informative representation. Knowl Inf Syst 12(3): 355–378

Coenen F (2003) The LUCS–KDD discretised/normalised ARM and CARM data library. http://www.csc.liv.ac.uk/~frans/KDD/Software/LUCS-KDD-DN/DataSets/dataSets.html

Coenen F (2004) The LUCS–KDD software library. http://www.csc.liv.ac.uk/~frans/KDD/Software

Cover T, Thomas J (2006) Elements of information theory, 2nd edn. John Wiley and Sons, New York

Crémilleux B, Boulicaut JF (2002) Simplest rules characterizing classes generated by δ-free sets. In: Proceedings of KBSAAI’02, pp 33–46

Duda R, Hart P (1973) Pattern classification and scene analysis. John Wiley and Sons, New York

Faloutsos C, Megalooikonomou V (2007) On data mining, compression and Kolmogorov complexity. Data Min Knowl Discov 15(1): 3–20

Geerts F, Goethals B, Mielikäinen T (2004) Tiling databases. In: Proceedings of DS’04, pp 278–289

Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2007) Assessing data mining results via swap randomization. ACM Trans Knowl Discov Data 1(3): 14

Goethals B, Zaki MJ (2003) Frequent itemset mining implementations repository (FIMI). http://fimi.cs.helsinki.fi

Grünwald PD (2005) Minimum description length tutorial. In: Grünwald P, Myung I (eds) Advances in minimum description length. MIT Press, Cambridge

Grünwald PD (2007) The minimum description length principle. MIT Press, Cambridge

Hand D, Adams N, Bolton R (eds) (2002) Pattern detection and discovery. Springer, New York

Heikinheimo H, Hinkkanen E, Mannila H, Mielikäinen T, Seppänen JK (2007) Finding low-entropy sets and trees from binary data. In: Proceedings of KDD’07, pp 350–359

Heikinheimo H, Vreeken J, Siebes A, Mannila H (2009) Low-entropy set selection. In: Proceedings of SDM’09, pp 569–579

Karp RM (1972) Reducibility among combinatorial problems. In: Miller R, Thatcher J (eds) Proceedings of a symposium on the complexity of computer computations. Plenum Press, New York, USA, pp 85–103

Keogh E, Lonardi S, Ratanamahatana CA (2004) Towards parameter-free data mining. In: Proceedings of KDD’04, pp 206–215

Keogh E, Lonardi S, Ratanamahatana CA, Wei L, Lee SH, Handley J (2007) Compression-based data mining of sequential data. Data Min Knowl Discov 14(1): 99–129

Knobbe AJ, Ho EKY (2006a) Maximally informative k-itemsets and their efficient discovery. In: Proceedings of KDD’06, pp 237–244

Knobbe AJ, Ho EKY (2006b) Pattern teams. In: Proceedings of the ECML PKDD’06, pp 577–584

Kohavi R, Brodley C, Frasca B, Mason L, Zheng Z (2000) KDD-Cup 2000 organizers’ report: peeling the onion. SIGKDD Explor 2(2):86–98. http://www.ecn.purdue.edu/KDDCUP

Koopman A, Siebes A (2008) Discovering relational item sets efficiently. In: Zaki M, Wang K (eds) Proceedings of SDM’08, SIAM, pp 108–119

Koopman A, Siebes A (2009) Characteristic relational patterns. In: Proceedings of KDD’09, pp 437–446

Li M, Vitányi P (1993) An introduction to Kolmogorov complexity and its applications. Springer, New York

Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of KDD’98, pp 80–86

Liu G, Lu H, Yu JX, Wei W, Xiao X (2004) AFOPT: an efficient implementation of pattern growth approach. In: Proceedings of the 2nd workshop on frequent itemset mining implementations

Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations. In: Proceedings of KDD’96, pp 189–194

Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1(3): 241–258

Mehta M, Agrawal R, Rissanen J (1996) SLIQ: a fast scalable classifier for data mining. In: Advances in database technology. Springer, NY, pp 18–32

Meretakis D, Lu H, Wüthrich B (2000) A study on the performance of large bayes classifier. In: Proceedings of the ECML’00, pp 271–279

Mielikäinen T, Mannila H (2003) The pattern ordering problem. In: Proceedings of the ECML PKDD’03, pp 327–338

Mitchell-Jones AJ, Amori G, Bogdanowicz W, Krystufek B, Reijnders PJH, Spitzenberger F, Stubbe M, Thissen JBM, Vohralik V, Zima J (1999) The atlas of European mammals. Academic Press, London

Morik K, Boulicaut JF, Siebes A (eds) (2005) Local pattern detection. Springer, New York

Myllykangas S, Himberg J, Böhling T, Nagy B, Hollmén J, Knuutila S (2006) DNA copy number amplification profiling of human neoplasms. Oncogene 25(55)

Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the ICDT’99, pp 398–416

Pfahringer B (1995) Compression-based feature subset selection. In: Proceedings of the IJCAI’95 workshop on data engineering for inductive learning, pp 109–119

Quinlan J (1993a) C4.5: programs for machine learning. Morgan-Kaufmann, Los Altos

Quinlan J (1993b) FOIL: a midterm report. In: Proceedings of the ECML’93

Rissanen J (1978) Modeling by shortest data description. Automatica 14(5): 465–471

Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of SDM’06, pp 393–404

Sun J, Faloutsos C, Papadimitriou S, Yu PS (2007) Graphscope: parameter-free mining of large time-evolving graphs. In: Proceedings of KDD’07, pp 687–696

Tatti N, Vreeken J (2008) Finding good itemsets by packing data. In: Proceedings of the ICDM’08, pp 588–597

van Leeuwen M, Siebes A (2008) Streamkrimp: detecting change in data streams. In: Proceedings of the ECML PKDD’08, Springer, Heidelberg, pp 672–687

van Leeuwen M, Vreeken J, Siebes A (2006) Compression picks the item sets that matter. In: Proceedings of the ECML PKDD’06, pp 585–592

van Leeuwen M, Vreeken J, Siebes A (2009) Identifying the components. Data Min Knowl Discov 19(2): 176–193

Vreeken J, Siebes A (2008) Filling in the blanks—Krimp minimisation for missing data. In: Proceedings of the ICDM’08, pp 1067–1072

Vreeken J, van Leeuwen M, Siebes A (2007a) Characterising the difference. In: Proceedings of KDD’07, pp 765–774

Vreeken J, van Leeuwen M, Siebes A (2007b) Preserving privacy through data generation. In: Proceedings of the ICDM’07, pp 685–690

Wallace C (2005) Statistical and inductive inference by minimum message length. Springer, New York

Wang J, Karypis G (2005) HARMONY: efficiently mining the best rules for classification. In: Proceedings of SDM’05, pp 205–216

Wang J, Karypis G (2006) On efficiently summarizing categorical databases. Knowl Inf Syst 9(1): 19–37

Wang C, Parthasarathy S (2006) Summarizing itemset patterns using probabilistic models. In: Proceedings of KDD’06, pp 730–735

Warner H, Toronto A, Veasey L, Stephenson R (1961) A mathematical model for medical diagnosis, application to congenital heart disease. J Am Med Assoc 177: 177–184

Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

Xiang Y, Jin R, Fuhry D, Dragan FF (2008) Succinct summarization of transactional databases: an overlapped hyperrectangle scheme. In: Proceedings of KDD’08, pp 758–766

Xin D, Han J, Yan X, Cheng H (2005) Mining compressed frequent-pattern sets. In: Proceedings of VLDB’05, pp 709–720

Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: a profile-based approach. In: Proceedings of KDD’05, pp 314–323

Yin X, Han J (2003) CPAR: Classification based on predictive association rules. In: Proceedings of SDM’03, pp 331–335

Zhang X, Dong G, Ramamohanarao K (2000) Information-based classification by aggregating emerging patterns. In: Proceedings of IDEAL’00, pp 48–53