MR-Radix: a multi-relational data mining algorithm

Carlos Roberto Valêncio1, Fernando Takeshi Oyama1, Paulo Scarpelini Neto1, Angelo Cesar Colombini2, Adriano Mauro Cansian1, Rogéria Cristiane Gratão de Souza1, Pedro Luiz Pizzigatti Corrêa3
1São Paulo State University - Unesp, Rua Cristóvão Colombo, São Paulo, Brazil
2Federal University of São Carlos - UFSCar, São Carlos, Brazil
3University of São Paulo - USP, São Paulo, São Paulo, Brazil

Tóm tắt

Once multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since they allow applying data mining in multiple tables directly, thus avoiding expensive joining operations and semantic losses, this work proposes an algorithm with multi-relational approach. Aiming to compare traditional approach performance and multi-relational for mining association rules, this paper discusses an empirical study between PatriciaMine - an traditional algorithm - and its corresponding multi-relational proposed, MR-Radix. This work showed advantages of the multi-relational approach in performance over several tables, which avoids the high cost for joining operations from multiple tables and semantic losses. The performance provided by the algorithm MR-Radix shows faster than PatriciaMine, despite handling complex multi-relational patterns. The utilized memory indicates a more conservative growth curve for MR-Radix than PatriciaMine, which shows the increase in demand of frequent items in MR-Radix does not result in a significant growth of utilized memory like in PatriciaMine. The comparative study between PatriciaMine and MR-Radix confirmed efficacy of the multi-relational approach in data mining process both in terms of execution time and in relation to memory usage. Besides that, the multi-relational proposed algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.

Tài liệu tham khảo

Kantardzic M: Data Mining: Concepts, Models, Methods and Algorithms. New Jersey: Wiley; 2003.

Dzeroski S, Raedt LD, Wrobel S: Multi-Relational Data Mining: Workshop Report. ACM SIGKDD 2003. Explorations Newsletter 2003,5(2):200–202. 10.1145/980972.981007

Page D, Craven M: Biological applications of multi-relational data mining. ACM SIGKDD Exploration Newsletter 2003,5(1):69–79. 10.1145/959242.959250

Habrard A, Bernard M, Jacquenet F: Multi-relational Data Mining in medical databases. Lecture notes in computer science. Springer 2003, 2780: 365–374.

Blockeel H, Dzeroski S: Multi-Relational Data Mining. Workshop Report. ACM SIGKDD Explorations Newsletter 2005,7(2):126–128. 10.1145/1117454.1117471

Wang K, Tang L, Han L, Liu J: Top Down FP-Growth for Association Rule Mining. Lecture notes in computer science. Springer 2002 2002, 2336: 334–340.

Gopalan R: Sucahyo Y (2004) High performance frequent patterns extraction using compressed FP-tree. SIAM International Workshop on High Performance and Distributed Mining, Orlando, USA: Proc; 2004.

Shang X: SQL Based Frequent Pattern Mining. Thesis (Ph.D.) 2005, 146.

Frequent Itemset Mining Implementations Repository Proc. IEEE Workshop on Frequent Itemset Mining Implementations (FIMI'04). Available at < fimi.cs.helsinki.fi/>. Accessed on 8th May, 2011 Proc. IEEE Workshop on Frequent Itemset Mining Implementations (FIMI'04). Available at < fimi.cs.helsinki.fi/>. Accessed on 8th May, 2011

Tsechansky MS, Pliskin N, Rabinowitz G, Porath A: "Mining Relational Patterns from Multiple Relational Tables". Decision Support Systems 1999,27(1–2):177–195. 10.1016/S0167-9236(99)00043-3

Ketkar NS, Holder LB, Cook DJ: Comparison of graph-based and logic-based multi-relational Data Mining. ACM SIGKDD Explorations Newsletter 2005,7(2):64–71. 10.1145/1117454.1117463

Inokuchi A, Washio T, Motoda H: An apriori-based algorithm for mining frequent substructures from graph data. Lecture notes in computer science. Springer 2000, 1910: 13–23.

Kuramochi M, Karypis G: An efficient algorithm for discovering frequent subgraphs. IEEE Transactions On Knowledge and Data Engineering 2004,16(9):1038–1051. 10.1109/TKDE.2004.33

Matsuda T, Horiuchi T, Motoda H, Washio T: Extension of graph-based induction for general graph structured data. Lecture notes in computer science. Springer 2000, 1805: 420–431.

Ribeiro MX, Vieira MTP, Traina AJM: Mining Association Rules Using Clustering. I workshop on algorithms for data mining. Uberlândia, Brazil; 2005:9–16. [in Portuguese]

Pizzi L, Ribeiro M, Vieira M: Analysis of Hepatitis Dataset using Multirelational Association Rules. Proc. ECML/PKDD Discovery Challenge 2005.

Garcia E: Mining association rules from multi-relational quantities. In Thesis (Masters). Methodist University of Piracicaba; 2008:84. [in Portuguese]

Oyama FT: Extraction of knowledge in databases using multi-relational clustering tuples. Monograph (Undergraduate) São Paulo State University; 2006:51. [in Portuguese]

Pizzi LC: Data mining in multiple tables: GFP-Growth algorithm. In Thesis (Masters). Federal University of São Carlos; 2006:106. [in Portuguese]