ABACUS: frequent pAttern mining-BAsed Community discovery in mUltidimensional networkS

Data Mining and Knowledge Discovery - Tập 27 - Trang 294-320 - 2013
Michele Berlingerio1, Fabio Pinelli1, Francesco Calabrese1
1IBM Research, Dublin, Ireland

Tóm tắt

Community discovery in complex networks is the problem of detecting, for each node of the network, its membership to one of more groups of nodes, the communities, that are densely connected, or highly interactive, or, more in general, similar, according to a similarity function. So far, the problem has been widely studied in monodimensional networks, i.e. networks where only one connection between two entities may exist. However, real networks are often multidimensional, i.e., multiple connections between any two nodes may exist, either reflecting different kinds of relationships, or representing different values of the same type of tie. In this context, the problem of community discovery has to be redefined, taking into account multidimensional structure of the graph. We define a new concept of community that groups together nodes sharing memberships to the same monodimensional communities in the different single dimensions. As we show, such communities are meaningful and able to group nodes even if they might not be connected in any of the monodimensional networks. We devise frequent pAttern mining-BAsed Community discoverer in mUltidimensional networkS (ABACUS), an algorithm that is able to extract multidimensional communities based on the extraction of frequent closed itemsets from monodimensional community memberships. Experiments on two different real multidimensional networks confirm the meaningfulness of the introduced concepts, and open the way for a new class of algorithms for community discovery that do not rely on the dense connections among nodes.

Tài liệu tham khảo

Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multi-scale complexity in networks. Nature 466:761–764 Balcan D, Colizza V, Goncalves B, Hu H, Ramasco JJ, Vespignani A (2009) Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci USA 106(51):21484–21489 Bastide Y, Pasquier N, Taouil R, Stumme G, Lakhal L (2000) Mining minimal non-redundant association rules using frequent closed itemsets. In: Computational logic, pp 972–986 Benevenuto F, Rodrigues T, Cha M, Almeida VAF (2009) Characterizing user behavior in online social networks. In: Internet measurement conference, pp 49–62 Berlingerio M, Coscia M, Giannotti F (2011) Finding redundant and complementary communities in multidimensional networks. In: ACM conference on information and knowledge management, pp 2181–2184 Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D (2011a) Foundations of multidimensional network analysis. In: International conference on advances in social networks analysis and mining, pp 485–489 Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D (2011b) The pursuit of hubbiness: analysis of hubs in large multidimensional networks. J Comput Sci 2(3):223–237 Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D (2012) Multidimensional networks: foundations of structural analysis. In: World Wide Web, pp 1–27 Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. theory and experiment. J Stat Mech 10:P10008 Bonchi F, Lucchese C (2005) Pushing tougher constraints in frequent pattern mining. Adv Knowl Discov Data Mining 3518:173–202 Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Efficient breadth-first mining of frequent pattern with monotone constraints. J Knowl Inf Syst 8(2):131–153 Borgelt C (2003) Efficient implementations of apriori and eclat. In: IEEE ICDM workshop on frequent item set mining implementations, p 90 Bringmann B, Zimmermann A (2007) The chosen few: on identifying valuable patterns. In: IEEE international conference on data mining, pp 63–72 Bringmann B, Berlingerio M, Bonchi F, Gionis A (2010) Learning and predicting the evolution of social networks. IEEE Intel Syst 25(4):26–35 Cai D, Shao Z, He X, Yan X, Han J (2005) Community mining from multi-relational networks. In: The European conference on machine learning and principles and practice of knowledge discovery in databases, pp 445–452 Calabrese F, Dahlem D, Gerber A, Paul D, Chen X, Rowland J, Rath C, Ratti C (2011) The connected states of America: quantifying social radii of influence. In: IEEE international conference on social computing (SocialCom) Cerf L, Besson J, Robardet C, Boulicaut JF (2009a) Closed patterns meet n-ary relations. ACM Trans Knowl Discov Data 3(1):1–36 Cerf L, Nguyen TBN, Boulicaut JF (2009b) Discovering relevant cross-graph cliques in dynamic networks. In: International symposium on methodologies for intelligent systems, pp 513–522 Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111 Cook DJ, Crandall AS, Singla G, Thomas B (2010) Detection of social interaction in smart spaces. Cybern Syst 41(2):90–104 Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174 Francisco AP, Baeza-Yates RA, Oliveira AL (2008) Clique analysis of query log graphs. In: String processing and information retrieval, pp 188–199 Goyal A, Bonchi F, Lakshmanan LV (2008) Discovering leaders from community actions. In: ACM conference on information and knowledge management, pp 499–508 Gregory S (2009) Finding overlapping communities using disjoint community detection algorithms. Complex Netw 2009:47–61 Günnemann S, Färber I, Boden B, Seidl T (2010) Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In: IEEE international conference on data mining, pp 845–850 Huang Y, Sun L, Nie JY (2010) Query model refinement using word graphs. In: ACM conference on information and, knowledge management, pp 1453–1456 Leskovec J, Huttenlocher D, Kleinberg J (2010a) Predicting positive and negative links in online social networks. In: ACM international conference on World Wide Web, pp 641–650 Leskovec J, Lang KJ, Mahoney MW (2010b) Empirical comparison of algorithms for network community detection. In: ACM international conference on World Wide Web, pp 631–640 Mongiovì M, Singh AK, Yan X, Zong B, Psounis K (2012) Efficient multicasting for delay tolerant networks using graph indexing. In: IEEE international conference on computer communications, pp 1386–1394 Mougel PN, Rigotti C, Gandrillon O (2012) Finding collections of k-clique percolated components in attributed graphs. In: Advances in knowledge discovery and data mining, pp 181–192 Mucha PJ, Richardson T, Macon K, Porter MA, Onnela J (2010) Community structure in time-dependent, multiscale, and multiplex networks. Science 328:876–878 Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 2:167–256 Nijssen S, Jiménez A, Guns T (2011) Constraint-based pattern mining in multi-relational databases. In: Workshops of the IEEE international conference on data mining, pp 1120–1127 Palla G, Derenyi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814–818 Papadimitriou S, Sun J, Faloutsos C, Yu PS (2008) Hierarchical, parameter-free community discovery. In: The European conference on machine learning and principles and practice of knowledge discovery in databases, pp 170–187 Pass G, Chowdhury A, Torgeson C (2006) A picture of search. In: ACM international conference on scalable, information systems, p 1 Pei J, Han J (2000) Can we push more constraints into frequent pattern mining? In: ACM international conference on knowledge discovery and data mining, pp 350–354 Pei J, Han J, Lakshmanan LVS (2001) Mining frequent item sets with convertible constraints. In: IEEE international conference on data engineering, pp 433–442 Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76:036106. doi:10.1103/PhysRevE.76.036106 Rossetti G, Berlingerio M, Giannotti F (2011) Scalable link prediction on multidimensional networks. In: Workshops of the IEEE international conference on data mining, pp 979–986 Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105:1118–1123 Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In: ACM international conference on knowledge discovery and data mining, pp 797–806 Szell M, Lambiotte R, Thurner S (2010) Multirelational organization of large-scale social networks in an online world. Proc Natl Acad Sci USA 107(31):13636–13641 Tang L, Liu H (2009) Scalable learning of collective behavior based on sparse social dimensions. In: ACM conference on information and knowledge management, pp 1107–1116 Trasarti R, Pinelli F, Nanni M, Giannotti F (2011) Mining mobility user profiles for car pooling. In: ACM international conference on knowledge discovery and data mining, pp 1190–1198 Wang J, Zeng Z, Zhou L (2006) Clan: an algorithm for mining closed cliques from large dense graph databases. In: IEEE international conference on data engineering, p 73 Yan X, Han J (2002) gspan: graph-based substructure pattern mining. In: IEEE international conference on data mining, pp 721–724 Zeng Z, Wang J, Zhou L, Karypis G (2006) Coherent closed quasi-clique discovery from large dense graph databases. In: ACM international conference on knowledge discovery and data mining, pp 797–802