Simultaneous classification and community detection on heterogeneous network data

Data Mining and Knowledge Discovery - Tập 25 - Trang 420-449 - 2012
Prakash Mandayam Comar1, Pang-Ning Tan1, Anil K. Jain1
1Department of Computer Science & Engineering, Michigan State University, East Lansing, USA

Tóm tắt

Previous studies on network mining have focused primarily on learning a single task (such as classification or community detection) on a given network. This paper considers the problem of multi-task learning on heterogeneous network data. Specifically, we present a novel framework that enables one to perform classification on one network and community detection in another related network. Multi-task learning is accomplished by introducing a joint objective function that must be optimized to ensure the classes in one network are consistent with the link structure, nodal attributes, as well as the communities detected in another network. We provide both theoretical and empirical analysis of the framework. We also show that the framework can be extended to incorporate prior information about the correspondences between the clusters and classes in different networks. Experiments performed on both real-world and synthetic data sets demonstrate the effectiveness of the joint framework compared to applying classification and community detection algorithms on each network separately.

Tài liệu tham khảo

Banerjee S, Ramanathan K, Gupta A (2007) Clustering short texts using Wikipedia. In: SIGIR, pp 787–788 Bollobás B (1998) Modern graph theory, graduate text in mathematics. Springer, New York Cai D, Shao Z, He X, Yan X, Han J (2005) Community mining from multi-relational networks. In: Proceedings of the 9th European conference on principles and practice of knowledge discovery in databases Caruana R (1997) Multitask learning. Mach Learn 28: 41–75 Chen F, Tan PN, Jain AK (2009) A co-classification framework for detecting web spam and spammers in social media web sites. In: CIKM Dhillon IS, Mallela S, Modha DS (2003) Information-theoretic co-clustering. In: Proceedings of the 9th ACM SIGKDD int’l conf on knowledge discovery and data mining. ACM Press, New York, pp 89–98 Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD. ACM Press, New York, pp 126–135 Evgeniou T, Micchelli CA, Pontil M (2005) Learning multiple tasks with kernel methods. J Mach Learn Res 6: 615–637 Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: Sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 150–160 Flake GW, Lawrence S, Giles CL, Coetzee FM (2002) Self-organization and identification of web communities. IEEE Comput 35: 66–71 Ford L, Fulkerson D (1956) Maximal flow through a network. Can J Math 8: 399–404 Getoor L, Diehl CP (2005) Link mining: a survey. SIGKDD Explor Newsl 7: 3–12 Grineva M, Grinev M, Turdakov D, Velikhov P (2008) Harnessing Wikipedia for smart tags clustering. In: Proceedings of the international workshop on knowledge acquisition from the social web (KASW2008) Ho N-D (2008) Nonnegative matrix factorization algorithms and applications. PhD Thesis, Catholic University of Louvain, June 2008 Hu X, Zhang X, Lu C, Park EK, Zhou X (2009) Exploiting Wikipedia as external knowledge for document clustering. In: Proceedings of the international conference on knowledge discovery and data mining (KDD) Kato T, Kashima H, Sugiyama M (2009) Robust label propagation on multiple networks. IEEE Trans Neural Netw 20(1): 35–44 Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. IEEE Comput 42(8): 30–37 Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: NIPS, vol 13, pp 556–562 Lin C-J (2007) On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans Neural Netw 18(6): 1589–1596 Lin Y-R, Sun J, Castro P, Konuru R, Sundaram H, Kelliher A (2009) MetaFac: community discovery via relational hypergraph factorization. In: Proceedings of the 15th ACM SIGKDD int’l conf on knowledge discovery and data mining, pp 527–536 Long B, Zhang ZM, Yu PS (2005) Co-clustering by block value decomposition. In: Proceedings of the 11th ACM SIGKDD int’l conf on knowledge discovery and data mining. ACM Press, New York, pp 635–640 Long B, Zhang ZM, Yu PS (2007) A probabilistic framework for relational clustering. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 470–479 Long B, Yu PS, Zhang Z (2008) A general model for multiple view unsupervised learning. In: Proceedings of the 2008 SIAM international conference on data mining Mandayam Comar P, Tan PN, Jain AK (2010a) Multi task learning on multiple related networks. In: Proceedings of the 19th ACM international conference on information and knowledge management, CIKM ’10, pp 1737–1740 Mandayam Comar P, Tan P-N, Jain AK (2010b) Identifying cohesive subgroups and their correspondences in multiple related networks. In: Proceedings of the 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, vol 01, pp 476–483 Mandayam Comar P, Tan P-N, Jain AK (2012) A framework for joint community detection across multiple related networks. Neurocomputing 76(1):93–104 Newman M (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103: 8577–8582 Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113 Senator TE (2005) Link mining applications: progress and challenges. SIGKDD Explor Newsl 7: 76–83 Shi J, Malik J (1997) Normalized cuts and image segmentation. In: Proceedings of CVPR Tang L, Wang X, Liu H (2009a) Uncovering groups via heterogeneous interaction analysis. In: IEEE international conference on data mining (ICDM ’09) Tang W, Lu Z, Dhillon I (2009b) Clustering with multiple graphs. In: ICDM, Miami, pp 1016–1021 Wang P, Domeniconi C (2008) Building semantic kernels for text classification using Wikipedia. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08 Wang P, Domeniconi C, Hu J (2008) Using Wikipedia for co-clustering based cross-domain text classification. In: Proceedings of the 2008 eighth IEEE international conference on data mining, pp 1085–1090 Wei Y-C, Cheng C-K (1989) Towards efficient hierarchical designs by ratio cut partitioning. In: IEEE International Conference on Computer-Aided Design, 1989. ICCAD-89, pp 298–301. doi:10.1109/ICCAD.1989.76957 Xue Y, Liao X, Carin L, Krishnapuram B (2007) Multi-task learning for classification with Dirichlet process priors. J Mach Learn Res 8: 35–63 Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM int’l conf on knowledge discovery and data mining, pp 927–936 Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in Neural Information Processing Systems 16, MIT Press, pp 321–328 Zhou D, Zhu S, Yu K, Song X, Tseng BL, Zha H, Giles CL (2008) Learning multiple graphs for document recommendations. In: WWW ’08, pp 141–150 Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation. CMU