Supervised methods for multi-relational link prediction
Tóm tắt
Many important real-world systems, modeled naturally as complex networks, have heterogeneous interactions and complicated dependency structures. Link prediction in such networks must model the influences between heterogenous relationships and distinguish the formation mechanisms of each link type, a task which is beyond the simple topological features commonly used to score potential links. In this paper, we introduce a novel probabilistically weighted extension of the Adamic/Adar measure for heterogenous information networks, which we use to demonstrate the potential benefits of diverse evidence, particularly in cases where homogeneous relationships are very sparse. However, we also expose some fundamental flaws of traditional unsupervised link prediction. We develop supervised learning approaches for relationship (link) prediction in multi-relational networks, and demonstrate that a supervised approach to link prediction can enhance performance. We present results on three diverse, real-world heterogeneous information networks and discuss the trends and tradeoffs of supervised and unsupervised link prediction in a multi-relational setting.
Tài liệu tham khảo
Adamic L, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230
Barabási A (2003) Linked: how everything is connected to everything else and what it means. Penguin Group, New York
Barabási A, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Phys A Stat Mech Appl 311(3-4):590–614
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117
Christakis N, Fowler J (2007) The spread of obesity in a large social network over 32 years. New Engl J Med 357(4):370–379
Christakis N, Fowler J (2008) The collective dynamics of smoking in a large social network. New Engl J Med 358(21):2249–2258
Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning ACM, pp 233–240
Friedkin N, Johnsen E (1990) Social influence and opinions. J Math Sociol 15(3):193–206
Goadrich M, Oliphant L, Shavlik J (2004) Learning ensembles of first-order clauses for recall-precision curves: a case study in biomedical information extraction. In: Inductive logic programming, pp 421–456
Han J (2009) Mining heterogeneous information networks by exploring the power of links. In: Discovery science, Springer, pp 13–30
Holland P, Leinhardt S (1970) A method for detecting structure in sociometric data. Am J Sociol 76(3):492–513
Huang Z, Li X, Chen H (2005) Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS joint conference on digital libraries, ACM, pp 141–142
Kas M, Carley K, Carley L (2012) Trends in science networks: understanding structures and statistics of scientific networks. In: Social network analysis and mining, pp 1–19
Kautz H, Selman B, Shah M (1997) Referral Web: combining social networks and collaborative filtering. Commun ACM 40(3):63–65
Liben-Nowell D, Kleinberg J (2003) The link-prediction problem for social networks. In: Proceedings of the 12th international conference on information and knowledge management, pp 556–559
Lichtenwalter R, Lussier J, Chawla N (2010) New perspectives and methods in link prediction. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 243–252
McPherson M, Smith-Lovin L, Cook J (2001) Birds of a feather: Homophily in social networks. Ann Rev Sociol 27:415–444
Newman M (2001) Clustering and preferential attachment in growing networks. Phys Rev E 64(2):025102
Newman M (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci 98(2):404
O’Madadhain J, Hutchins J, Smyth P (2005) Prediction and ranking algorithms for event-based network data. ACM SIGKDD Explor Newslett 7(2):23–30
Pelan A, Steinhaeuser K, Chawla NV, de Alwis Pitts DA, Ganguly AR (2011) Empirical comparison of correlation measures and pruning levels in complex networks representing the global climate system. In: IEEE symposium series on computational intelligence and data mining
Pržulj N, Corneil D, Jurisica I (2004) Modeling interactome: scale-free or geometric? Bioinformatics 20(18):3508
Radivojac P, Peng K, Clark W, Peters B, Mohan A, Boyle S, Mooney S (2008) An integrated approach to inferring gene–disease associations in humans. Proteins Struct Funct Bioinform 72(3):1030–1037
Raeder T, Chawla N (2011) Market basket analysis with networks. In: Social network analysis and mining, pp 1–17
Raghavan V, Bollmann P, Jung G (1989) A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans Inform Syst (TOIS) 7(3):205–229
Rual J, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz G, Gibbons F, Dreze M, Ayivi-Guedehoussou N et al (2005) Towards a proteome-scale map of the human protein–protein interaction network. Nature 437(7062):1173–1178
Salton G, McGill M (1983) Introduction to modern information retrieval. McGraw-Hill, New York
Scott J (2011) Social network analysis: developments, advances, and prospects. In: Social network analysis and mining, pp 1–6
Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck F, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S et al (2005) A human protein-protein interaction network: a resource for annotating the proteome. Cell 122(6):957–968
Tang L, Wang X, Liu H (2009) Uncovering groups via heterogeneous interaction analysis. In: Proceedings of the 9th IEEE international conference on data mining, pp 503–512
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge university press, Cambridge