Link Prediction on Complex Networks: An Experimental Survey
Data Science and Engineering - 2022
Tóm tắt
Complex networks have been used widely to model a large number of relationships. The outbreak of COVID-19 has had a huge impact on various complex networks in the real world, for example global trade networks, air transport networks, and even social networks, known as racial equality issues caused by the spread of the epidemic. Link prediction plays an important role in complex network analysis in that it can find missing links or predict the links which will arise in the future in the network by analyzing the existing network structures. Therefore, it is extremely important to study the link prediction problem on complex networks. There are a variety of techniques for link prediction based on the topology of the network and the properties of entities. In this work, a new taxonomy is proposed to divide the link prediction methods into five categories and a comprehensive overview of these methods is provided. The network embedding-based methods, especially graph neural network-based methods, which have attracted increasing attention in recent years, have been creatively investigated as well. Moreover, we analyze thirty-six datasets and divide them into seven types of networks according to their topological features shown in real networks and perform comprehensive experiments on these networks. We further analyze the results of experiments in detail, aiming to discover the most suitable approach for each kind of network.
Từ khóa
Tài liệu tham khảo
Amaral LA, Ottino JM (2004) Complex networks. Eur Phys J B 38(2):147–162
Lü L, Zhou T (2011) Link prediction in complex networks: A survey. Physica A 390(6):1150–1170
Lü L, Jin C-H, Zhou T (2009) Similarity index based on local paths for link prediction of complex networks. Phys Rev E 80(4):046122
Uchida M, Shirayama S (2007) Formation of patterns from complex networks. J Visual 10(3):253–255
Liu Z, He J-L, Srivastava J (2013) Cliques in complex networks reveal link formation and community evolution. Comput Sci
Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757):88–90
Lei C, Ruan J (2012) A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity. Bioinformatics 29(3):355–364
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, Von Mering C, et al (2012) String v9. 1: protein-protein interaction networks, with increased coverage and integration. Nucl Acids Res 41(D1):D808–D815
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP et al (2015) String v10: protein-protein interaction networks, integrated over the tree of life. Nucl Acids Res 43(D1):D447–D452
Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P et al (2016) The string database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucl Acids Res gkw937
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inform Sci Technol 58(7):1019–1031
Bonchi F, Castillo C, Gionis A, Jaimes A (2011) Social network analysis and mining for business applications. ACM Trans Intell Syst Technol 2(3):22
Adamic LA, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230
Chen H, Li X, Huang Z (2005) Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS joint conference on digital libraries, pp 141–142
Daud NN, Hamid S, Saadoon M, Sahran F, Anuar NB (2020) Applications of link prediction in social networks: a review. J Netw Comput Appl 166:102716
Nandi G, Das A (2013) A survey on using data mining techniques for online social network analysis. Int J Comput Sci Iss 10(6):162
Al Hasan M, Zaki MJ (2011) A survey of link prediction in social networks. Soc Netw Data Analyt 243–275
Vinupriya A, Gomathi S (2016) Web page personalization and link prediction using generalized inverted index and flame clustering. In: Proceedings of the 2016 international conference on computer communication and informatics, pp 1–8
Heemakshi Malhi MA (2016) A survey of various link prediction algorithms in complex networks. Int J Comput Sci Mob Comput 5(6):244–250
Martínez V, Berzal F, Cubero J-C (2016) A survey of link prediction in complex networks. ACM Comput Surv (CSUR) 49(4):69
Al Hasan M, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. Proceedings of SDM Workshop on Link Analysis Couterterrorism & Security 30(9):798–805
Perozzi B, Alrfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
Zhao P, Aggarwal C, He G (2016) Link prediction in graph streams. In: Proceedings of the 32nd IEEE International Conference on Data Engineering, pp 553–564
Yao L, Wang L, Pan L, Yao K (2016) Link prediction based on common-neighbors for dynamic social network. Procedia Comput Sci 83:82–89
Salton G, McGill MJ (1986) Introduction to modern information retrieval. MuGraw-Hill, Auckland
Jaccard P (1901) Étude comparative de la distribution florale dans une portion des alpes et des jura. Bulletin de la Societe Vaudoise des Science Naturelles 37(142):547–579
Sorensen TA (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on danish commons. Biologiske Skrifter 5:1–34
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási A-L (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555
Zhou T, Lü L, Zhang Y-C (2009) Predicting missing links via local information. Eur Phys J B 71(4):623–630
Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73(2):026120
Mitzenmacher M (2004) A brief history of generative models for power law and lognormal distributions. Internet Math 1(2):226–251
Liu Z, Zhang Q-M, Lü L, Zhou T (2011) Link prediction in complex networks: a local naïve bayes model. Europhys Lett 96(4):48007
Stojmirović A, Yu Y-K (2007) Information flow in interaction networks. J Comput Biol 14(8):1115–1143
Sun D, Zhou T, Liu J-G, Liu R-R, Jia C-X, Wang B-H (2009) Information filtering based on transferring similarity. Phys Rev E 80(1):017101
Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43
Kronecker L (1882) Grundzüge einer arithmetischen theorie der algebraischen grössen... von L. Kronecker, G. Reimer
Liu W, Lü L (2010) Link prediction based on local random walk. Europhys Lett 89(5):58007–58012(6)
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the International Conference on World Wide Web, pp 107–117
Jeh G, Widom J (2002) Simrank: a measure of structural-context similarity. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 538–543
Chebotarev P, Shamis E (1997) The matrix-forest theorem and measuring relations in small social groups. Autom Rem Control 58(9):1505–1514
Guimerà R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. In: Proceedings of the National Academy of Science, pp 22073–22078
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications
Friedman N, Getoor L, Koller D, Pfeffer A (2001) Learning probabilistic relational models with structural uncertainty. In: Proceedings of the International Conference on Machine Learning
Jaeger M (1997) Relational bayesian networks. In: Proceedings of the 13th Conference on Uncertainty in Artifical Intelligence, pp 266–273
Taskar B, Abbeel P,Wong M-F, Koller D (2007) Relational Markov networks. Introduction to Statistical Relational Learning 175–200
Neville J, Jensen D (2001) Relational dependency networks. J Mach Learn Res 8(2):653–692
Clauset A, Moore C, Newman ME (2008) Hierarchical sturcture and the prediction of missing links in networks. Nature 453(7191):98
Yu K, Chu W, Yu S, Tresp V, Xu Z (2006) Stochastic relational models for discriminative link prediction. In: Proceedings of the international conference on neural information processing systems. pp 1553–1560
Huang Z Link prediction based on graph topology: the predictive value of generalized clustering coefficient. Social Science Electronic Publishing
Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction. In: Proceedings of the IEEE international conference on data mining, pp 322–331
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 4:580–585
Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21(3):660–674
Rish I et al (2001) An empirical study of the naive bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, vol 3, pp 41–46
Hosmer Jr D W, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, vol 398. Wiley
Mitra S, Pal SK (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Networks 6(1):51–63
Lichtenwalter RN, Lussier JT, Chawla NV (2010) New perspectives and methods in link prediction. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 243–252
De Sá HR, Prudêncio RB (2011) Supervised link prediction in weighted networks. In: Proceedings of the International Joint Conference on Neural Networks, pp 2281–2288
Doppa JR, Yu J, Tadepalli P, Getoor L (2009) Chance-constratined programs for link prediction. In: Proceedings of the NIPS Workshop on Analyzing Networks and Learning with Graphs
Chen Y-L, Chen M-S, Philip SY (2015) Ensemble of diverse sparsifications for link prediction in large-scale networks. In: Proceedings of the IEEE International Conference on Data Mining, pp 51–60
Kashima H, Kato T, Yamanishi Y, Sugiyama M, Tsuda K (2009) Link propagation: a fast semi-supervised learning algorithm for link prediction. In: Proceedings of the 2009 SIAM international conference on data mining, pp 1100–1111
Raymond R, Kashima H (2010) Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs. In: Proceedings of the joint European conference on machine learning and knowledge discovery in databases, pp 131–147
Zeng Z, Chen K-J, Zhang S, Zhang H (2013) A link prediction approach using semi-supervised learning in dynamic networks. In: Proceedings of the Internation Conferennce on Advanced Computational Intelligence, pp 276–280
Cui P, Wang X, Pei J, Zhu W (2019) A survey on network embedding. IEEE Trans Knowl Data Eng 31(5):833–852
Zhang D, Yin J, Zhu X, Zhang C (2020) Network representation learning: a survey. IEEE Trans Big Data 6(1):3–28. https://doi.org/10.1109/TBDATA.2018.2850013
Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37
Menon AK, Elkan C (2011) Link prediction via matrix factorization. In: Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 437–452
Chen B, Li F, Chen S, Hu R, Chen L (2017) Link prediction based on non-negative matirx factorization. PLoS ONE 12(8):e0182968
Shaosheng Cao QX, Lu Wei (2015) Grarep: learning graph representations with global structural information. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp 891–900
Chen G, Wang H, Fang Y, Jiang L (2022) Link prediction by deep non-negative matrix factorization. Expert Syst Appl 188:115991
Duan L, Ma S, Aggarwal C, Ma T, Huai J (2017) An ensemble approach to link prediction. IEEE Trans Knowl Data Eng 29(11):2402–2416
Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 855–864
Leonardo DR, Ribeiro FR, Savarese Pedro HP (2017) struc2vec: learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 385–394
Yao X, Shao Y, Cui B, Chen L (2021) Uninet: scalable network representation learning with metropolis-hastings sampling. In: Proceedings of the 37th IEEE international conference on data engineering, pp 516–527
Zhou J, Cui G, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2019) Graph neural networks: a review of methods and applications. arXiv:1812.08434
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neural Inf Process Syst 30:1
Zhang M, Chen Y (2017) Weisfeiler-Lehman neural machine for link prediction. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 575–583
Mahjoub AB, Atri M (2019) An efficient end-to-end deep learning architecture for activity classification. Analog Integrated Circuits Signal Process 99:23–32. https://doi.org/10.1007/s10470-018-1306-2
Zhang M, Chen Y (2018) Link prediction based on graph neural networks. In: Advances in Neural Information Processing Systems, pp 5165–5175
Chiang W-L, Liu X, Si S,Li Y, Bengio S, Hsieh C-J (2019) Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 257–266
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv preprint arXiv:1710.10903
Cai L, Ji S (2020) A multi-scale approach for graph link prediction. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020, pp 3308–3315
Ragunathan K, Selvarajah K, Kobti Z (2020) Link prediction by analyzing common neighbors based subgraphs using convolutional neural network. In: the 24th European Conference on Artificial Intelligence, vol 325, pp 1906–1913
Guo Z, Wang F, Yao K, Liang J, Wang Z (2022) Multi-scale variational graph autoencoder for link prediction. In: Proceedings of the 15th ACM International Conference on Web Search and Data Mining, pp 334–342
Jian Tang M W e a , Qu Meng (2015) Line: Large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp 1067–1077
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, Curran Associates Inc., Red Hook, NY, USA, pp 3111–3119
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR
Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1225–1234
Ren-Meng C, Liu S-Y, Xu X-K (2019) Network embedding for link prediction: the pitfall and improvement. Chaos: Interdiscip J Nonlinear Sci 29:103102. https://doi.org/10.1063/1.5120724
Tsitsulin A, Mottin D, Karras P, Müller E (2018) Verse: Versatile graph embeddings from similarity measures. In: Proceedings of the 2018 World Wide Web Conference, WWW ’18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 539–548. https://doi.org/10.1145/3178876.3186120
Newman ME (2001) Clustering and preferential attachment in growing networks. Phys Rev E 64(2):025102
Zhao Z, Gou Z, Du Y, Ma J, Li T, Zhang R (2022) A novel link prediction algorithm based on inductive matrix completion. Expert Syst Appl 188:116033
Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440–442
Kunegis J (2013) http://konect.cc/statistics/clusco#b736
Newman ME (2002) Assortative mixing in networks. Phys Rev Lett 89(20):208701
Newman ME (2005) Power laws, pareto distributions and Zipf’s law. Contemp Phys 46(5):323–351
Kunegis J (2013) http://konect.cc/statistics/power
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):1–40
Ripeanu M, Foster I (2002) Mapping the gnutella network: macroscopic properties of large-scale peer-to-peer systems. In: International Workshop on Peer-to-Peer systems. Springer, pp 85–93
Boyce DE, Chon KS, Ferris M, Lee YJ, Lin K, Eash R (1985) Implementation and evaluation of combined models of urban travel and location on a sketch planning network. Chicago Area Transportation Study
Kunegis J (2013) http://konect.cc/networks/subelj_euroroad
Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Networks 3(32):245–251
Opsahl T (2011) Why anchorage is not that important : Binary ties and sample selections. https://toreopsahl.com/2011/08/12/why-anchorage-is-not-that-important-binary-ties-and-sample-selection/
Kunegis J (2013) http://konect.cc/networks/chess
Kunegis J (2013) http://konect.cc/networks/moreno_crime
Opsahl T (2011) Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Soc Networks 35(2):159–167
Ewing RM, Chu P, Elisma F, Li H, Taylor P, Climie S, McBroom-Cerajewski L, Robinson MD, O’Connor L, Li M et al (2007) Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol Syst Biol 3(1):89
Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S et al (2005) A human protein-protein interaction network: a resource for annotating the proteome. Cell 122(6):957–968
Rual J-F, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N et al (2005) Towards a proteome-scale map of the human protein–protein interaction network. Nature 437(7062):1173–1178
Han J-DJ, Dupuy D, Bertin N, Cusick ME, Vidal M (2005) Effect of sampling on topology predictions of protein–protein interaction networks. Nat Biotechnol 23(7):839–844
Moody J (2001) Peer influence groups: identifying dense clusters in large networks. Soc Netw 23(4):261–283
Isella L, Stehlé J, Barrat A, Cattuto C, Pinton J-F, Van den Broeck W (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271(1):166–180
Gleiser PM, Danon L (2003) Community structure in Jazz. Adv Complex Syst 6(4):565–573
Coleman J, Katz E, Menzel H (1957) The diffusion of an innovation among physicians. Sociometry 20(4):253–270
Freeman LC, Webster CM, Kirke DM (1998) Exploring social structure using dynamic three-dimensional color images. Soc Networks 20(2):109–118
Massa P, Salvetti M, Tomasoni D (2009) Bowling alone and trust decline in social network sites. In: Proceedings of the IEEE International Conference on Dependable, Autonomic and Secure Computing, pp 658–663
Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: User movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1082–1090
Kunegis J (2013) http://konect.cc/networks/dnc-corecipient
Zafarani R, Liu H (2009) Social computing data repository at asu. http://socialcomputing.asu.edu
Richardson M, Agrawal R, Domingos P (2003) Trust management for the semantic web. In: The Semantic Web-ISWC 2003, Springer, pp 351–368
Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Advances in neural information processing systems, pp 539–547
Kunegis J (2013) http://konect.cc/networks/petster-friendships-hamster
Boguñá M, Pastor-Satorras R, Díaz-Guilera A, Arenas A (2004) Models of social networks based on social distance attachment. Phys Rev E Stat Nonlinear Soft Matter Phys 70(5):056122
Kunegis J, Preusse J (2012) Fairness on the web: alternatives to the power law. In: Proceedings of the 4th Annual ACM Web Science Conference, pp 175–184
Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305
https://en.wikipedia.org/wiki/Algebraic_connectivity (2020)
Newman ME (2004) Coauthorship networks and patterns of scientific collaboration. In: Proceedings of the National Academy of Sciences, pp 5200–5205
Šubelj L, Bajec M (2011) Robust network community detection using balanced propagation. Eur Phys J B 81(3):353–362
Ji J, Zhang A, Liu C, Quan X, Liu Z (2014) Survey: functional module detection from protein–protein interaction networks. In: Proceedings of the IEEE Transactions on Knowledge and Data Engineer, pp 261–277
Leskovec J, Krevl A (2014) SNAP Datasets: stanford large network dataset collection. http://snap.stanford.edu/data
Kunegis J (2013) KONECT—The Koblenz Network Collection. In: Proceedings of International Conference on World Wide Web Companion, pp 1343–1350. http://dl.acm.org/citation.cfm?id=2488173
Tang J (2006) Aminer dataset. https://www.aminer.cn/data/ (May )
Domenico MD (2017) Datasets released for reproducibility. https://manliodedomenico.com/data.php
Batagelj V, Mrvar A (2006) Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/
Rossi RA, Ahmed NK (2015) The network data repository with interactive graph analytics and visualization. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. http://networkrepository.com
Knight S (2011) The internet topology zoo. http://topology-zoo.org/dataset.html
Hagberg A, Swart P, Chult DS (2008) Exploring network structure, dynamics, and function using networkx, Tech. rep., Los Alamos National Lab.(LANL), Los Alamos, NM (United States)
Csardi G, Nepusz T et al (2006) The igraph software package for complex network research. InterJ Complex Syst 1695(5):1–9
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2008) statnet: Software tools for the representation, visualization, analysis and simulation of network data. J Stat Softw 24(1):1548
Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: Proceedings of the international AAAI conference on web and social media, vol 3, pp 361–362
De Domenico M, Porter MA, Arenas A (2015) Muxviz: a tool for multilayer analysis and visualization of networks. J Complex Netw 3(2):159–176
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 143(1):29–36
Chakrabarti S, Khanna R, Sawant U, Bhattacharyya C (2008) Structured learning for non-smooth ranking losses. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, Association for Computing Machinery, New York, NY, USA, pp 88–96. https://doi.org/10.1145/1401890.1401906
Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. Adv Neural Inf Process Syst 26:1
https://github.com/whxhx/Link-Prediction-Methods (2021)