Link Prediction on Complex Networks: An Experimental Survey

Haixia Wu1, Chunyao Song1, Yao Ge1, Tingjian Ge2
1College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, China
2University of Massachusetts Lowell, Massachusetts, United States

Tóm tắt

Complex networks have been used widely to model a large number of relationships. The outbreak of COVID-19 has had a huge impact on various complex networks in the real world, for example global trade networks, air transport networks, and even social networks, known as racial equality issues caused by the spread of the epidemic. Link prediction plays an important role in complex network analysis in that it can find missing links or predict the links which will arise in the future in the network by analyzing the existing network structures. Therefore, it is extremely important to study the link prediction problem on complex networks. There are a variety of techniques for link prediction based on the topology of the network and the properties of entities. In this work, a new taxonomy is proposed to divide the link prediction methods into five categories and a comprehensive overview of these methods is provided. The network embedding-based methods, especially graph neural network-based methods, which have attracted increasing attention in recent years, have been creatively investigated as well. Moreover, we analyze thirty-six datasets and divide them into seven types of networks according to their topological features shown in real networks and perform comprehensive experiments on these networks. We further analyze the results of experiments in detail, aiming to discover the most suitable approach for each kind of network.

Từ khóa


Tài liệu tham khảo

Amaral LA, Ottino JM (2004) Complex networks. Eur Phys J B 38(2):147–162 Lü L, Zhou T (2011) Link prediction in complex networks: A survey. Physica A 390(6):1150–1170 Lü L, Jin C-H, Zhou T (2009) Similarity index based on local paths for link prediction of complex networks. Phys Rev E 80(4):046122 Uchida M, Shirayama S (2007) Formation of patterns from complex networks. J Visual 10(3):253–255 Liu Z, He J-L, Srivastava J (2013) Cliques in complex networks reveal link formation and community evolution. Comput Sci Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757):88–90 Lei C, Ruan J (2012) A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity. Bioinformatics 29(3):355–364 Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, Von Mering C, et al (2012) String v9. 1: protein-protein interaction networks, with increased coverage and integration. Nucl Acids Res 41(D1):D808–D815 Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP et al (2015) String v10: protein-protein interaction networks, integrated over the tree of life. Nucl Acids Res 43(D1):D447–D452 Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P et al (2016) The string database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucl Acids Res gkw937 Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inform Sci Technol 58(7):1019–1031 Bonchi F, Castillo C, Gionis A, Jaimes A (2011) Social network analysis and mining for business applications. ACM Trans Intell Syst Technol 2(3):22 Adamic LA, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230 Chen H, Li X, Huang Z (2005) Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS joint conference on digital libraries, pp 141–142 Daud NN, Hamid S, Saadoon M, Sahran F, Anuar NB (2020) Applications of link prediction in social networks: a review. J Netw Comput Appl 166:102716 Nandi G, Das A (2013) A survey on using data mining techniques for online social network analysis. Int J Comput Sci Iss 10(6):162 Al Hasan M, Zaki MJ (2011) A survey of link prediction in social networks. Soc Netw Data Analyt 243–275 Vinupriya A, Gomathi S (2016) Web page personalization and link prediction using generalized inverted index and flame clustering. In: Proceedings of the 2016 international conference on computer communication and informatics, pp 1–8 Heemakshi Malhi MA (2016) A survey of various link prediction algorithms in complex networks. Int J Comput Sci Mob Comput 5(6):244–250 Martínez V, Berzal F, Cubero J-C (2016) A survey of link prediction in complex networks. ACM Comput Surv (CSUR) 49(4):69 Al Hasan M, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. Proceedings of SDM Workshop on Link Analysis Couterterrorism & Security 30(9):798–805 Perozzi B, Alrfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining Zhao P, Aggarwal C, He G (2016) Link prediction in graph streams. In: Proceedings of the 32nd IEEE International Conference on Data Engineering, pp 553–564 Yao L, Wang L, Pan L, Yao K (2016) Link prediction based on common-neighbors for dynamic social network. Procedia Comput Sci 83:82–89 Salton G, McGill MJ (1986) Introduction to modern information retrieval. MuGraw-Hill, Auckland Jaccard P (1901) Étude comparative de la distribution florale dans une portion des alpes et des jura. Bulletin de la Societe Vaudoise des Science Naturelles 37(142):547–579 Sorensen TA (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on danish commons. Biologiske Skrifter 5:1–34 Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási A-L (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555 Zhou T, Lü L, Zhang Y-C (2009) Predicting missing links via local information. Eur Phys J B 71(4):623–630 Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73(2):026120 Mitzenmacher M (2004) A brief history of generative models for power law and lognormal distributions. Internet Math 1(2):226–251 Liu Z, Zhang Q-M, Lü L, Zhou T (2011) Link prediction in complex networks: a local naïve bayes model. Europhys Lett 96(4):48007 Stojmirović A, Yu Y-K (2007) Information flow in interaction networks. J Comput Biol 14(8):1115–1143 Sun D, Zhou T, Liu J-G, Liu R-R, Jia C-X, Wang B-H (2009) Information filtering based on transferring similarity. Phys Rev E 80(1):017101 Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43 Kronecker L (1882) Grundzüge einer arithmetischen theorie der algebraischen grössen... von L. Kronecker, G. Reimer Liu W, Lü L (2010) Link prediction based on local random walk. Europhys Lett 89(5):58007–58012(6) Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the International Conference on World Wide Web, pp 107–117 Jeh G, Widom J (2002) Simrank: a measure of structural-context similarity. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 538–543 Chebotarev P, Shamis E (1997) The matrix-forest theorem and measuring relations in small social groups. Autom Rem Control 58(9):1505–1514 Guimerà R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. In: Proceedings of the National Academy of Science, pp 22073–22078 Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications Friedman N, Getoor L, Koller D, Pfeffer A (2001) Learning probabilistic relational models with structural uncertainty. In: Proceedings of the International Conference on Machine Learning Jaeger M (1997) Relational bayesian networks. In: Proceedings of the 13th Conference on Uncertainty in Artifical Intelligence, pp 266–273 Taskar B, Abbeel P,Wong M-F, Koller D (2007) Relational Markov networks. Introduction to Statistical Relational Learning 175–200 Neville J, Jensen D (2001) Relational dependency networks. J Mach Learn Res 8(2):653–692 Clauset A, Moore C, Newman ME (2008) Hierarchical sturcture and the prediction of missing links in networks. Nature 453(7191):98 Yu K, Chu W, Yu S, Tresp V, Xu Z (2006) Stochastic relational models for discriminative link prediction. In: Proceedings of the international conference on neural information processing systems. pp 1553–1560 Huang Z Link prediction based on graph topology: the predictive value of generalized clustering coefficient. Social Science Electronic Publishing Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction. In: Proceedings of the IEEE international conference on data mining, pp 322–331 Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297 Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 4:580–585 Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21(3):660–674 Rish I et al (2001) An empirical study of the naive bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, vol 3, pp 41–46 Hosmer Jr D W, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, vol 398. Wiley Mitra S, Pal SK (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Networks 6(1):51–63 Lichtenwalter RN, Lussier JT, Chawla NV (2010) New perspectives and methods in link prediction. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 243–252 De Sá HR, Prudêncio RB (2011) Supervised link prediction in weighted networks. In: Proceedings of the International Joint Conference on Neural Networks, pp 2281–2288 Doppa JR, Yu J, Tadepalli P, Getoor L (2009) Chance-constratined programs for link prediction. In: Proceedings of the NIPS Workshop on Analyzing Networks and Learning with Graphs Chen Y-L, Chen M-S, Philip SY (2015) Ensemble of diverse sparsifications for link prediction in large-scale networks. In: Proceedings of the IEEE International Conference on Data Mining, pp 51–60 Kashima H, Kato T, Yamanishi Y, Sugiyama M, Tsuda K (2009) Link propagation: a fast semi-supervised learning algorithm for link prediction. In: Proceedings of the 2009 SIAM international conference on data mining, pp 1100–1111 Raymond R, Kashima H (2010) Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs. In: Proceedings of the joint European conference on machine learning and knowledge discovery in databases, pp 131–147 Zeng Z, Chen K-J, Zhang S, Zhang H (2013) A link prediction approach using semi-supervised learning in dynamic networks. In: Proceedings of the Internation Conferennce on Advanced Computational Intelligence, pp 276–280 Cui P, Wang X, Pei J, Zhu W (2019) A survey on network embedding. IEEE Trans Knowl Data Eng 31(5):833–852 Zhang D, Yin J, Zhu X, Zhang C (2020) Network representation learning: a survey. IEEE Trans Big Data 6(1):3–28. https://doi.org/10.1109/TBDATA.2018.2850013 Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37 Menon AK, Elkan C (2011) Link prediction via matrix factorization. In: Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 437–452 Chen B, Li F, Chen S, Hu R, Chen L (2017) Link prediction based on non-negative matirx factorization. PLoS ONE 12(8):e0182968 Shaosheng Cao QX, Lu Wei (2015) Grarep: learning graph representations with global structural information. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp 891–900 Chen G, Wang H, Fang Y, Jiang L (2022) Link prediction by deep non-negative matrix factorization. Expert Syst Appl 188:115991 Duan L, Ma S, Aggarwal C, Ma T, Huai J (2017) An ensemble approach to link prediction. IEEE Trans Knowl Data Eng 29(11):2402–2416 Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 855–864 Leonardo DR, Ribeiro FR, Savarese Pedro HP (2017) struc2vec: learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 385–394 Yao X, Shao Y, Cui B, Chen L (2021) Uninet: scalable network representation learning with metropolis-hastings sampling. In: Proceedings of the 37th IEEE international conference on data engineering, pp 516–527 Zhou J, Cui G, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2019) Graph neural networks: a review of methods and applications. arXiv:1812.08434 Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neural Inf Process Syst 30:1 Zhang M, Chen Y (2017) Weisfeiler-Lehman neural machine for link prediction. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 575–583 Mahjoub AB, Atri M (2019) An efficient end-to-end deep learning architecture for activity classification. Analog Integrated Circuits Signal Process 99:23–32. https://doi.org/10.1007/s10470-018-1306-2 Zhang M, Chen Y (2018) Link prediction based on graph neural networks. In: Advances in Neural Information Processing Systems, pp 5165–5175 Chiang W-L, Liu X, Si S,Li Y, Bengio S, Hsieh C-J (2019) Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 257–266 Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv preprint arXiv:1710.10903 Cai L, Ji S (2020) A multi-scale approach for graph link prediction. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020, pp 3308–3315 Ragunathan K, Selvarajah K, Kobti Z (2020) Link prediction by analyzing common neighbors based subgraphs using convolutional neural network. In: the 24th European Conference on Artificial Intelligence, vol 325, pp 1906–1913 Guo Z, Wang F, Yao K, Liang J, Wang Z (2022) Multi-scale variational graph autoencoder for link prediction. In: Proceedings of the 15th ACM International Conference on Web Search and Data Mining, pp 334–342 Jian Tang M W e a , Qu Meng (2015) Line: Large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp 1067–1077 Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, Curran Associates Inc., Red Hook, NY, USA, pp 3111–3119 Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1225–1234 Ren-Meng C, Liu S-Y, Xu X-K (2019) Network embedding for link prediction: the pitfall and improvement. Chaos: Interdiscip J Nonlinear Sci 29:103102. https://doi.org/10.1063/1.5120724 Tsitsulin A, Mottin D, Karras P, Müller E (2018) Verse: Versatile graph embeddings from similarity measures. In: Proceedings of the 2018 World Wide Web Conference, WWW ’18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 539–548. https://doi.org/10.1145/3178876.3186120 Newman ME (2001) Clustering and preferential attachment in growing networks. Phys Rev E 64(2):025102 Zhao Z, Gou Z, Du Y, Ma J, Li T, Zhang R (2022) A novel link prediction algorithm based on inductive matrix completion. Expert Syst Appl 188:116033 Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256 Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440–442 Kunegis J (2013) http://konect.cc/statistics/clusco#b736 Newman ME (2002) Assortative mixing in networks. Phys Rev Lett 89(20):208701 Newman ME (2005) Power laws, pareto distributions and Zipf’s law. Contemp Phys 46(5):323–351 Kunegis J (2013) http://konect.cc/statistics/power Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):1–40 Ripeanu M, Foster I (2002) Mapping the gnutella network: macroscopic properties of large-scale peer-to-peer systems. In: International Workshop on Peer-to-Peer systems. Springer, pp 85–93 Boyce DE, Chon KS, Ferris M, Lee YJ, Lin K, Eash R (1985) Implementation and evaluation of combined models of urban travel and location on a sketch planning network. Chicago Area Transportation Study Kunegis J (2013) http://konect.cc/networks/subelj_euroroad Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Networks 3(32):245–251 Opsahl T (2011) Why anchorage is not that important : Binary ties and sample selections. https://toreopsahl.com/2011/08/12/why-anchorage-is-not-that-important-binary-ties-and-sample-selection/ Kunegis J (2013) http://konect.cc/networks/chess Kunegis J (2013) http://konect.cc/networks/moreno_crime Opsahl T (2011) Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Soc Networks 35(2):159–167 Ewing RM, Chu P, Elisma F, Li H, Taylor P, Climie S, McBroom-Cerajewski L, Robinson MD, O’Connor L, Li M et al (2007) Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol Syst Biol 3(1):89 Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S et al (2005) A human protein-protein interaction network: a resource for annotating the proteome. Cell 122(6):957–968 Rual J-F, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N et al (2005) Towards a proteome-scale map of the human protein–protein interaction network. Nature 437(7062):1173–1178 Han J-DJ, Dupuy D, Bertin N, Cusick ME, Vidal M (2005) Effect of sampling on topology predictions of protein–protein interaction networks. Nat Biotechnol 23(7):839–844 Moody J (2001) Peer influence groups: identifying dense clusters in large networks. Soc Netw 23(4):261–283 Isella L, Stehlé J, Barrat A, Cattuto C, Pinton J-F, Van den Broeck W (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271(1):166–180 Gleiser PM, Danon L (2003) Community structure in Jazz. Adv Complex Syst 6(4):565–573 Coleman J, Katz E, Menzel H (1957) The diffusion of an innovation among physicians. Sociometry 20(4):253–270 Freeman LC, Webster CM, Kirke DM (1998) Exploring social structure using dynamic three-dimensional color images. Soc Networks 20(2):109–118 Massa P, Salvetti M, Tomasoni D (2009) Bowling alone and trust decline in social network sites. In: Proceedings of the IEEE International Conference on Dependable, Autonomic and Secure Computing, pp 658–663 Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: User movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1082–1090 Kunegis J (2013) http://konect.cc/networks/dnc-corecipient Zafarani R, Liu H (2009) Social computing data repository at asu. http://socialcomputing.asu.edu Richardson M, Agrawal R, Domingos P (2003) Trust management for the semantic web. In: The Semantic Web-ISWC 2003, Springer, pp 351–368 Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Advances in neural information processing systems, pp 539–547 Kunegis J (2013) http://konect.cc/networks/petster-friendships-hamster Boguñá M, Pastor-Satorras R, Díaz-Guilera A, Arenas A (2004) Models of social networks based on social distance attachment. Phys Rev E Stat Nonlinear Soft Matter Phys 70(5):056122 Kunegis J, Preusse J (2012) Fairness on the web: alternatives to the power law. In: Proceedings of the 4th Annual ACM Web Science Conference, pp 175–184 Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305 https://en.wikipedia.org/wiki/Algebraic_connectivity (2020) Newman ME (2004) Coauthorship networks and patterns of scientific collaboration. In: Proceedings of the National Academy of Sciences, pp 5200–5205 Šubelj L, Bajec M (2011) Robust network community detection using balanced propagation. Eur Phys J B 81(3):353–362 Ji J, Zhang A, Liu C, Quan X, Liu Z (2014) Survey: functional module detection from protein–protein interaction networks. In: Proceedings of the IEEE Transactions on Knowledge and Data Engineer, pp 261–277 Leskovec J, Krevl A (2014) SNAP Datasets: stanford large network dataset collection. http://snap.stanford.edu/data Kunegis J (2013) KONECT—The Koblenz Network Collection. In: Proceedings of International Conference on World Wide Web Companion, pp 1343–1350. http://dl.acm.org/citation.cfm?id=2488173 Tang J (2006) Aminer dataset. https://www.aminer.cn/data/ (May ) Domenico MD (2017) Datasets released for reproducibility. https://manliodedomenico.com/data.php Batagelj V, Mrvar A (2006) Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/ Rossi RA, Ahmed NK (2015) The network data repository with interactive graph analytics and visualization. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. http://networkrepository.com Knight S (2011) The internet topology zoo. http://topology-zoo.org/dataset.html Hagberg A, Swart P, Chult DS (2008) Exploring network structure, dynamics, and function using networkx, Tech. rep., Los Alamos National Lab.(LANL), Los Alamos, NM (United States) Csardi G, Nepusz T et al (2006) The igraph software package for complex network research. InterJ Complex Syst 1695(5):1–9 Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2008) statnet: Software tools for the representation, visualization, analysis and simulation of network data. J Stat Softw 24(1):1548 Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: Proceedings of the international AAAI conference on web and social media, vol 3, pp 361–362 De Domenico M, Porter MA, Arenas A (2015) Muxviz: a tool for multilayer analysis and visualization of networks. J Complex Netw 3(2):159–176 Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 143(1):29–36 Chakrabarti S, Khanna R, Sawant U, Bhattacharyya C (2008) Structured learning for non-smooth ranking losses. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, Association for Computing Machinery, New York, NY, USA, pp 88–96. https://doi.org/10.1145/1401890.1401906 Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. Adv Neural Inf Process Syst 26:1 https://github.com/whxhx/Link-Prediction-Methods (2021)