Susceptible-infected-spreading-based network embedding in static and temporal networks

Springer Science and Business Media LLC - Tập 9 - Trang 1-20 - 2020
Xiu-Xiu Zhan1, Ziyu Li1, Naoki Masuda2,3, Petter Holme4, Huijuan Wang1
1Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
2Department of Mathematics, University at Buffalo, State University of New York, Buffalo, New York, USA
3Computational and Data-Enabled Science and Engineering Program, University at Buffalo, State University of New York, Buffalo, New York, USA
4Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan

Tóm tắt

Link prediction can be used to extract missing information, identify spurious interactions as well as forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task. Nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from the trajectory paths. The node pair set is further used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes by a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two susceptible-infected-spreading-based algorithms, i.e., Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. The performance of our algorithms is evaluated by the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that the performance of SINE is mostly better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. The better performance of SINE and TSINE requires a smaller sampling size in comparison with the baseline algorithms. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks.

Tài liệu tham khảo

Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256 Zhang Z-K, Liu C, Zhan X-X, Lu X, Zhang C-X, Zhang Y-C (2016) Dynamics of information diffusion and its applications on complex networks. Phys Rep 651:1–34 Costa LdF, Oliveira ON Jr, Travieso G, Rodrigues FA, Villas Boas PR, Antiqueira L, Viana MP, Correa Rocha LE (2011) Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 60(3):329–412 Qi Y, Bar-Joseph Z, Klein-Seetharaman J (2006) Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins, Struct Funct Bioinform 63(3):490–500 Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826 Jacob Y, Denoyer L, Gallinari P (2014) Learning latent representations of nodes for classifying in heterogeneous social networks. In: Proceedings of the 7th ACM international conference on web search and data mining. ACM, New York, pp 373–382 Traud AL, Mucha PJ, Porter MA (2012) Social structure of Facebook networks. Phys A, Stat Mech Appl 391(16):4165–4180 Wang H, Li Q, D’Agostino Gregorio, Havlin S, Stanley HE, Van Mieghem P (2013) Effect of the interconnected network structure on the epidemic threshold. Phys Rev E 88(2):022801 Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031 Lü L, Zhou T (2011) Link prediction in complex networks: a survey. Phys A, Stat Mech Appl 390(6):1150–1170 Lü L, Medo M, Yeung CH, Zhang Y-C, Zhang Z-K, Zhou T (2012) Recommender systems. Phys Rep 519(1):1–49 Martínez V, Berzal F, Cubero J-C (2017) A survey of link prediction in complex networks. ACM Comput Surv 49(4):69 Liu C, Ma Y, Zhao J, Nussinov R, Zhang Y-C, Cheng F, Zhang Z-K (2020) Computational network biology: data, model, and applications. Phys Rep 846:1–66 Getoor L, Diehl CP (2005) Link mining: a survey. ACM SIGKDD Explor Newsl 7(2):3–12 Cui P, Wang X, Pei J, Zhu W (2019) A survey on network embedding. IEEE Trans Knowl Data Eng 31(5):833–852 Wang X, Cui P, Wang J, Pei J, Zhu W, Yang S (2017) Community preserving network embedding. In: Thirty-first AAAI conference on artificial intelligence Pandhre S, Mittal H, Gupta M, Balasubramanian VN (2018) Stwalk: learning trajectory representations in temporal graphs. In: Proceedings of the ACM India joint international conference on data science and management of data, pp 210–219 Béres F, Kelen DM, Pálovics R, Benczúr AA (2019) Node embeddings in dynamic graphs. Appl Netw Sci 4(1):64 Sato K, Oka M, Barrat A, Cattuto C (2019) Dyane: dynamics-aware node embedding for temporal networks. arXiv preprint. arXiv:1909.05976 Torricelli M, Karsai M, Gauvin L (2020) weg2vec: event embedding for temporal networks. Sci Rep 10(1):1–11 Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323 Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326 Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in neural information processing systems, pp 585–591 Golub GH, Reinsch C (1971) Singular value decomposition and least squares solutions pp 134–151 Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 701–710 Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) Line: large-scale information network embedding. In: Proceedings of the 24th international conference on world wide web, pp 1067–1077. International World Wide Web Conferences Steering Committee Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119 Grover A, Leskovec J (2016) node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 855–864 Cao Z, Wang L, de Melo G (2018) Link prediction via subgraph embedding-based convex matrix completion. In: Proceedings of the 32nd AAAI conference on artificial intelligence (AAAI 2018). AAAI Press, Menlo Park Zhang Y, Shi Z, Feng D, Zhan X-X (2019) Degree-biased random walk for large-scale network embedding. Future Gener Comput Syst 100:198–209 Nguyen GH, Lee JB, Rossi RA, Ahmed NK, Koh E, Kim S (2018) Continuous-time dynamic network embeddings. In: Companion of the web conference 2018 on the web conference 2018, pp 969–976. International World Wide Web Conferences Steering Committee Yuan S, Wu X, Xiang Y (2017) Sne: signed network embedding. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Berlin, pp 183–195 Bagavathi A, Krishnan S (2018) Multi-net: a scalable multiplex network embedding framework. In: International conference on complex networks and their applications. Springer, Berlin, pp 119–131 Qu C, Zhan X-X, Wang G, Wu J, Zhang Z-K (2019) Temporal information gathering process for node ranking in time-varying networks. Chaos, Interdiscip J Nonlinear Sci 29(3):033116 Isella L, Stehlé J, Barrat A, Cattuto C, Pinton J-F, Van den Broeck W (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271(1):166–180 Michalski R, Palus S, Kazienko P (2011) Matching organizational structure and social network extracted from email communication. In: International conference on business information systems. Springer, Berlin, pp 197–206 Chaintreau A, Hui P, Crowcroft J, Diot C, Gass R, Scott J (2007) Impact of human mobility on opportunistic forwarding algorithms. IEEE Trans Mob Comput 6:606–620 Opsahl T (2013) Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Soc Netw 35(2):159–167 DNC emails network dataset—KONECT (2017) http://konect.uni-koblenz.de/networks/dnc-temporalGraph Opsahl T, Panzarasa P (2009) Clustering in weighted networks. Soc Netw 31(2):155–163 Singer U, Guy I, Radinsky K (2019) Node embedding over temporal graphs. arXiv preprint. arXiv:1903.08889 Zuo Y, Liu G, Lin H, Guo J, Hu X, Wu J (2018) Embedding temporal network via neighborhood formation. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. ACM, New York, pp 2857–2866 Zhou L, Yang Y, Ren X, Wu F, Zhuang Y (2018) Dynamic network embedding by modeling triadic closure process. In: Thirty-second AAAI conference on artificial intelligence Gao M, Chen L, He X, Zhou A (2018) Bine: bipartite network embedding. In: The 41st international ACM SIGIR conference on research & development in information retrieval, pp 715–724 Kovács IA, Luck K, Spirohn K, Wang Y, Pollis C, Schlabach S, Bian W, Kim D-K, Kishore N, Hao T et al. (2019) Network-based prediction of protein interactions. Nat Commun 10(1):1–8 Cao R-M, Liu S-Y, Xu X-K (2019) Network embedding for link prediction: the pitfall and improvement. Chaos, Interdiscip J Nonlinear Sci 29(10):103102 Yin Z, Shen Y (2018) On the dimensionality of word embedding. In: Advances in neural information processing systems, pp 887–898 Zhan X-X, Hanjalic A, Wang H (2019) Information diffusion backbones in temporal networks. Sci Rep 9(1):6798 Zhan X-X, Liu C, Zhou G, Zhang Z-K, Sun G-Q, Zhu JJ, Jin Z (2018) Coupling dynamics of epidemic spreading and information diffusion on complex networks. Appl Math Comput 332:437–448 Wang S, Tang J, Aggarwal C, Chang Y, Liu H (2017) Signed network embedding in social media. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM, Philadelphia, pp 327–335