ShortWalk: an approach to network embedding on directed graphs

Social Network Analysis and Mining - Tập 11 - Trang 1-12 - 2021
Fen Zhao1, Yi Zhang1, Jianguo Lu1
1School of Computer Science, University of Windsor, Windsor, Canada

Tóm tắt

In network embedding algorithms, long random walks are often used to convert the graph into ‘text’ so that node embeddings can be learned by skip-gram with negative sampling model. However, in a directed graph, long random walks can be trapped or interrupted, leading to low-quality embeddings. This paper proposes a new algorithm, called ShortWalk, to improve the directed graph network embeddings. ShortWalk performs short random walks that restart more frequently thus produces shorter traces. It also gives nodes equal weights by generating the training pairs using pair-wise combination of nodes on the traces. We validate our method on eight directed graphs with different sizes and structures. Experimental results confirm that ShortWalk outperforms DeepWalk consistently on all datasets in node classification and link prediction tasks.

Tài liệu tham khảo

Abu-El-Haija S, Perozzi B, Al-Rfou R (2017) Learning edge representations via low-rank asymmetric projections. In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp 1787–1796 Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in neural information processing systems, pp 585–591 Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3(2003):1137–1155 Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117. https://doi.org/10.1016/S0169-7552(98)00110-X Cai H, Zheng VW, Chang KC-C (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Trans Knowl Data Eng 30(9):1616–1637 Cao S, Lu W, Xu Q (2015) Grarep: learning graph representations with global structural information. In: Proceedings of the 24th ACM international on conference on information and knowledge management, pp 891–900 Chen M, Yang Q, Tang X et al (2007) Directed graph embedding. In: IJCAI, pp 2707–2712 Dong Y, Chawla NV, Swami A (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 135–144 Goyal P, Ferrara E (2018) Graph embedding techniques, applications, and performance: a survey. Knowl-Based Syst 151:78–94. https://doi.org/10.1016/j.knosys.2018.03.022 Grolmusz V (2015) A note on the pagerank of undirected graphs. Inf Process Lett 115(6–8):633–634. https://doi.org/10.1016/j.ipl.2015.02.015 Grover A, Leskovec J (2016a) Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 855–864 Grover A, Leskovec J (2016b) node2vec: scalable feature learning for networks. In: SIGKDD. ACM, pp 855–864 Gutmann M, Hyvärinen A (2010) Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 297–304 Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36. https://doi.org/10.1148/radiology.143.1.7063747 Jacomy M, Venturini T, Heymann S, Bastian M (2014) ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE 9(6):e98679. https://doi.org/10.1371/journal.pone.0098679 Khosla M, Leonhardt J, Nejdl W, Anand A (2019) Node representation learning for directed graphs. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 395–411 Kim J, Park H, Lee J-E, Kang U (2018) Side: representation learning in signed directed networks. In: Proceedings of the 2018 World Wide Web Conference, pp 509–518 Kunegis J (2013) KONECT: the Koblenz Network Collection. In: Proceedings of the 22nd international conference on World Wide Web (WWW ’13 Companion). ACM, Rio de Janeiro, Brazil, pp 1343–1350. https://doi.org/10.1145/2487788.2488173 Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection, June 2014 Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031. https://doi.org/10.1002/asi.20591 Liu L, Cheung WK, Li X, Liao L (2016) Aligning users across social networks using network embedding. In: IJCAI, pp 1774–1780 Lund K, Burgess C (1996) Producing high-dimensional semantic spaces from lexical co-occurrence. Behav Res Methods Instrum Comput 28(2):203–208. https://doi.org/10.3758/BF03204766 Man T, Shen H, Liu S, Jin X, Cheng X (2016) Predict anchor links across social networks via an embedding approach. IJCAI 16:1823–1829 Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119 Ou M, Cui P, Pei J, Zhang Z, Zhu W (2016) Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge Discovery and Data Mining—KDD ’16. ACM Press, San Francisco, California, USA, pp 1105–1114. https://doi.org/10.1145/2939672.2939751 Palla G, Farkas IJ, Pollner P, Derényi I, Vicsek T (2007) Directed network modules. New J Phys 9(6):186–186. https://doi.org/10.1088/1367-2630/9/6/186 Pennington J, Socher R, Manning C (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543 Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 701–710 Perozzi B, Kulkarni V, Skiena S (2016) Walklets: multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115 Řehuřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks. ELRA, Valletta, Malta, pp 45–50 Sen T, Chaudhary DK (2017) Contrastive study of simple PageRank, HITS and weighted PageRank algorithms: review. In: 2017 7th international conference on cloud computing, data science engineering—Confluence, pp 721–727. https://doi.org/10.1109/CONFLUENCE.2017.7943245 Sun J, Bandyopadhyay B, Bashizade A, Liang J, Sadayappan P, Parthasarathy S (2018) ATP: directed graph embedding with asymmetric transitivity preservation. CoRR arXiv:1811.00839 Tang J, Meng Q, Wang M, Zhang M, Yan J, Mei Qaozhu (2015a) LINE: large-scale information network embedding. In: WWW. ACM, pp 1067–1077 Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015b) Line: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web. ACM, pp 1067–1077 Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323 Tsitsulin A, Mottin D, Karras P, Müller E (2018) Verse: versatile graph embeddings from similarity measures. In: Proceedings of the 2018 World Wide Web Conference, pp 539–548 Van Der Maaten L (2014) Accelerating T-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245 Wang Y, Yao Y, Tong H, Feng X, Jian L (2020) A brief review of network embedding. Big Data Min Anal 2(1):35–47 Yang Z, Tang J, Cohen W (2015) Multi-modal Bayesian embeddings for learning social knowledge graphs. arXiv preprint arXiv:1508.00715 Zhang D, Yin J, Zhu X, Zhang C (2017) Network representation learning: a survey. arXiv:1801.05852 [cs, stat] Zhang Y, Lu J, Shai O (2018) Improve network embeddings with regularization. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp 1643–1646 Zhao F, Zhang Y, Jianguo L, Shai O (2019) Measuring academic influence using heterogeneous author-citation networks. Scientometrics 118(3):1119–1140. https://doi.org/10.1007/s11192-019-03010-5 Zhou C, Liu Y, Liu X, Liu Z, Gao J (2017a) Scalable graph embedding for asymmetric proximity. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp 2942–2948 Zhou C, Liu Y, Liu X, Liu Z, Gao J (2017b) Scalable graph embedding for asymmetric proximity. In: Thirty-first AAAI conference on artificial intelligence