Time-varying graph representation learning via higher-order skip-gram with negative sampling
Abstract
Representation learning models for graphs are a successful family of techniques that project nodes into feature spaces that can be exploited by other machine learning algorithms. Since many real-world networks are inherently dynamic, with interactions among nodes changing over time, these techniques can be defined both for static and for time-varying graphs. Here, we show how the skip-gram embedding approach can be generalized to perform implicit tensor factorization on different tensor representations of time-varying graphs. We show that higher-order skip-gram with negative sampling (HOSGNS) is able to disentangle the role of nodes and time, with a small fraction of the number of parameters needed by other approaches. We empirically evaluate our approach using time-resolved face-to-face proximity data, showing that the learned representations outperform state-of-the-art methods when used to solve downstream tasks such as network reconstruction. Good performance on predicting the outcome of dynamical processes such as disease spreading shows the potential of this method to estimate contagion risk, providing early risk awareness based on contact tracing data.
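To make the idea of a higher-order skip-gram objective concrete, the sketch below shows how the bilinear SGNS score between a word and a context generalizes to a multilinear (here, third-order) form over node, context-node and time-slice embeddings, trained with a negative-sampling loss. This is a minimal illustrative example only: the array names, shapes, and the way positive and negative triples are drawn are assumptions for exposition, not the implementation described in the paper.

```python
# Minimal sketch of a third-order SGNS-style objective (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
D = 16                                     # embedding dimension
U = rng.normal(scale=0.1, size=(100, D))   # node embeddings
V = rng.normal(scale=0.1, size=(100, D))   # context-node embeddings
T = rng.normal(scale=0.1, size=(50, D))    # time-slice embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hosgns_score(i, j, k):
    # Third-order analogue of the SGNS dot product:
    # a trilinear form over node i, context node j and time slice k.
    return np.sum(U[i] * V[j] * T[k])

def hosgns_loss(pos, neg):
    # pos: observed (node, context, time) triples from the temporal graph;
    # neg: "noise" triples drawn from a negative-sampling distribution.
    loss = -sum(np.log(sigmoid(hosgns_score(*p))) for p in pos)
    loss -= sum(np.log(sigmoid(-hosgns_score(*n))) for n in neg)
    return loss

print(hosgns_loss(pos=[(0, 1, 2)], neg=[(0, 5, 7), (3, 1, 2)]))
```

Minimizing such a loss over observed versus sampled triples implicitly factorizes a third-order co-occurrence tensor, in the same way that standard SGNS implicitly factorizes a word-context matrix.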