Improving accuracy of expected frequency of uncertain roles based on efficient ensembling

Applied Network Science - Tập 7 - Trang 1-21 - 2022
Soshi Naito1, Takayasu Fushimi1
1School of Computer Science, Tokyo University of Technology, Hachioji-city, Japan

Tóm tắt

This study tackles the problem of extracting the node roles in uncertain graphs based on network motifs. Uncertain graphs are useful for modeling information diffusion phenomena because the presence or absence of edges is stochastically determined. In such an uncertain graph, the node role also changes stochastically according to the presence or absence of edges, so approximate calculation using a huge number of samplings is common. However, the calculation load is very large, even for a small graph. We propose a method to extract uncertain node roles with high accuracy and high speed by ensembling a large number of sampled graphs and efficiently searching for all other transitionable roles. This method provides highly accurate results compared to simple sampling and ensembling methods that do not consider the transition to other roles. In our evaluation experiment, we use real-world graphs artificially assigned uniform and non-uniform edge existence probabilities. The results show that the proposed method outperforms an existing method previously reported by the authors, which is the basis of the proposed method, as well as another current method based on the state-of-the-art algorithm, in terms of efficiency and accuracy.

Tài liệu tham khảo

Ahmed NK, Neville J, Rossi RA, Duffield N (2015) Efficient graphlet counting for large networks. In: 2015 IEEE international conference on data mining, pp 1–10. https://doi.org/10.1109/ICDM.2015.141 Ceccarello M, Fantozzi C, Pietracaprina A, Pucci G, Vandin F (2017) Clustering uncertain graphs. Proc VLDB Endow 11(4):472–484 Everett M, Borgatti S (1994) Regular equivalence: general theory. J Math Sociol 19(1):29–52 Gilpin S, Eliassi-Rad T, Davidson I (2013) Guided learning for role discovery (glrd): Framework, algorithms, and applications. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 113–121 Grochow JA, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Proceedings of the 11th annual international conference on research in computational molecular biology, RECOMB’07. Springer, Berlin, pp 92–106 Guerrero C, Milenković T, Pržulj N, Kaiser P, Huang L (2008) Characterization of the proteasome interaction network using a QTAX-based tag-team strategy and protein interaction network analysis. Proc Natl Acad Sci 105(36):13333–13338. https://doi.org/10.1073/pnas.0801870105 Henderson K, Gallagher B, Eliassi-Rad T, Tong H, Basu S, Akoglu L, Koutra D, Faloutsos C, Li L (2012) Rolx: structural role extraction & mining in large graphs. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1231–1239 Henderson K, Gallagher B, Li L, Akoglu L, Eliassi-Rad T, Tong H, Faloutsos C (2011) It’s who you know: graph mining using recursive structural features. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 663–671 Hu J, Cheng R, Huang Z, Fang Y, Luo S (2017) On embedding uncertain graphs. In: ACM on conference on information and knowledge management, vol 123, pp 157–166 https://github.com/fuppo27/graph_dataset Itzhack R, Mogilevski Y, Louzoun Y (2007) An optimal algorithm for counting network motifs. Phys A Stat Mech Appl 381:482–490. https://doi.org/10.1016/j.physa.2007.02.102 Klimt B, Yang Y (2004) The enron corpus: a new dataset for email classification research. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Machine learning: ECML 2004. Springer, Berlin, pp 217–226 Kvålseth TO (2017) On normalized mutual information: measure derivations and properties. Entropy. https://doi.org/10.3390/e19110631 Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data Lorrain F, White H (1971) Structural equivalence of individuals in social networks. J Math Sociol 1(1):49–80 Ma C, Cheng R, Lakshmanan LVS, Grubenmann T, Fang Y, Li X (2019) Linc: a motif counting algorithm for uncertain graphs. Proc VLDB Endow 13(2):155–168. https://doi.org/10.14778/3364324.3364330 Marinka Zitnik Rok Sosič SM, Leskovec J (2018) BioSNAP datasets: Stanford biomedical network dataset collection. http://snap.stanford.edu/biodata McDonnell MD, Yaveroglu ON, Schmerl BA, Iannella N, Ward LM (2014) Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks. PLoS ONE 9(12):1–25. https://doi.org/10.1371/journal.pone.0114503 Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827. https://doi.org/10.1126/science.298.5594.824 Naito S, Fushimi T (2021) Motif-role extraction in uncertain graph based on efficient ensembles. In: Proceedings of the 10th international conference on complex networks and their applications, pp 501–513 Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions. Math Program 14:265–294 Ohnishi T, Takayasu H, Takayasu M (2010) Network motifs in an inter-firm network. J Econ Interact Coord 5(2):171–180 Pfeiffer JJ, Neville J (2011) Methods to determine node centrality and clustering in graphs with uncertain structure. In: Proceedings of the fifth international conference on weblogs and social media. The AAAI Press, pp 590–593 Pinar A, Seshadhri C, Vishal V (2017) Escape: efficiently counting all 5-vertex subgraphs. In: Proceedings of the 26th international conference on World Wide Web, WWW ’17. Republic and Canton of Geneva, CHE, pp 1431–1440. https://doi.org/10.1145/3038912.3052597 Pržulj N (2007) Biological network comparison using graphlet degree distribution. Bioinformatics 23(2):177–183. https://doi.org/10.1093/bioinformatics/btl301 Rossi RA, Gallagher B, Neville J, Henderson K (2012) Role-dynamics: fast mining of large dynamic networks. In: Proceedings of the 21st international conference companion on World Wide Webv, pp 997–1006 Rossi RA, Gallagher B, Neville J, Henderson K (2013) Modeling dynamic behavior in large evolving graphs. In: Proceedings of the sixth ACM international conference on web search and data mining. ACM, pp 667–676 Rossi RA, Ahmed NK (2015) Role discovery in networks. IEEE Trans Knowl Data Eng 27(4):1112–1131 Sarajlić A, Malod-Dognin N, Yaveroǧlu ON, Pržulj N (2016) Graphlet-based characterization of directed networks. Nature 123:89. https://doi.org/10.1038/srep35098 Todor A, Dobra A, Kahveci T (2015) Counting motifs in probabilistic biological networks. In: Proceedings of the 6th ACM conference on bioinformatics, computational biology and health informatics, BCB ’15. Association for Computing Machinery, New York, pp 116–125. https://doi.org/10.1145/2808719.2808731 Tran N, Choi KP, Zhang L (2013) Counting motifs in the human interactome. Nat Commun 4:2241. https://doi.org/10.1038/ncomms3241 Wernicke S (2005) A faster algorithm for detecting network motifs. In: Proceedings of the 5th international conference on algorithms in bioinformatics, WABI’05. Springer, Berlin, pp 165–177