Resampling-based predictive simulation framework of stochastic diffusion model for identifying top-K influential nodes

Kouzou Ohara1, Kazumi Saito2,3, Masahiro Kimura4, Hiroshi Motoda5
1Department of Integrated Information Technology, Aoyama Gakuin University, Kanagawa, Japan
2Faculty of Science, Kanagawa University, Kanagawa, Japan
3Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
4Department of Electronics and Informatics, Ryukoku University, Otsu, Japan
5Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan

Tóm tắt

We address a problem of efficiently estimating the influence of a node in information diffusion over a social network. Since the information diffusion is a stochastic process, the influence degree of a node is quantified by the expectation, which is usually obtained by very time-consuming many runs of simulation. Our contribution is that we proposed a framework for predictive simulation based on the leave-N-out cross-validation technique that well approximates the error from the unknown ground truth for two target problems: one to estimate the influence degree of each node, and the other to identify top-K influential nodes. The method we proposed for the first problem estimates the approximation error of the influence degree of each node, and the method for the second problem estimates the precision of the derived top-K nodes, both without knowing the true influence degree. We experimentally evaluate the proposed methods using the three real-world networks and show that they can serve as a good measure to solve the target problems with far fewer runs of simulation ensuring the accuracy when the leave-half-out cross-validation, i.e., N is the half of the number of runs, is used, which means that one can identify the influential nodes without knowing exactly their influence degree in good accuracy.

Tài liệu tham khảo

Bakshy, E., Hofman, J.M., Mason, W.A., Watts, D.J.: Everyone’s an influencer: quantifying influence on twitter. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM’11), pp. 65–74 (2011) Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999) Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing social influence in nearly optimal time. In: Proceedings of the 25th Annual ACM–SIAM Symposium on Discrete Algorithms (SODA’14), pp. 946–957 (2014) Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), pp. 1029–1038 (2010) Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09), pp. 199–208 (2009) Chen, W., Yuan, Y., Zhang, L.: Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), pp. 88–97 (2010) Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 23(4), 493–507 (1952) Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55, 441–453 (1997) Cohen, E.: All-distances sketches, revisited: hip estimators for massive graphs analysis. In: Proceedings of the 33rd ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, pp. 88–99 (2015) Cohen, E., Delling, D., Pajor, T., Werneck, R.F.: Sketch-based influence maximization and computation: Scaling up with guarantees. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 629–638 (2014) Cui, P., Wang, F., Yang, S., Sun, L.: Item-level social influence prediction with probabilistic hybrid factor matrix factorization. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011), pp. 331–336 (2011) Goyal, A., Lu, W., Lakshmanan, L.: Celf++: optimizing the greedy algorithm for influence maximization in social networks. In: Proceedings of the 20th International World Wide Web Conference (WWW2011), pp. 47–48 (2011) Guille, A., Hacid, H.: A predictive model for the temporal dynamics of information diffusion in online social networks. In: Proceedings of the 21st International Conference Companion on World Wide Web (WWW’12), pp. 1145–1152 (2012) Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963) Jin, R., Liu, L., Aggarwal, C., Shen, Y.: Reliable clustering on uncertain graphs. In: Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM 2012), pp. 459–468 (2012) Kassiano, V., Gounaris, A., Papadopoulos, A.N., Tsichlas, K.: Mining uncertain graphs: an overview. In: Proceedings of the International Workshop on Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2016), pp. 87–116 (2017) Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 137–146 (2003) Khan, A., Ye, Y., L, C.: On Uncertain Graphs. Morgan & Claypool, San Rafael (2018) Kimura, M., Saito, K.: Tractable models for information diffusion in social networks. In: Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006), LNAI 4213, pp. 259–271 (2006) Kimura, M., Saito, K., Motoda, H.: Minimizing the spread of contamination by blocking links in a network. In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08), pp. 1175–1180 (2008) Kimura, M., Saito, K., Motoda, H.: Blocking links to minimize contamination spread in a social network. ACM Trans. Knowl. Discov. Data 3, 9:1–9:23 (2009) Kimura, M., Saito, K., Motoda, H.: Efficient estimation of influence functions fot sis model on social networks. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09) (2009) Kimura, M., Saito, K., Nakano, R.: Extracting influential nodes for information diffusion on a social network. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), pp. 1371–1376 (2007) Kimura, M., Saito, K., Nakano, R., Motoda, H.: Extracting influential nodes on a social network for information diffusion. Data Min. Knowl. Disc. 20, 70–97 (2010) Kimura, M., Saito, K., Ohara, K., Motoda, H.: Speeding-up node influence computation for huge social networks. Int. J. Data Sci. Anal. 1, 1–14 (2016) Kleinberg, J.: The convergence of social and technological networks. Commun. ACM 51(11), 66–72 (2008) Klimt, B., Yang, Y.: The enron corpus: a new dataset for email classification research. In: Proceedings of the 2004 European Conference on Machine Learning (ECML’04), pp. 217–226 (2004) Lawrence, T., Hosein, P.: Stochastic dynamic programming heuristics for influence maximization-revenue optimization. Int. J. Data Sci. Anal. (first Online) (2018) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), pp. 420–429 (2007) Loukides, G., Gwadera, R.: Preventing the diffusion of information to vulnerable users while preserving pagerank. Int. J. Data Sci. Anal. 5(1), 19–39 (2018) Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298, 824–827 (2002) Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003) Nguyen, H., Zheng, R.: Influence spread in large-scale social networks—a belief propagation approach. In: Proceedings of the 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), LNAI 7524, pp. 515–530 (2012) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Predictive simulation framework of stochastic diffusion model for identifying top-k influential nodes. In: Proceedings of the 5th Asian Conference on Machine Learning (ACML2013), PMLR, vol. 29, pp. 149–164 (2013) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Resampling-based framework for estimating node centrality of large social network. In: Proceedings of the 17th International Conference on Discovery Science (DS’14), LNAI 8777, pp. 228–239 (2014) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Resampling-based gap analysis for detecting nodes with high centrality on large social network. In: Proceedings of the Nineteenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2015), pp. 135–147 (2015) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Critical node identification based on articulation point detection for network with uncertain connectivity. In: Proceedings of the Sixth International Symposium on Computing and Networking (CANDAR 2018), pp. 146–152 (2018) Potamias, M., Bonchi, F., Gionis, A., Kollios, G.: K-nearest neighbors in uncertain graphs. Proc. VLDB Endow. 3, 997–1008 (2010) Rossetti, G., Milli, L., Salvator, R., Sirbu, A., Pedreschi, D., Giannotti, F.: Ndlib: a python library to model and analyze diffusion processes over complex networks. Int. J. Data Sci. Anal. 5(1), 61–79 (2018) Saito, K., Kimura, M., Motoda, H.: Discovering influential nodes for sis models in social networks. In: Proceedings of the Twelfth International Conference of Discovery Science (DS2009), LNAI 5808, pp. 302–316. Springer, Berlin (2009) Saito, K., Kimura, M., Ohara, K., Motoda, H.: Learning continuous-time information diffusion model for social behavioral data analysis. In: Proceedings of the 1st Asian Conference on Machine Learning (ACML2009), LNAI 5828, pp. 322–337 (2009) Saito, K., Kimura, M., Ohara, K., Motoda, H.: Selecting information diffusion models over social networks for behavioral analysis. In: Proceedings of the 2010 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2010), LNAI 6323, pp. 180–195 (2010) Saito, K., Kimura, M., ohara, K., Motoda, H.: Which targets to contact first to maximize influence over social network. In: Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction (SBP2013), LNCS 7812, pp. 359–367 (2013) Vázquez, A.: Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations. Phys. Rev. 67(5), 056,104 (2003) Watts, D.J., Dodds, P.S.: Influence, networks, and public opinion formation. J. Consum. Res. 34, 441–458 (2007) Watts, D.J., Strogatz, S.H.: Collective dyanamics of “small-world” networks. Nature 393, 440–442 (1998) Yang, J., Counts, S.: Predicting the speed, scale, and range of information diffusion in twitter. In: Proceedings of the Fourth International Conference on Weblogs and Social Media (ICWSM 2010) (2010) Yang, J., Leskovec, J.: Modeling information diffusion in implicit networks. In: Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM’10), pp. 599–608 (2010) Yang, Y., Chen, E., Liu, Q., Xiang, B., Xu, T., Shad, S.: On approximation of real-world influence spread. In: Proceedings of the 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), LNAI 7524a, pp. 548–564 (2012) Zhou, C., Zhang, P., Zang, W., Guo, L.: On the upper bounds of spread for greedy algorithms in social network influence maximization. IEEE Trans. Knowl. Data Eng. 27, 2770–2783 (2015)