Khung mô phỏng dự đoán dựa trên lấy mẫu cho mô hình khuếch tán ngẫu nhiên nhằm xác định các nút ảnh hưởng hàng đầu

Kouzou Ohara1, Kazumi Saito2,3, Masahiro Kimura4, Hiroshi Motoda5
1Department of Integrated Information Technology, Aoyama Gakuin University, Kanagawa, Japan
2Faculty of Science, Kanagawa University, Kanagawa, Japan
3Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
4Department of Electronics and Informatics, Ryukoku University, Otsu, Japan
5Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan

Tóm tắt

Chúng tôi giải quyết vấn đề ước lượng hiệu quả ảnh hưởng của một nút trong sự khuếch tán thông tin qua mạng xã hội. Bởi vì sự khuếch tán thông tin là một quá trình ngẫu nhiên, mức độ ảnh hưởng của một nút được định lượng thông qua kỳ vọng, mà thường đạt được qua nhiều lần mô phỏng tốn thời gian. Đóng góp của chúng tôi là đề xuất một khung cho mô phỏng dự đoán dựa trên kỹ thuật kiểm tra chéo leave-N-out, giúp ước lượng tốt sai số từ chân lý chưa biết cho hai vấn đề mục tiêu: một là ước lượng mức độ ảnh hưởng của mỗi nút, và vấn đề còn lại là xác định các nút ảnh hưởng top-K. Phương pháp mà chúng tôi đề xuất cho vấn đề đầu tiên ước lượng sai số xấp xỉ của mức độ ảnh hưởng của mỗi nút, và phương pháp cho vấn đề thứ hai ước lượng độ chính xác của các nút top-K thu được, cả hai đều không biết mức độ ảnh hưởng thực sự. Chúng tôi đánh giá thực nghiệm các phương pháp đề xuất bằng ba mạng xã hội thực tế và cho thấy rằng chúng có thể phục vụ như một tiêu chí tốt để giải quyết các vấn đề mục tiêu với số lần mô phỏng ít hơn nhiều, đảm bảo độ chính xác khi sử dụng kiểm tra chéo leave-half-out, tức là N là nửa số lần chạy, điều này có nghĩa là người ta có thể xác định các nút ảnh hưởng mà không cần biết chính xác mức độ ảnh hưởng của chúng với độ chính xác tốt.

Từ khóa

#khuếch tán thông tin; mạng xã hội; mức độ ảnh hưởng; mô phỏng dự đoán; kiểm tra chéo.

Tài liệu tham khảo

Bakshy, E., Hofman, J.M., Mason, W.A., Watts, D.J.: Everyone’s an influencer: quantifying influence on twitter. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM’11), pp. 65–74 (2011) Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999) Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing social influence in nearly optimal time. In: Proceedings of the 25th Annual ACM–SIAM Symposium on Discrete Algorithms (SODA’14), pp. 946–957 (2014) Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), pp. 1029–1038 (2010) Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09), pp. 199–208 (2009) Chen, W., Yuan, Y., Zhang, L.: Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), pp. 88–97 (2010) Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 23(4), 493–507 (1952) Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55, 441–453 (1997) Cohen, E.: All-distances sketches, revisited: hip estimators for massive graphs analysis. In: Proceedings of the 33rd ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, pp. 88–99 (2015) Cohen, E., Delling, D., Pajor, T., Werneck, R.F.: Sketch-based influence maximization and computation: Scaling up with guarantees. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 629–638 (2014) Cui, P., Wang, F., Yang, S., Sun, L.: Item-level social influence prediction with probabilistic hybrid factor matrix factorization. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011), pp. 331–336 (2011) Goyal, A., Lu, W., Lakshmanan, L.: Celf++: optimizing the greedy algorithm for influence maximization in social networks. In: Proceedings of the 20th International World Wide Web Conference (WWW2011), pp. 47–48 (2011) Guille, A., Hacid, H.: A predictive model for the temporal dynamics of information diffusion in online social networks. In: Proceedings of the 21st International Conference Companion on World Wide Web (WWW’12), pp. 1145–1152 (2012) Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963) Jin, R., Liu, L., Aggarwal, C., Shen, Y.: Reliable clustering on uncertain graphs. In: Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM 2012), pp. 459–468 (2012) Kassiano, V., Gounaris, A., Papadopoulos, A.N., Tsichlas, K.: Mining uncertain graphs: an overview. In: Proceedings of the International Workshop on Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2016), pp. 87–116 (2017) Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 137–146 (2003) Khan, A., Ye, Y., L, C.: On Uncertain Graphs. Morgan & Claypool, San Rafael (2018) Kimura, M., Saito, K.: Tractable models for information diffusion in social networks. In: Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006), LNAI 4213, pp. 259–271 (2006) Kimura, M., Saito, K., Motoda, H.: Minimizing the spread of contamination by blocking links in a network. In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08), pp. 1175–1180 (2008) Kimura, M., Saito, K., Motoda, H.: Blocking links to minimize contamination spread in a social network. ACM Trans. Knowl. Discov. Data 3, 9:1–9:23 (2009) Kimura, M., Saito, K., Motoda, H.: Efficient estimation of influence functions fot sis model on social networks. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09) (2009) Kimura, M., Saito, K., Nakano, R.: Extracting influential nodes for information diffusion on a social network. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), pp. 1371–1376 (2007) Kimura, M., Saito, K., Nakano, R., Motoda, H.: Extracting influential nodes on a social network for information diffusion. Data Min. Knowl. Disc. 20, 70–97 (2010) Kimura, M., Saito, K., Ohara, K., Motoda, H.: Speeding-up node influence computation for huge social networks. Int. J. Data Sci. Anal. 1, 1–14 (2016) Kleinberg, J.: The convergence of social and technological networks. Commun. ACM 51(11), 66–72 (2008) Klimt, B., Yang, Y.: The enron corpus: a new dataset for email classification research. In: Proceedings of the 2004 European Conference on Machine Learning (ECML’04), pp. 217–226 (2004) Lawrence, T., Hosein, P.: Stochastic dynamic programming heuristics for influence maximization-revenue optimization. Int. J. Data Sci. Anal. (first Online) (2018) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), pp. 420–429 (2007) Loukides, G., Gwadera, R.: Preventing the diffusion of information to vulnerable users while preserving pagerank. Int. J. Data Sci. Anal. 5(1), 19–39 (2018) Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298, 824–827 (2002) Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003) Nguyen, H., Zheng, R.: Influence spread in large-scale social networks—a belief propagation approach. In: Proceedings of the 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), LNAI 7524, pp. 515–530 (2012) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Predictive simulation framework of stochastic diffusion model for identifying top-k influential nodes. In: Proceedings of the 5th Asian Conference on Machine Learning (ACML2013), PMLR, vol. 29, pp. 149–164 (2013) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Resampling-based framework for estimating node centrality of large social network. In: Proceedings of the 17th International Conference on Discovery Science (DS’14), LNAI 8777, pp. 228–239 (2014) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Resampling-based gap analysis for detecting nodes with high centrality on large social network. In: Proceedings of the Nineteenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2015), pp. 135–147 (2015) Ohara, K., Saito, K., Kimura, M., Motoda, H.: Critical node identification based on articulation point detection for network with uncertain connectivity. In: Proceedings of the Sixth International Symposium on Computing and Networking (CANDAR 2018), pp. 146–152 (2018) Potamias, M., Bonchi, F., Gionis, A., Kollios, G.: K-nearest neighbors in uncertain graphs. Proc. VLDB Endow. 3, 997–1008 (2010) Rossetti, G., Milli, L., Salvator, R., Sirbu, A., Pedreschi, D., Giannotti, F.: Ndlib: a python library to model and analyze diffusion processes over complex networks. Int. J. Data Sci. Anal. 5(1), 61–79 (2018) Saito, K., Kimura, M., Motoda, H.: Discovering influential nodes for sis models in social networks. In: Proceedings of the Twelfth International Conference of Discovery Science (DS2009), LNAI 5808, pp. 302–316. Springer, Berlin (2009) Saito, K., Kimura, M., Ohara, K., Motoda, H.: Learning continuous-time information diffusion model for social behavioral data analysis. In: Proceedings of the 1st Asian Conference on Machine Learning (ACML2009), LNAI 5828, pp. 322–337 (2009) Saito, K., Kimura, M., Ohara, K., Motoda, H.: Selecting information diffusion models over social networks for behavioral analysis. In: Proceedings of the 2010 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2010), LNAI 6323, pp. 180–195 (2010) Saito, K., Kimura, M., ohara, K., Motoda, H.: Which targets to contact first to maximize influence over social network. In: Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction (SBP2013), LNCS 7812, pp. 359–367 (2013) Vázquez, A.: Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations. Phys. Rev. 67(5), 056,104 (2003) Watts, D.J., Dodds, P.S.: Influence, networks, and public opinion formation. J. Consum. Res. 34, 441–458 (2007) Watts, D.J., Strogatz, S.H.: Collective dyanamics of “small-world” networks. Nature 393, 440–442 (1998) Yang, J., Counts, S.: Predicting the speed, scale, and range of information diffusion in twitter. In: Proceedings of the Fourth International Conference on Weblogs and Social Media (ICWSM 2010) (2010) Yang, J., Leskovec, J.: Modeling information diffusion in implicit networks. In: Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM’10), pp. 599–608 (2010) Yang, Y., Chen, E., Liu, Q., Xiang, B., Xu, T., Shad, S.: On approximation of real-world influence spread. In: Proceedings of the 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), LNAI 7524a, pp. 548–564 (2012) Zhou, C., Zhang, P., Zang, W., Guo, L.: On the upper bounds of spread for greedy algorithms in social network influence maximization. IEEE Trans. Knowl. Data Eng. 27, 2770–2783 (2015)