Subsampling spectral clustering for stochastic block models in large-scale networks

Computational Statistics and Data Analysis - Tập 189 - Trang 107835 - 2024
Jiayi Deng1, Danyang Huang1, Yi Ding1, Yingqiu Zhu2, Bingyi Jing3, Bo Zhang1
1Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
2School of Statistics, University of International Business and Economics, China
3Department of Statistics and Data Science, Southern University of Science and Technology, China

Tài liệu tham khảo

Agarwal, 2005, Beyond pairwise clustering, 838 Alon, 2004 Amini, 2018, On semidefinite relaxations for the block model, Ann. Stat., 46, 149, 10.1214/17-AOS1545 Bhattacharyya, 2015, Subsampling bootstrap of count features of networks, Ann. Stat., 43, 2384, 10.1214/15-AOS1338 Binkiewicz, 2017, Covariate-assisted spectral clustering, Biometrika, 104, 361, 10.1093/biomet/asx008 Chen, 2006, Detecting functional modules in the yeast protein–protein interaction network, Bioinformatics, 22, 2283, 10.1093/bioinformatics/btl370 Chen, 2011, Large scale spectral clustering with landmark-based representation Chung, 2006 Deng Drineas, 2012, Fast approximation of matrix coherence and statistical leverage, J. Mach. Learn. Res., 13, 3475 Eden, 2017, Approximately counting triangles in sublinear time, SIAM J. Comput., 46, 1603, 10.1137/15M1054389 Feng, 2018, Faster matrix completion using randomized SVD, 608 Fortunato, 2010, Community detection in graphs, Phys. Rep., 486, 75, 10.1016/j.physrep.2009.11.002 Fowlkes, 2004, Spectral grouping using the Nyström method, IEEE Trans. Pattern Anal. Mach. Intell., 26, 214, 10.1109/TPAMI.2004.1262185 Gao, 2018, Community detection in degree-corrected block models, Ann. Stat., 46, 2153, 10.1214/17-AOS1615 Girvan, 2002, Community structure in social and biological networks, Proc. Natl. Acad. Sci., 99, 7821, 10.1073/pnas.122653799 Gonen, 2011, Counting stars and other small subgraphs in sublinear-time, SIAM J. Discrete Math., 25, 1365, 10.1137/100783066 Halko, 2011, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53, 217, 10.1137/090771806 Harenberg, 2014, Community detection in large-scale networks: a survey and empirical evaluation, Wiley Interdiscip. Rev.: Comput. Stat., 6, 426, 10.1002/wics.1319 Hartigan, 1979, Algorithm as 136: a k-means clustering algorithm, J. R. Stat. Soc., Ser. C, Appl. Stat., 28, 100 Holland, 1983, Stochastic blockmodels: first steps, Soc. Netw., 5, 109, 10.1016/0378-8733(83)90021-7 Hu, 2020, Corrected Bayesian information criterion for stochastic block models, J. Am. Stat. Assoc., 115, 1771, 10.1080/01621459.2019.1637744 Illenberger, 2012, Estimating network properties from snowball sampled data, Soc. Netw., 34, 701, 10.1016/j.socnet.2012.09.001 Karrer, 2011, Stochastic blockmodels and community structure in networks, Phys. Rev. E, 83, 10.1103/PhysRevE.83.016107 Knuth, 1976, Big omicron and big omega and big theta, ACM SIGACT News, 8, 18, 10.1145/1008328.1008329 Krzakala, 2013, Spectral redemption in clustering sparse networks, Proc. Natl. Acad. Sci., 110, 20935, 10.1073/pnas.1312486110 Lancichinetti, 2009, Community detection algorithms: a comparative analysis, Phys. Rev. E, 80, 10.1103/PhysRevE.80.056117 LeCun, 1995, Comparison of learning algorithms for handwritten digit recognition, 53 Lee, 2017, Time-dependent community structure in legislation cosponsorship networks in the congress of the Republic of Peru, J. Complex Netw., 5, 127 Lei, 2015, Consistency of spectral clustering in stochastic block models, Ann. Stat., 43, 215, 10.1214/14-AOS1274 Li, 2011, Time and space efficient spectral clustering via column sampling, 2297 Li, 2020, Network cross-validation by edge sampling, Biometrika, 107, 257, 10.1093/biomet/asaa006 Lunde Mackey, 2014, Matrix concentration inequalities via the method of exchangeable pairs, Ann. Probab., 42, 906, 10.1214/13-AOP892 Martin, 2018, Fast approximate spectral clustering for dynamic networks, 3423 Mukherjee, 2021, Two provably consistent divide-and-conquer clustering algorithms for large networks, Proc. Natl. Acad. Sci., 118, 10.1073/pnas.2100482118 Nepusz, 2012, Detecting overlapping protein complexes in protein-protein interaction networks, Nat. Methods, 9, 471, 10.1038/nmeth.1938 Newman, 2004, Finding and evaluating community structure in networks, Phys. Rev. E, 69, 10.1103/PhysRevE.69.026113 Ng, 2002, On spectral clustering: analysis and an algorithm, 849 Politis, 1999 Qin, 2013, Regularized spectral clustering under the degree-corrected stochastic blockmodel, Adv. Neural Inf. Process. Syst., 26 Rives, 2003, Modular organization of cellular networks, Proc. Natl. Acad. Sci., 100, 1128, 10.1073/pnas.0237338100 Rohe, 2011, Spectral clustering and the high-dimensional stochastic blockmodel, Ann. Stat., 39, 1878, 10.1214/11-AOS887 Snijders, 1999, Non-parametric standard errors and tests for network statistics, Connections, 22, 161 Tron, 2007, A benchmark for the comparison of 3-D motion segmentation algorithms, 1 Tropp, 2012, User-friendly tail bounds for sums of random matrices, Found. Comput. Math., 12, 389, 10.1007/s10208-011-9099-z Vitter, 1985, Random sampling with a reservoir, ACM Trans. Math. Softw., 11, 37, 10.1145/3147.3165 Wang, 2021, Optimal subsampling for quantile regression in big data, Biometrika, 108, 99, 10.1093/biomet/asaa043 Wang, 2018, Optimal subsampling for large sample logistic regression, J. Am. Stat. Assoc., 113, 829, 10.1080/01621459.2017.1292914 Wang, 2019, Information-based optimal subdata selection for big data linear regression, J. Am. Stat. Assoc., 114, 393, 10.1080/01621459.2017.1408468 Wang, 2011, Approximate pairwise clustering for large data sets via sampling plus extension, Pattern Recognit., 44, 222, 10.1016/j.patcog.2010.08.005 Wang, 2017, Likelihood-based model selection for stochastic block models, Ann. Stat., 45, 500, 10.1214/16-AOS1457 Yan, 2009, Fast approximate spectral clustering, 907 Yu, 2022, Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data, J. Am. Stat. Assoc., 117, 265, 10.1080/01621459.2020.1773832 Yu, 2015, A useful variant of the Davis–Kahan theorem for statisticians, Biometrika, 102, 315, 10.1093/biomet/asv008 Zhang, 2022, Randomized spectral clustering in large-scale stochastic block models, J. Comput. Graph. Stat., 31, 887, 10.1080/10618600.2022.2034636 Zhao, 2011, Community extraction for social networks, Proc. Natl. Acad. Sci., 108, 7321, 10.1073/pnas.1006642108 Zhao, 2012, Consistency of community detection in networks under degree-corrected stochastic block models, Ann. Stat., 40, 2266, 10.1214/12-AOS1036