kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning

Big Data Research - Volume 13 - Pages 38-51 - 2018
Hossein Estiri [1,2,3], Behzad Abounia Omran [4], Shawn N. Murphy [1,2,3]
[1] Harvard Medical School, United States of America
[2] Massachusetts General Hospital, United States of America
[3] Partners Healthcare, Boston, MA, United States of America
[4] Construction System Management, The Ohio State University, Columbus, OH, United States of America
