An extensive comparative study of cluster validity indices

Pattern Recognition, Volume 46, Issue 1, Pages 243–256, 2013
Olatz Arbelaitz, Ibai Gurrutxaga, Javier Muguerza, Jesús M. Pérez, Iñigo Perona
Department of Computer Architecture and Technology, University of the Basque Country UPV/EHU, Manuel Lardizabal 1, 20018 Donostia, Spain

References

Halkidi, 2001, On clustering validation techniques, Journal of Intelligent Information Systems, 17, 107, 10.1023/A:1012801612483

Jain, 1988

Mirkin, 2005

Sneath, 1973

Holzinger, 1941

Chou, 2004, A new cluster validity measure and its application to image compression, Pattern Analysis and Applications, 7, 205, 10.1007/s10044-004-0218-1

2002

Pal, 1997, Cluster validation using graph theoretic concepts, Pattern Recognition, 30, 847, 10.1016/S0031-3203(96)00127-6

I. Guyon, U. von Luxburg, R.C. Williamson, Clustering: science or art?, in: NIPS 2009 Workshop on Clustering Theory, Vancouver, Canada, 2009.

Brun, 2007, Model-based evaluation of clustering validation measures, Pattern Recognition, 40, 807, 10.1016/j.patcog.2006.06.026

Pfitzner, 2009, Characterization and evaluation of similarity measures for pairs of clusterings, Knowledge and Information Systems, 19, 361, 10.1007/s10115-008-0150-6

Batagelj, 1995, Comparing resemblance measures, Journal of Classification, 12, 73, 10.1007/BF01202268

Dunn, 1973, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics, 3, 32, 10.1080/01969727308546046

Davies, 1979, A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 224, 10.1109/TPAMI.1979.4766909

Calinski, 1974, A dendrite method for cluster analysis, Communications in Statistics, 3, 1, 10.1080/03610927408827101

A. Ben-Hur, A. Elisseeff, I. Guyon, A stability based method for discovering structure in clustered data, in: Biocomputing 2002 Proceedings of the Pacific Symposium, vol. 7, 2002, pp. 6–17.

Jain, 1987, Bootstrap technique in cluster analysis, Pattern Recognition, 20, 547, 10.1016/0031-3203(87)90081-1

Dimitriadou, 2002, An examination of indexes for determining the number of clusters in binary data sets, Psychometrika, 67, 137, 10.1007/BF02294713

Maulik, 2002, Performance evaluation of some clustering algorithms and validity indices, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1650, 10.1109/TPAMI.2002.1114856

Milligan, 1985, An examination of procedures for determining the number of clusters in a data set, Psychometrika, 50, 159, 10.1007/BF02294245

Halkidi, 2008, A density-based cluster validity approach using multi-representatives, Pattern Recognition Letters, 29, 773, 10.1016/j.patrec.2007.12.011

Hardy, 1996, On the number of clusters, Computational Statistics & Data Analysis, 23, 83, 10.1016/S0167-9473(96)00022-9

Lago-Fernández, 2010, Normality-based validation for crisp clustering, Pattern Recognition, 43, 782, 10.1016/j.patcog.2009.09.018

Žalik, 2011, Validity index for clusters of different sizes and densities, Pattern Recognition Letters, 32, 221, 10.1016/j.patrec.2010.08.007

Kim, 2005, New indices for cluster validity assessment, Pattern Recognition Letters, 26, 2353, 10.1016/j.patrec.2005.04.007

Saha, 2009, Performance evaluation of some symmetry-based cluster validity indexes, IEEE Transactions on Systems, Man, and Cybernetics, Part C, 39, 420, 10.1109/TSMCC.2009.2013335

Dubes, 1987, How many clusters are best? – an experiment, Pattern Recognition, 20, 645, 10.1016/0031-3203(87)90034-3

Gurrutxaga, 2011, Towards a standard methodology to evaluate internal cluster validity indices, Pattern Recognition Letters, 32, 505, 10.1016/j.patrec.2010.11.006

Bezdek, 1997, A geometric approach to cluster validity for normal mixtures, Soft Computing—A Fusion of Foundations, Methodologies and Applications, 1, 166

Bandyopadhyay, 2008, A point symmetry-based clustering technique for automatic evolution of clusters, IEEE Transactions on Knowledge and Data Engineering, 20, 1441, 10.1109/TKDE.2008.79

Hubert, 1985, Comparing partitions, Journal of Classification, 2, 193, 10.1007/BF01908075

Kim, 2001, A novel validity index for determination of the optimal number of clusters, IEICE Transactions on Information and Systems, E84-D, 281

Sugar, 2003, Finding the number of clusters in a dataset, Journal of the American Statistical Association, 98, 750, 10.1198/016214503000000666

Baker, 1975, Measuring the power of hierarchical cluster analysis, Journal of the American Statistical Association, 70, 31, 10.1080/01621459.1975.10480256

Hubert, 1976, A general statistical framework for assessing categorical clustering in free recall, Psychological Bulletin, 83, 1072, 10.1037/0033-2909.83.6.1072

Rousseeuw, 1987, Silhouettes, Journal of Computational and Applied Mathematics, 20, 53, 10.1016/0377-0427(87)90125-7

Bezdek, 1998, Some new indexes of cluster validity, IEEE Transactions on Systems, Man, and Cybernetics, Part B, 28, 301, 10.1109/3477.678624

M. Halkidi, M. Vazirgiannis, Clustering validity assessment: finding the optimal partitioning of a data set, in: Proceedings of the First IEEE International Conference on Data Mining (ICDM'01), California, USA, 2001, pp. 187–194.

Saitta, 2007, A bounded index for cluster validity, vol. 4571, 174

Gurrutxaga, 2010, SEP/COP, Pattern Recognition, 43, 3364, 10.1016/j.patcog.2010.04.021

Jaccard, 1908, Nouvelles recherches sur la distribution florale, Bulletin de la Société Vaudoise des Sciences Naturelles, 44, 223

M. Meilă, Comparing clusterings by the variation of information, in: Proceedings of the Sixteenth Annual Conference on Computational Learning Theory (COLT), 2003, pp. 173–187.

A. Frank, A. Asuncion, UCI machine learning repository, 2010.

Demšar, 2006, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, 7, 1

Dietterich, 1998, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10, 1895, 10.1162/089976698300017197

García, 2008, An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons, Journal of Machine Learning Research, 9, 2677