Improved initialisation of model-based clustering using Gaussian hierarchical partitions

Luca Scrucca (1), Adrian E. Raftery (2)
(1) Dipartimento di Economia, Università degli Studi di Perugia, Perugia, Italy
(2) Department of Statistics, University of Washington, Box 354322, Seattle, Washington, 98195-4322, USA

Abstract

Keywords


References

Auder B, Lebret R, Iovleff S, Langrognet F (2014) Rmixmod: an interface for MIXMOD. http://CRAN.R-project.org/package=Rmixmod, R package version 2.0.2

Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821

Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3):561–575

Biernacki C, Celeux G, Govaert G, Langrognet F (2006) Model-based cluster and discriminant analysis with the MIXMOD software. Comput Stat Data Anal 51:587–600

Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793

Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Series B Stat Methodol 39:1–38

Everitt B, Landau S, Leese M, Stahl D (2011) Cluster analysis, 5th edn. Wiley, Chichester, UK

Flury B (1997) A first course in multivariate statistics. Springer, New York

Forina M, Armanino C, Castino M, Ubigli M (1986) Multivariate data analysis as a discriminating method of the origin of wines. Vitis 25:189–201

Fraley C (1998) Algorithms for model-based Gaussian hierarchical clustering. SIAM J Sci Comput 20(1):270–281

Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588

Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631

Fraley C, Raftery AE, Murphy TB, Scrucca L (2012) MCLUST version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report 597, Department of Statistics, University of Washington

Fraley C, Raftery AE, Scrucca L (2015) mclust: normal mixture modelling for model-based clustering, classification, and density estimation. http://CRAN.R-project.org/package=mclust, R package version 5.0.1

Gordon AD (1999) Classification, 2nd edn. Chapman & Hall/CRC

Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218

Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc

Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, UK

Maitra R (2009) Initializing partition-optimization algorithms. IEEE/ACM Trans Comput Biol Bioinform 6(1):144–157

McLachlan G, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley-Interscience, Hoboken, New Jersey

McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York

McLachlan GJ (1988) On the choice of starting values for the EM algorithm in fitting mixture models. Statistician 37(4/5):417

McNicholas PD, ElSherbiny A, McDaid AF, Murphy TB (2015) pgmm: Parsimonious Gaussian Mixture Models. http://CRAN.R-project.org/package=pgmm, R package version 1.2

Melnykov V, Maitra R (2010) Finite mixture models and model-based clustering. Stat Surv 4:80–116

Melnykov V, Melnykov I (2012) Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput Stat Data Anal 56(6):1381–1395

Milligan GW, Cooper MC (1986) A study of the comparability of external criteria for hierarchical cluster analysis. Multivar Behav Res 21(4):441–458

Raftery AE, Dean N (2006) Variable selection for model-based clustering. J Am Stat Assoc 101(473):168–178

Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

Wu CJ (1983) On the convergence properties of the EM algorithm. Ann Stat 11(1):95–103