Mixture model averaging for clustering

Advances in Data Analysis and Classification - Volume 9 - Pages 197-217 - 2014
Yuhong Wei (1), Paul D. McNicholas (2)
(1) Department of Mathematics and Statistics, University of Guelph, Guelph, Canada
(2) Department of Mathematics and Statistics, McMaster University, Hamilton, Canada

Abstract

In mixture model-based clustering applications, it is common to fit several models from a family and report clustering results from only the ‘best’ one. In such circumstances, selection of this best model is achieved using a model selection criterion, most often the Bayesian information criterion. Rather than throw away all but the best model, we average multiple models that are in some sense close to the best one, thereby producing a weighted average of clustering results. Two (weighted) averaging approaches are considered: averaging component membership probabilities and averaging models. In both cases, Occam’s window is used to determine closeness to the best model and weights are computed within a Bayesian model averaging paradigm. In some cases, we need to merge components before averaging; we introduce a method for merging mixture components based on the adjusted Rand index. The effectiveness of our model-based clustering averaging approaches is illustrated using a family of Gaussian mixture models on real and simulated data.
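
As a rough illustration of the first averaging approach (averaging component membership probabilities), the sketch below computes approximate Bayesian model averaging weights from BIC values, restricts them to Occam's window, and forms a weighted average of membership matrices. Several details are assumptions rather than the authors' stated procedure: the "larger is better" convention BIC = 2 log L - k log n, equal prior model probabilities, a window ratio of c = 20, and fitted models that already share the same number of components with aligned labels. The function names `occam_window_weights` and `average_memberships` and all numbers are hypothetical, and the merging step based on the adjusted Rand index is not shown.

```python
import numpy as np


def occam_window_weights(bic, c=20.0):
    """Approximate BMA weights, set to zero outside Occam's window.

    Assumes BIC = 2*loglik - k*log(n) (larger is better), so that
    exp(BIC / 2) approximates the marginal likelihood, and assumes
    equal prior probabilities for all models. The ratio c is an
    illustrative choice, not a value taken from the abstract.
    """
    bic = np.asarray(bic, dtype=float)
    diff = bic.max() - bic                 # distance from the best model, >= 0
    keep = diff <= 2.0 * np.log(c)         # posterior ratio best/model <= c
    w = np.where(keep, np.exp(-0.5 * diff), 0.0)
    return w / w.sum()


def average_memberships(z_list, weights):
    """Weighted average of n x G membership probability matrices.

    Assumes every matrix has the same shape and that component labels
    are already aligned across models.
    """
    z = np.stack([np.asarray(z_m, dtype=float) for z_m in z_list])
    return np.tensordot(weights, z, axes=1)  # result has shape (n, G)


# Hypothetical BIC values for three fitted models.
bic_values = [-1000.0, -1002.0, -1020.0]
weights = occam_window_weights(bic_values)

# Hypothetical membership probabilities for two observations, two components.
z_models = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.5, 0.5]],  # falls outside the window, so its weight is 0
]
z_avg = average_memberships(z_models, weights)
labels = z_avg.argmax(axis=1)  # hard classification from averaged probabilities
```

Subtracting the maximum BIC before exponentiating keeps the weights numerically stable. The third model lies more than 2 log(20) BIC units below the best one, so it is excluded from the average.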
