From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering
Abstract
In model-based clustering, mixture models are used to group data points into clusters. A useful concept introduced for Gaussian mixtures by Malsiner Walli et al. (Stat Comput 26:303–324, 2016) is the sparse finite mixture, where the prior on the weight distribution of a mixture with K components is chosen such that, a priori, the number of clusters in the data is random and is allowed to be smaller than K with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it illustrates that the concept of sparse finite mixtures is very generic and easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyper prior is considered for the parameters determining the weight distribution. By suitably matching these priors, it is shown that the choice of this hyper prior influences the cluster solution far more than the choice between a sparse finite mixture and a Dirichlet process mixture.
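To make the prior behaviour described above concrete, the following minimal sketch simulates the prior number of non-empty components (clusters) under a finite mixture with a symmetric Dirichlet prior on the weights, and under a Dirichlet process prior. Everything in it is an illustrative assumption rather than the paper's setup: the function names, the values n = 100, K = 10 and e0 = 0.01, the fixed (non-random) hyperparameters, and the choice alpha = K * e0, which is a commonly used correspondence between the two priors and not necessarily the matching used in the paper.

```python
import numpy as np


def clusters_finite(n, K, e0, rng):
    """Prior number of non-empty components among n observations from a
    K-component mixture with symmetric Dirichlet(e0) weights.

    Allocations are drawn with the Polya urn scheme, i.e. with the
    weights integrated out, which avoids sampling tiny Dirichlet weights.
    """
    sizes = np.zeros(K)
    for i in range(n):
        # P(next observation joins component k) = (n_k + e0) / (i + K * e0)
        probs = (sizes + e0) / (i + K * e0)
        k = rng.choice(K, p=probs)
        sizes[k] += 1
    return int(np.count_nonzero(sizes))


def clusters_dp(n, alpha, rng):
    """Prior number of clusters among n observations under a Dirichlet
    process with concentration alpha (Chinese restaurant process)."""
    clusters = 0
    for i in range(n):
        # Observation i + 1 opens a new cluster with prob. alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            clusters += 1
    return clusters


# Hypothetical settings chosen only for illustration.
rng = np.random.default_rng(1)
n, K, e0, draws = 100, 10, 0.01, 500

finite = np.array([clusters_finite(n, K, e0, rng) for _ in range(draws)])
dp = np.array([clusters_dp(n, K * e0, rng) for _ in range(draws)])

print("finite mixture: mean prior number of clusters =", finite.mean())
print("Dirichlet process: mean prior number of clusters =", dp.mean())
```

With a small e0, most prior mass falls on far fewer than K non-empty components, which is the sense in which the finite mixture is "sparse". Re-running the sketch with a larger e0 or alpha shows how strongly these hyperparameters drive the prior number of clusters in both model classes.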
References
Aitkin M (1996) A general maximum likelihood analysis of overdispersion in generalized linear models. Stat Comput 6:251–262
Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Azzalini A (1986) Further results on a class of distributions which includes the normal ones. Statistica 46:199–208
Azzalini A, Capitanio A (2003) Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J R Stat Soc Ser B 65:367–389
Azzalini A, Dalla Valle A (1996) The multivariate skew normal distribution. Biometrika 83:715–726
Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821
Bennett DA, Schneider JA, Buchman AS, de Leon CM, Bienias JL, Wilson RS (2005) The Rush Memory and Aging Project: study design and baseline characteristics of the study cohort. Neuroepidemiology 25:163–175
Bensmail H, Celeux G, Raftery AE, Robert CP (1997) Inference in model-based cluster analysis. Stat Comput 7:1–10
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22:719–725
Celeux G, Forbes F, Robert CP, Titterington DM (2006) Deviance information criteria for missing data models. Bayesian Anal 1:651–674
Celeux G, Frühwirth-Schnatter S, Robert CP (2018) Model selection for mixture models—perspectives and strategies. In: Frühwirth-Schnatter S, Celeux G, Robert CP (eds) Handbook of mixture analysis, chapter 7. CRC Press, Boca Raton, pp 121–160
Clogg CC, Goodman LA (1984) Latent structure analysis of a set of multidimensional contingency tables. J Am Stat Assoc 79:762–771
Dellaportas P, Papageorgiou I (2006) Multivariate mixtures of normals with unknown number of components. Stat Comput 16:57–68
Escobar MD, West M (1995) Bayesian density estimation and inference using mixtures. J Am Stat Assoc 90:577–588
Escobar MD, West M (1998) Computing nonparametric hierarchical models. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics, number 133 in lecture notes in statistics. Springer, Berlin, pp 1–22
Fall MD, Barat É (2014) Gibbs sampling methods for Pitman-Yor mixture models. Working paper https://hal.archives-ouvertes.fr/hal-00740770/file/Fall-Barat.pdf
Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
Ferguson TS (1974) Prior distributions on spaces of probability measures. Ann Stat 2:615–629
Ferguson TS (1983) Bayesian density estimation by mixtures of normal distributions. In: Rizvi MH, Rustagi JS (eds) Recent advances in statistics: papers in honor of Herman Chernoff on his sixtieth birthday. Academic Press, New York, pp 287–302
Frühwirth-Schnatter S (2004) Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econom J 7:143–167
Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York
Frühwirth-Schnatter S (2011a) Dealing with label switching under model uncertainty. In: Mengersen K, Robert CP, Titterington D (eds) Mixture estimation and applications, chapter 10. Wiley, Chichester, pp 213–239
Frühwirth-Schnatter S (2011b) Label switching under model uncertainty. In: Mengersen K, Robert CP, Titterington D (eds) Mixtures: estimation and application. Wiley, Hoboken, pp 213–239
Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew normal and skew-t distributions. Biostatistics 11:317–336
Frühwirth-Schnatter S, Wagner H (2008) Marginal likelihoods for non-Gaussian models using auxiliary mixture sampling. Comput Stat Data Anal 52:4608–4624
Frühwirth-Schnatter S, Frühwirth R, Held L, Rue H (2009) Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Stat Comput 19:479–492
Frühwirth-Schnatter S, Celeux G, Robert CP (eds) (2018) Handbook of mixture analysis. CRC Press, Boca Raton
Goodman LA (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61:215–231
Green PJ, Richardson S (2001) Modelling heterogeneity with and without the Dirichlet process. Scand J Stat 28:355–375
Grün B (2018) Model-based clustering. In: Frühwirth-Schnatter S, Celeux G, Robert CP (eds) Handbook of mixture analysis, chapter 8. CRC Press, Boca Raton, pp 163–198
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Ishwaran H, James LF (2001) Gibbs sampling methods for stick-breaking priors. J Am Stat Assoc 96:161–173
Kalli M, Griffin JE, Walker SG (2011) Slice sampling mixture models. Stat Comput 21:93–105
Keribin C (2000) Consistent estimation of the order of mixture models. Sankhyā A 62:49–66
Lau JW, Green P (2007) Bayesian model-based clustering procedures. J Comput Graph Stat 16:526–558
Lazarsfeld PF, Henry NW (1968) Latent structure analysis. Houghton Mifflin, New York
Lee S, McLachlan GJ (2013) Model-based clustering and classification with non-normal mixture distributions. Stat Methods Appl 22:427–454
Linzer DA, Lewis JB (2011) poLCA: an R package for polytomous variable latent class analysis. J Stat Softw 42(10):1–29
Malsiner Walli G, Frühwirth-Schnatter S, Grün B (2016) Model-based clustering based on sparse finite Gaussian mixtures. Stat Comput 26:303–324
Malsiner Walli G, Frühwirth-Schnatter S, Grün B (2017) Identifying mixtures of mixtures using Bayesian estimation. J Comput Graph Stat 26:285–295
Malsiner-Walli G, Pauger D, Wagner H (2018) Effect fusion using model-based clustering. Stat Model 18:175–196
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley series in probability and statistics. Wiley, New York
Medvedovic M, Yeung KY, Bumgarner RE (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222–1232
Miller JW, Harrison MT (2013) A simple example of Dirichlet process mixture inconsistency for the number of components. In: Advances in neural information processing systems, pp 199–206
Miller JW, Harrison MT (2018) Mixture models with a prior on the number of components. J Am Stat Assoc 113:340–356
Müller P, Mitra R (2013) Bayesian nonparametric inference—why and how. Bayesian Anal 8:269–360
Nobile A (2004) On the posterior distribution of the number of components in a finite mixture. Ann Stat 32:2044–2073
Papaspiliopoulos O, Roberts G (2008) Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95:169–186
Polson NG, Scott JG, Windle J (2013) Bayesian inference for logistic models using Pólya-Gamma latent variables. J Am Stat Assoc 108:1339–1349
Quintana FA, Iglesias PL (2003) Bayesian clustering and product partition models. J R Stat Soc Ser B 65:557–574
Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components. J R Stat Soc Ser B 59:731–792
Rousseau J, Mengersen K (2011) Asymptotic behaviour of the posterior distribution in overfitted mixture models. J R Stat Soc Ser B 73:689–710
Sethuraman J (1994) A constructive definition of Dirichlet priors. Stat Sin 4:639–650
Stern H, Arcus D, Kagan J, Rubin DB, Snidman N (1994) Statistical choices in infant temperament research. Behaviormetrika 21:1–17
van Havre Z, White N, Rousseau J, Mengersen K (2015) Overfitting Bayesian mixture models with an unknown number of components. PLoS ONE 10(7):e0131739, 1–27
Viallefont V, Richardson S, Green PJ (2002) Bayesian analysis of Poisson mixtures. J Nonparametr Stat 14:181–202