Entropy regularization in probabilistic clustering

Beatrice Franzolini (1), Giovanni Rebaudo (2)
(1) Department of Decision Sciences, Bocconi University, Milan, Italy
(2) University of Turin & Collegio Carlo Alberto, Turin, Italy

Abstract

Bayesian nonparametric mixture models are widely used to cluster observations. However, a major drawback of the approach is that the estimated partition often has unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. As a result, the estimate is often uninterpretable unless one is willing to ignore a substantial number of observations and clusters. Interpreting the posterior distribution as a penalized likelihood, we show that this imbalance is a direct consequence of the cost functions involved in estimating the partition. In light of these findings, we propose a novel Bayesian estimator of the clustering configuration. The proposed estimator is equivalent to a post-processing procedure that reduces the number of sparsely populated clusters and enhances interpretability. The procedure takes the form of entropy regularization of the Bayesian estimate. Besides being computationally convenient compared with alternative strategies, it is theoretically justified as a correction to the Bayesian loss function used for point estimation and, as such, can be applied to any posterior distribution of clusters, regardless of the specific model used.
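
As a rough illustration of what entropy regularization of a Bayesian point estimate could look like as a post-processing step, the sketch below selects, among posterior partition draws, the one that minimizes an expected loss plus an entropy penalty. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the use of Binder's loss, the Shannon entropy of the cluster proportions as the penalty, the weight `lam`, the restriction of the search to sampled partitions, and the function names.

```python
import numpy as np

def partition_entropy(labels):
    """Shannon entropy of the cluster-size proportions of one partition."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def coclustering(labels):
    """n x n 0/1 matrix: entry (i, j) is 1 when i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def entropy_regularized_estimate(samples, lam):
    """Pick the sampled partition minimizing expected Binder loss + lam * entropy.

    samples: (S, n) integer array of posterior partition draws (e.g. MCMC output).
    lam: hypothetical regularization weight (an assumption of this sketch);
         lam = 0 gives the plain Binder-loss choice among the sampled partitions.
    """
    samples = np.asarray(samples)
    n = samples.shape[1]
    # Posterior similarity matrix: Monte Carlo co-clustering probabilities.
    psm = np.mean([coclustering(s) for s in samples], axis=0)
    best, best_score = None, np.inf
    for s in samples:
        # Proportional to the Monte Carlo expected Binder loss of candidate s.
        binder = np.abs(coclustering(s) - psm).sum() / (n * n)
        score = binder + lam * partition_entropy(s)
        if score < best_score:
            best, best_score = s, score
    return best

# Toy posterior draws: two with a singleton cluster, one without it.
samples = np.array([
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 1],
])
print(entropy_regularized_estimate(samples, lam=0.0))
print(entropy_regularized_estimate(samples, lam=1.0))
```

In this toy setting, a positive `lam` favors the draw without the singleton cluster, since partitions with many small clusters have higher entropy. Because the sketch only needs partition draws, it does not depend on the model that produced them.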
