Bayesian sparse convex clustering via global-local shrinkage priors

Computational Statistics - Volume 36 - Pages 2671-2699 - 2021
Kaito Shimamura1,2, Shuichi Kawano2
1NTT Advanced Technology Corporation, Kawasaki-shi, Japan
2Graduate School of Informatics and Engineering, The University of Electro-Communications, Chofu-shi, Japan

Abstract

Sparse convex clustering groups observations and performs variable selection simultaneously within the framework of convex clustering. Although a weighted $$L_1$$ norm is usually employed as the regularization term in sparse convex clustering, its use increases dependence on the data and reduces estimation accuracy when the sample size is insufficient. To tackle these problems, this paper proposes a Bayesian sparse convex clustering method based on the ideas of the Bayesian lasso and global-local shrinkage priors. We introduce Gibbs sampling algorithms for our method using scale mixtures of normal distributions. The effectiveness of the proposed method is demonstrated in simulation studies and a real data analysis.
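As a rough illustration of the scale-mixture-of-normals device mentioned above, the sketch below implements a plain Bayesian lasso Gibbs sampler for linear regression in the style of Park and Casella (2008), where the Laplace prior is written as a normal scale mixture with exponential mixing. This is a minimal sketch under simplifying assumptions (a linear-regression likelihood and a fixed regularization parameter `lam`); it is not the authors' convex-clustering sampler or their global-local shrinkage prior.

```python
# Minimal Bayesian lasso Gibbs sampler via the scale-mixture-of-normals
# representation of the Laplace prior. Illustrative only: fixed `lam`,
# ordinary linear regression, no clustering or fusion structure.
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = 1.0
    inv_tau2 = np.ones(p)          # local mixing variables 1 / tau_j^2
    XtX, Xty = X.T @ X, X.T @ y
    draws = np.empty((n_iter, p))

    for t in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}), A = X'X + diag(1/tau^2)
        A_inv = np.linalg.inv(XtX + np.diag(inv_tau2))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)

        # sigma2 | rest ~ Inverse-Gamma
        resid = y - X @ beta
        shape = (n - 1) / 2 + p / 2
        scale = resid @ resid / 2 + beta @ (inv_tau2 * beta) / 2
        sigma2 = scale / rng.gamma(shape)

        # 1/tau_j^2 | rest ~ Inverse-Gaussian(sqrt(lam^2 sigma2 / beta_j^2), lam^2)
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mu, lam**2)

        draws[t] = beta
    return draws
```

Replacing the exponential mixing density with a heavier-tailed one (e.g. half-Cauchy, as in the horseshoe) yields the global-local shrinkage variants the paper considers, with the same overall Gibbs structure.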
