Variational Bayes estimation of hierarchical Dirichlet-multinomial mixtures for text clustering
Abstract
In this paper, we formulate a hierarchical Bayesian version of the Mixture of Unigrams model for text clustering and carry out approximate posterior inference via variational methods. We derive the explicit expression of the variational objective function for our hierarchical model under a mean-field approximation. We then obtain the update equations of a coordinate-ascent algorithm that finds local maxima of this objective, and estimate the model parameters from the optimized variational hyperparameters. The advantages of variational algorithms over traditional Markov chain Monte Carlo methods based on iterative posterior sampling are also discussed in detail.
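To make the approach concrete, the following is a minimal sketch of coordinate-ascent mean-field variational inference for a Bayesian mixture of unigrams with Dirichlet priors on the mixture weights and the per-cluster word distributions. The function name, hyperparameter values, and toy data are illustrative assumptions, not taken from the paper; the update equations are the standard CAVI updates for this conjugate model.

```python
# Illustrative CAVI sketch for a Bayesian mixture of unigrams:
# z_d ~ Cat(pi), pi ~ Dir(alpha), beta_k ~ Dir(eta), words ~ Mult(beta_{z_d}).
# All names and the toy corpus below are hypothetical, for exposition only.
import numpy as np
from scipy.special import digamma, logsumexp

rng = np.random.default_rng(0)

def cavi_mixture_of_unigrams(X, K, alpha=1.0, eta=0.1, n_iter=50):
    """X: (D, V) document-term count matrix; K: number of clusters."""
    D, V = X.shape
    # random responsibilities to break the symmetry of the initialization
    phi = rng.dirichlet(np.ones(K), size=D)              # q(z_d) parameters
    for _ in range(n_iter):
        # variational Dirichlet posteriors over weights and word probabilities
        a = alpha + phi.sum(axis=0)                      # (K,)
        b = eta + phi.T @ X                              # (K, V)
        # expected log-parameters under the current variational factors
        Elog_pi = digamma(a) - digamma(a.sum())
        Elog_beta = digamma(b) - digamma(b.sum(axis=1, keepdims=True))
        # coordinate-ascent update of the cluster responsibilities
        log_phi = Elog_pi + X @ Elog_beta.T              # (D, K), unnormalized
        phi = np.exp(log_phi - logsumexp(log_phi, axis=1, keepdims=True))
    return phi, a, b

# toy corpus: two clusters with largely disjoint dominant vocabularies
X = np.vstack([rng.multinomial(40, [.4, .4, .1, .05, .05], size=20),
               rng.multinomial(40, [.05, .05, .1, .4, .4], size=20)])
phi, a, b = cavi_mixture_of_unigrams(X, K=2)
labels = phi.argmax(axis=1)
```

Cluster labels are read off as the argmax of the optimized responsibilities, and point estimates of the mixture weights and word distributions follow from the means of the variational Dirichlet posteriors `a` and `b`, mirroring the parameter-estimation step described above.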