Asymptotic accuracy of Bayes estimation for latent variables with redundancy

Machine Learning, Volume 102, Pages 1–28, 2015
Keisuke Yamazaki1
1Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan

Abstract

Hierarchical parametric models consisting of observable and latent variables are widely used for unsupervised learning tasks; a mixture model, for example, is a representative hierarchical model for clustering. From the statistical point of view, such models can be regular or singular depending on the distribution of the data. In the regular case, the model is identifiable: there is a one-to-one relation between the probability density function expressing the model and the parameter. The Fisher information matrix is positive definite, and the estimation accuracy of both observable and latent variables has been studied. In the singular case, on the other hand, the model is not identifiable and the Fisher information matrix is not positive definite, so conventional statistical analysis based on the inverse Fisher matrix is not applicable. Recently, an algebraic geometrical analysis has been developed and used to elucidate the Bayes estimation of observable variables. The present paper applies this analysis to latent-variable estimation and determines its theoretical performance. Our results clarify the convergence behavior of the posterior distribution: the posterior for observable-variable estimation can differ from the posterior for latent-variable estimation. Because of this difference, a Markov chain Monte Carlo method based on the parameter and the latent variable cannot construct the desired posterior distribution.
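To make concrete what "Markov chain Monte Carlo based on the parameter and the latent variable" refers to, the following is a minimal illustrative sketch (not taken from the paper): a plain Gibbs sampler for a two-component Gaussian mixture with unit variances and a flat prior on the means, alternating between the latent assignments and the component means. The function name and all simplifying choices (known variances, two components, flat prior) are assumptions made for illustration only.

```python
import math
import random

def gibbs_two_gaussian_mixture(x, n_iter=200, seed=0):
    """Toy Gibbs sampler for a two-component Gaussian mixture
    (unit variances, equal weights, flat prior on the means).
    Alternates sampling the latent assignments z and the means mu."""
    rng = random.Random(seed)
    n = len(x)
    mu = [min(x), max(x)]  # crude initialisation of the two means
    z = [0] * n
    for _ in range(n_iter):
        # Sample each latent assignment from its conditional posterior
        # p(z_i = k | x_i, mu) ∝ exp(-(x_i - mu_k)^2 / 2).
        for i in range(n):
            w0 = math.exp(-0.5 * (x[i] - mu[0]) ** 2)
            w1 = math.exp(-0.5 * (x[i] - mu[1]) ** 2)
            z[i] = 1 if rng.random() < w1 / (w0 + w1) else 0
        # Sample each mean given its assigned points: with unit variance
        # and a flat prior this conditional is N(mean(xs), 1/len(xs)).
        for k in range(2):
            xs = [x[i] for i in range(n) if z[i] == k]
            if xs:
                m = sum(xs) / len(xs)
                mu[k] = rng.gauss(m, 1.0 / math.sqrt(len(xs)))
    return mu, z
```

This sampler draws from the joint posterior of the parameter and the latent assignments; the point of the paper's result is that, in the singular regime, a chain of this kind does not yield the posterior distribution appropriate for latent-variable estimation.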
