Entropic risk minimization for nonparametric estimation of mixing distributions
Abstract
We discuss a nonparametric estimation method for the mixing distributions in mixture models. The problem is formalized as the minimization of a one-parameter objective functional which, in special cases, reduces to maximum likelihood estimation or kernel vector quantization. Generalizing the corresponding theorem for nonparametric maximum likelihood estimation, we prove the existence and discreteness of the optimal mixing distribution and provide an algorithm to compute it. We demonstrate that, with an appropriate choice of the parameter, the proposed method is less prone to overfitting than maximum likelihood estimation. We further discuss the connection between this unifying estimation framework and the rate-distortion problem.
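The abstract states the one-parameter objective only in words. As a rough illustration, under the assumption (not taken from the paper) that the functional is the empirical entropic risk of the log-loss for a mixture with component densities f(x | θ) and mixing distribution G, it would read

\[
L_\beta(G) \;=\; -\frac{1}{\beta}\,\frac{1}{n}\sum_{i=1}^{n} \log \int f(x_i \mid \theta)^{\beta}\, dG(\theta), \qquad \beta > 0.
\]

At β = 1 this is the negative log-likelihood of the mixture, so its minimizer is the nonparametric maximum likelihood estimate; as β → ∞, each term tends to -log sup_{θ ∈ supp(G)} f(x_i | θ), so every sample is scored by its single best component, a kernel-vector-quantization-style criterion. Under the same assumption, restricting G to a fixed discrete support grid (in the spirit of the discreteness result stated above) makes the objective convex in the mixing weights, and an EM-style fixed-point iteration is one simple way to minimize it. The sketch below is illustrative only and is not the algorithm of the paper; the function name `entropic_risk_weights` and the Gaussian-component choice are hypothetical.

    import numpy as np

    def entropic_risk_weights(x, thetas, beta, sigma=1.0, n_iter=200):
        """Fit mixing weights w over the fixed grid `thetas` by minimizing
        L_beta(w) = -(1/beta) * mean_i log sum_k w_k * f(x_i|theta_k)**beta
        with Gaussian components f(x|theta) = N(x; theta, sigma^2).
        Normalizing constants are dropped: for fixed beta they only shift
        the objective by a constant and leave the minimizer unchanged.
        Illustrative sketch, not the algorithm proposed in the paper."""
        # F[i, k] = f(x_i | theta_k)**beta, up to a constant factor
        F = np.exp(-beta * 0.5 * ((x[:, None] - thetas[None, :]) / sigma) ** 2)
        w = np.full(len(thetas), 1.0 / len(thetas))
        for _ in range(n_iter):
            r = F * w                          # r[i, k] ~ w_k * f(x_i|theta_k)**beta
            r /= r.sum(axis=1, keepdims=True)  # normalize rows into responsibilities
            w = r.mean(axis=0)                 # EM-style fixed-point reweighting
        return w

    # Example: 31-point support grid on [-3, 3], beta = 2
    w = entropic_risk_weights(np.random.randn(200), np.linspace(-3, 3, 31), beta=2.0)

For β = 1 the loop is exactly the classical fixed-point update for the nonparametric MLE of mixing weights on a grid; other β simply reweight each component likelihood by the power β before the responsibilities are formed.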