Scale-constrained approaches for maximum likelihood estimation and model selection of clusterwise linear regression models

Journal of the Italian Statistical Society, Volume 29, pages 49–78, 2019

Roberto Di Mari¹, Roberto Rocci², Stefano Antonio Gattone³

¹Department of Economics and Business, University of Catania, Catania, Italy

²Department of Economics and Finance, University of Rome Tor Vergata, Rome, Italy

³Department of Philosophical and Social Sciences, Economics and Quantitative Methods, University G. d'Annunzio, Chieti-Pescara, Italy

Abstract

We consider an equivariant approach that imposes data-driven bounds on the component variances to avoid singular and spurious solutions in maximum likelihood estimation of clusterwise linear regression models. We investigate its use in choosing the number of components, and we propose a computational shortcut that significantly reduces the computational time needed to tune the bounds on the data. In a simulation study and two real-data applications, we show that, compared with standard unconstrained methods, the proposed methods guarantee a reliable assessment of the number of components, together with accurate estimation of the model parameters and recovery of the clusters.
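The constrained estimation described in the abstract can be illustrated with a small EM sketch. It assumes a single predictor, a random soft initial partition, and variance bounds of the form √c·s² ≤ σ²_g ≤ s²/√c. Here s² is a reference variance, taken as an assumption to be the residual variance of one pooled regression, and 0 < c ≤ 1. Because the bounds scale with s², rescaling the response rescales them as well, which is the sense in which such constraints are equivariant. The function names constrained_cwlr, weighted_ls and log_normal are hypothetical, and this is not the authors' implementation. In particular, neither the tuning of c nor the computational shortcut mentioned above is reproduced.

```python
import math
import random


def weighted_ls(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = max(sum(w), 1e-12)  # guard against an (almost) empty component
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def log_normal(r, var):
    """Log density of N(0, var) evaluated at residual r."""
    return -0.5 * (math.log(2 * math.pi * var) + r * r / var)


def constrained_cwlr(x, y, k, c, max_iter=500, tol=1e-8, seed=0):
    """Hypothetical EM for a k-component clusterwise linear regression with
    one predictor. Component variances are confined to
    [sqrt(c) * s2, s2 / sqrt(c)], where s2 is the pooled single-regression
    residual variance (an assumed reference variance) and 0 < c <= 1."""
    n = len(x)
    a0, b0 = weighted_ls(x, y, [1.0] * n)
    s2 = sum((yi - a0 - b0 * xi) ** 2 for xi, yi in zip(x, y)) / n
    lo, hi = math.sqrt(c) * s2, s2 / math.sqrt(c)

    rng = random.Random(seed)
    post = []
    for _ in range(n):  # random soft initial partition
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        post.append([v / total for v in row])

    history = []
    for _ in range(max_iter):
        # M-step: weighted LS per component, then clip the variance.
        params = []
        for g in range(k):
            w = [post[i][g] for i in range(n)]
            ng = max(sum(w), 1e-12)
            a, b = weighted_ls(x, y, w)
            rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
            var = min(max(rss / ng, lo), hi)  # box-constrained variance update
            params.append((ng / n, a, b, var))
        # E-step: posterior membership probabilities via log-sum-exp.
        ll = 0.0
        for i in range(n):
            logs = [math.log(p) + log_normal(y[i] - a - b * x[i], v)
                    for p, a, b, v in params]
            m = max(logs)
            total = sum(math.exp(lv - m) for lv in logs)
            post[i] = [math.exp(lv - m) / total for lv in logs]
            ll += m + math.log(total)
        history.append(ll)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return params, (lo, hi), history
```

With c = 1 the bounds coincide and every component is forced to the reference variance. As c approaches 0 the constraints relax toward unconstrained maximum likelihood, where a component variance can collapse toward zero on a few points. Clipping the unconstrained update rss/n_g to the interval is the exact constrained M-step in this sketch, because each component's expected complete-data log-likelihood is unimodal in σ²_g, so the log-likelihood sequence stays nondecreasing.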