Best subset selection via cross-validation criterion

TOP - Volume 28 - Pages 475-488 - 2020
Yuichi Takano1, Ryuhei Miyashiro2
1Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba-shi, Japan
2Institute of Engineering, Tokyo University of Agriculture and Technology, Koganei-shi, Japan

Abstract

This paper is concerned with the cross-validation criterion for selecting the best subset of explanatory variables in a linear regression model. In contrast with statistical criteria (e.g., Mallows' $$C_p$$, the Akaike information criterion, and the Bayesian information criterion), cross-validation requires only mild assumptions, namely that samples are identically distributed and that training and validation samples are independent. For this reason, the cross-validation criterion is expected to work well in most situations involving predictive methods. The purpose of this paper is to establish a mixed-integer optimization (MIO) approach to selecting the best subset of explanatory variables via the cross-validation criterion. This subset-selection problem can be formulated as a bilevel MIO problem, which we then reduce to a single-level mixed-integer quadratic optimization problem that can be solved exactly by optimization software. The efficacy of our method is evaluated through simulation experiments by comparison with statistical-criterion-based exhaustive search algorithms and $$L_1$$-regularized regression. Our simulation results demonstrate that, when the signal-to-noise ratio was low, our method delivered good accuracy in both subset selection and prediction.
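To make the cross-validation criterion concrete, the following is a minimal sketch of subset selection scored by K-fold cross-validation, implemented as a naive exhaustive search with ordinary least squares in NumPy. This is only an illustration of the criterion itself, not the paper's bilevel/mixed-integer formulation; all function and variable names here are our own.

```python
from itertools import combinations

import numpy as np


def cv_error(X, y, k=5):
    """Mean squared validation error of OLS over k folds."""
    n = len(y)
    idx = np.arange(n)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Fit least squares on the training folds only.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[fold] - X[fold] @ beta
        errs.append(np.mean(resid ** 2))
    return float(np.mean(errs))


def best_subset_cv(X, y, k=5):
    """Exhaustively score every nonempty column subset by k-fold CV error
    and return (best error, best subset of column indices)."""
    p = X.shape[1]
    best_err, best_subset = np.inf, ()
    for size in range(1, p + 1):
        for S in combinations(range(p), size):
            err = cv_error(X[:, list(S)], y, k)
            if err < best_err:
                best_err, best_subset = err, S
    return best_err, best_subset


# Synthetic example: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=n)

err, subset = best_subset_cv(X, y)
print(subset)  # the relevant columns 0 and 2 should be selected
```

Note that this brute-force search evaluates all $$2^p - 1$$ subsets and is only feasible for small $$p$$; the point of the paper's MIO reduction is to let an exact solver explore this combinatorial space without full enumeration.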

References

Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723
Allen DM (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16(1):125–127
Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79
Arthanari TS, Dodge Y (1981) Mathematical programming in statistics. Wiley, New York
Benati S, García S (2014) A mixed integer linear model for clustering with variable selection. Comput Oper Res 43:280–285
Bennett KP, Hu J, Ji X, Kunapuli G, Pang JS (2006) Model selection via bilevel optimization. In: Proceedings of the 2006 IEEE international joint conference on neural networks, pp 1922–1929
Bertsimas D, King A (2016) OR forum—an algorithmic approach to linear regression. Oper Res 64(1):2–16
Bertsimas D, King A (2017) Logistic regression: from art to science. Stat Sci 32(3):367–384
Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44(2):813–852
Bertsimas D, Dunn J (2017) Optimal classification trees. Mach Learn 106(7):1039–1082
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Chung S, Park YW, Cheong T (2017) A mathematical programming approach for integrated multiple linear regression subset selection and validation. arXiv preprint arXiv:1712.04543
Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann Oper Res 153(1):235–256
Cozad A, Sahinidis NV, Miller DC (2014) Learning surrogate models for simulation-based optimization. AIChE J 60(6):2211–2227
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22
Geisser S (1975) The predictive sample reuse method with applications. J Am Stat Assoc 70(350):320–328
Hastie T, Tibshirani R, Tibshirani RJ (2017) Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv preprint arXiv:1707.08692
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67
Hooker JN, Osorio MA (1999) Mixed logical-linear programming. Discrete Appl Math 96–97:395–442
Kimura K, Waki H (2018) Minimization of Akaike's information criterion in linear regression analysis via mixed integer nonlinear program. Optim Methods Softw 33(3):633–649
Konno H, Yamamoto R (2009) Choosing the best set of variables in regression analysis using integer programming. J Glob Optim 44(2):273–282
Kunapuli G, Bennett KP, Hu J, Pang JS (2008) Classification model selection via bilevel programming. Optim Methods Softw 23(4):475–489
Maldonado S, Pérez J, Weber R, Labbé M (2014) Feature selection for support vector machines via mixed integer linear programming. Inf Sci 279:163–175
Mallows CL (1973) Some comments on \(C_p\). Technometrics 15(4):661–675
Miller A (2002) Subset selection in regression. Chapman and Hall, Boca Raton
Miyashiro R, Takano Y (2015a) Subset selection by Mallows' \(C_p\): a mixed integer programming approach. Expert Syst Appl 42(1):325–331
Miyashiro R, Takano Y (2015b) Mixed integer second-order cone programming formulations for variable selection in linear regression. Eur J Oper Res 247(3):721–731
Mosier CI (1951) I. Problems and designs of cross-validation. Educ Psychol Meas 11(1):5–11
Naganuma M, Takano Y, Miyashiro R (2019) Feature subset selection for ordered logit model via tangent-plane-based approximation. IEICE Trans Inf Syst E102-D(5):1046–1053
Okuno T, Takeda A, Kawana A (2018) Hyperparameter learning for bilevel nonsmooth optimization. arXiv preprint arXiv:1806.01520
Park YW, Klabjan D (2017) Subset selection for multiple linear regression via optimization. arXiv preprint arXiv:1701.07920
Pedregosa F (2016) Hyperparameter optimization with approximate gradient. In: Proceedings of the 33rd international conference on machine learning, pp 737–746
Sato T, Takano Y, Miyashiro R, Yoshise A (2016) Feature subset selection for logistic regression via mixed integer optimization. Comput Optim Appl 64(3):865–880
Sato T, Takano Y, Miyashiro R (2017) Piecewise-linear approximation for feature subset selection in a sequential logit model. J Oper Res Soc Jpn 60(1):1–14
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88(422):486–494
Sinha A, Malo P, Deb K (2018) A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans Evolut Comput 22(2):276–295
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Methodol 36(2):111–147
Tamura R, Kobayashi K, Takano Y, Miyashiro R, Nakata K, Matsui T (2017) Best subset selection for eliminating multicollinearity. J Oper Res Soc Jpn 60(3):321–336
Tamura R, Kobayashi K, Takano Y, Miyashiro R, Nakata K, Matsui T (2019) Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. J Glob Optim 73(2):431–446
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288
Ustun B, Rudin C (2016) Supersparse linear integer models for optimized medical scoring systems. Mach Learn 102(3):349–391
van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, Oxford
Wherry R (1931) A new formula for predicting the shrinkage of the coefficient of multiple correlation. Ann Math Stat 2(4):440–457
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320