Tuning Parameter Selection Based on Blocked $3\times 2$ Cross-Validation for High-Dimensional Linear Regression Model
Abstract
In high-dimensional linear regression, selecting an appropriate tuning parameter is essential for penalized linear models. Because the tuning parameter is typically chosen to minimize the expected prediction error of the model, cross-validation methods are commonly used for this purpose in machine learning. In this paper, blocked $3\times 2$ cross-validation ($3\times 2$ BCV) is proposed as the tuning parameter selection method because its estimate of the prediction error has small variance. Under conditions weaker than those required by leave-$n_v$-out cross-validation, the tuning parameter selection method based on $3\times 2$ BCV is proved to be consistent for the high-dimensional linear regression model. Furthermore, experiments on simulated and real data support the theoretical results and demonstrate that the proposed method performs well on several criteria for selecting the true model.
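To make the procedure concrete, the following is a minimal Python sketch of tuning parameter selection by minimizing a $3\times 2$ BCV estimate of the prediction error. Everything outside the abstract is an assumption: the penalized estimator is taken to be the LASSO (via scikit-learn), the candidate grid lambdas is hypothetical, and the blocking is approximated by the balanced four-block construction often used for $m\times 2$ BCV, in which any two training halves share about $n/4$ observations; this is a sketch, not the paper's implementation.

    # Sketch: tuning parameter selection via blocked 3x2 cross-validation.
    # Assumptions (not from the paper): the penalized estimator is
    # sklearn's Lasso, and `lambdas` is a hypothetical candidate grid.
    # The three half-splits are built from four equal blocks so that any
    # two training halves overlap in roughly n/4 points (balanced blocking).
    import numpy as np
    from sklearn.linear_model import Lasso

    def blocked_3x2_cv_select(X, y, lambdas, seed=0):
        n = X.shape[0]
        rng = np.random.default_rng(seed)
        idx = rng.permutation(n)
        B = np.array_split(idx, 4)  # four (nearly) equal blocks B1..B4
        # Three 2-fold partitions with balanced pairwise overlap.
        partitions = [
            (np.concatenate([B[0], B[1]]), np.concatenate([B[2], B[3]])),
            (np.concatenate([B[0], B[2]]), np.concatenate([B[1], B[3]])),
            (np.concatenate([B[0], B[3]]), np.concatenate([B[1], B[2]])),
        ]
        errors = np.zeros(len(lambdas))
        for k, lam in enumerate(lambdas):
            fold_errs = []
            for half_a, half_b in partitions:
                # Each partition contributes two fold estimates: fit on one
                # half, measure squared prediction error on the other half.
                for train, test in ((half_a, half_b), (half_b, half_a)):
                    model = Lasso(alpha=lam).fit(X[train], y[train])
                    resid = y[test] - model.predict(X[test])
                    fold_errs.append(np.mean(resid ** 2))
            errors[k] = np.mean(fold_errs)  # average over the 6 half-fits
        return lambdas[int(np.argmin(errors))]

    # Hypothetical usage on simulated sparse high-dimensional data (p > n).
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 500))
    y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.8, -0.6]) + rng.standard_normal(200)
    best_lam = blocked_3x2_cv_select(X, y, lambdas=np.logspace(-3, 0, 20))
    print("selected lambda:", best_lam)

The selected tuning parameter is the grid value minimizing the average of the six half-sample prediction error estimates; averaging over the three balanced replications is what reduces the variance of the estimate relative to a single 2-fold split.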