EnMSP: an elastic-net multi-step screening procedure for high-dimensional regression

Statistics and Computing - Volume 34 - Pages 1-16 - 2024
Yushan Xue1, Jie Ren2, Bin Yang3
1School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China
2Cinda Securities Co., Ltd., Beijing, China
3Research Center for International Inspection and Quarantine Standards and Technical Regulations, Beijing, China

Abstract

To improve estimation efficiency in high-dimensional regression problems, penalized regularization is routinely used. However, accurately estimating the model remains challenging in the presence of correlated effects, where irrelevant covariates are strongly correlated with relevant ones; this setting, referred to as correlated data, poses additional complexities for model estimation. In this paper, we propose the elastic-net multi-step screening procedure (EnMSP), an iterative algorithm designed to recover sparse linear models from correlated data. EnMSP uses a small repeated penalty strategy to identify the truly relevant covariates within a few iterations. Specifically, in each iteration, EnMSP enhances the adaptive lasso method by adding a weighted $$l_2$$ penalty, which improves the selection of relevant covariates. The method is shown to select the true model and achieve an $$l_2$$-norm error bound under certain conditions. The effectiveness of EnMSP is demonstrated through numerical comparisons and an application to financial data.
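The abstract describes the procedure only at a high level, so the following is a minimal Python sketch of a multi-step adaptive elastic-net of the kind outlined above, not the authors' implementation. It assumes adaptive-lasso-style weights $$1/(|\beta_j| + \epsilon)$$ on both the $$l_1$$ and the weighted $$l_2$$ penalty, a fixed number of reweighting steps, and a plain coordinate-descent inner solver; the names `enmsp_sketch` and `weighted_enet_cd`, the weight form, and all tuning constants are assumptions for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_enet_cd(X, y, w1, w2, lam1, lam2, max_iter=500, tol=1e-8):
    """Coordinate descent for the weighted elastic-net objective
        (1/2n) ||y - X b||^2 + lam1 * sum_j w1_j |b_j| + lam2 * sum_j w2_j b_j^2.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # X_j^T X_j / n, reused every sweep
    r = y.astype(float).copy()             # residual y - X beta (beta = 0 initially)
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            r += X[:, j] * beta[j]         # partial residual excluding coordinate j
            z = X[:, j] @ r / n
            # Closed-form coordinate update for the weighted penalties.
            b_new = soft_threshold(z, lam1 * w1[j]) / (col_sq[j] + 2.0 * lam2 * w2[j])
            r -= X[:, j] * b_new
            max_delta = max(max_delta, abs(b_new - beta[j]))
            beta[j] = b_new
        if max_delta < tol:
            break
    return beta


def enmsp_sketch(X, y, n_steps=3, lam1=0.1, lam2=0.05, eps=1e-6):
    """Multi-step reweighting loop in the spirit of EnMSP (illustrative only)."""
    p = X.shape[1]
    # Step 0: a plain elastic-net fit (unit weights) initializes the estimate.
    beta = weighted_enet_cd(X, y, np.ones(p), np.ones(p), lam1, lam2)
    for _ in range(n_steps):
        # Adaptive weights from the previous step; coefficients near zero get
        # very large weights and are effectively screened out. The paper's
        # exact weighting scheme may differ -- this form is an assumption.
        w = 1.0 / (np.abs(beta) + eps)
        beta = weighted_enet_cd(X, y, w, w, lam1, lam2)
    return beta
```

As a quick check of the sketch, a sparse design with, say, n = 100, p = 200, and five nonzero coefficients recovers the active set in a few steps: `enmsp_sketch(X, y)` followed by thresholding `np.abs(beta_hat) > 1e-3`. Writing the inner solver explicitly, rather than rescaling columns to reuse an off-the-shelf lasso, keeps both the weighted $$l_1$$ and weighted $$l_2$$ terms exact.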
