Forward variable selection for sparse ultra-high-dimensional generalized varying coefficient models

Japanese Journal of Statistics and Data Science - Volume 4 - Pages 151-179 - 2020
Toshio Honda1, Chien-Tong Lin2
1Graduate School of Economics, Hitotsubashi University, Tokyo, Japan
2Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan

Abstract

In this paper, we propose forward variable selection procedures for feature screening in ultra-high-dimensional generalized varying coefficient models. We employ regression splines to approximate the coefficient functions and then sequentially select an additional relevant covariate by maximizing the log-likelihood. When our stopping rule indicates that selecting any new covariate would no longer significantly improve the log-likelihood, we terminate the forward procedure and report our estimate of the set of relevant covariates. The effect of the size of the current model has been overlooked in stopping rules for sequential procedures for high-dimensional models; our stopping rule suitably accounts for it. Our forward procedures enjoy screening consistency and some other desirable properties under regularity conditions. We also present the results of numerical studies that demonstrate their good finite-sample performance.
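The procedure described above can be illustrated with a minimal Python sketch. This is not the paper's method: as a hypothetical simplification, a Gaussian linear model stands in for the spline-based generalized varying coefficient model (in the paper, each covariate would contribute a whole group of spline basis columns, and a general log-likelihood replaces the residual sum of squares), and the stopping rule is an EBIC-type criterion in the spirit of Chen and Chen (2008), with a gamma term penalizing the size of the candidate model space. The function name and parameters are illustrative.

```python
import numpy as np

def forward_select_ebic(X, y, gamma=1.0, max_steps=None):
    """Greedy forward selection with an EBIC-type stopping rule.

    Hypothetical simplification of the paper's procedure: a Gaussian
    linear model replaces the generalized varying coefficient model,
    so -2 * (maximized log-likelihood) is n*log(RSS/n) up to constants.
    """
    n, p = X.shape
    if max_steps is None:
        max_steps = min(n - 1, p)
    active = []

    def ebic(model):
        # Residual sum of squares of the least-squares fit on `model`.
        if model:
            Xs = X[:, model]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
        else:
            rss = np.sum((y - y.mean()) ** 2)
        k = len(model)
        # BIC penalty plus an extra term in log(p) for the model space,
        # so the criterion accounts for the size of the current model.
        return n * np.log(rss / n) + k * np.log(n) + 2.0 * gamma * k * np.log(p)

    current = ebic(active)
    while len(active) < max_steps:
        # Try each remaining covariate and keep the best one-step extension.
        scores = [(ebic(active + [j]), j) for j in range(p) if j not in active]
        best_score, best_j = min(scores)
        if best_score >= current:
            break  # stopping rule: no candidate improves the criterion
        active.append(best_j)
        current = best_score
    return active
```

In the paper's setting, the single column `j` would be replaced by the block of spline basis columns for covariate `j`, and the Gaussian criterion by the maximized log-likelihood of the generalized model.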
