Forward variable selection for sparse ultra-high-dimensional generalized varying coefficient models

Japanese Journal of Statistics and Data Science - Volume 4 - Pages 151-179 - 2020
Toshio Honda1, Chien-Tong Lin2
1Graduate School of Economics, Hitotsubashi University, Tokyo, Japan
2Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan

Abstract

In this paper, we propose forward variable selection procedures for feature screening in ultra-high-dimensional generalized varying coefficient models. We employ regression splines to approximate the coefficient functions and then sequentially select an additional relevant covariate by maximizing the log-likelihood. When our stopping rule indicates that selecting any new covariate would no longer significantly improve the log-likelihood, we terminate the forward procedure and report our estimate of the set of relevant covariates. The effect of the size of the current model has been overlooked in stopping rules for sequential procedures for high-dimensional models; our stopping rule suitably accounts for it. Our forward procedures enjoy screening consistency and some other desirable properties under regularity conditions. We also present the results of numerical studies that demonstrate their good finite-sample performance.
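The procedure described above can be illustrated with a minimal Python sketch. This is not the paper's method: as a hypothetical simplification, a Gaussian linear model stands in for the spline-based generalized varying coefficient model (in the paper, each covariate would contribute a whole group of spline basis columns, and a general log-likelihood replaces the residual sum of squares), and the stopping rule is an EBIC-type criterion in the spirit of Chen and Chen (2008), with a gamma term penalizing the size of the candidate model space. The function name and parameters are illustrative.

```python
import numpy as np

def forward_select_ebic(X, y, gamma=1.0, max_steps=None):
    """Greedy forward selection with an EBIC-type stopping rule.

    Hypothetical simplification of the paper's procedure: a Gaussian
    linear model replaces the generalized varying coefficient model,
    so -2 * (maximized log-likelihood) is n*log(RSS/n) up to constants.
    """
    n, p = X.shape
    if max_steps is None:
        max_steps = min(n - 1, p)
    active = []

    def ebic(model):
        # Residual sum of squares of the least-squares fit on `model`.
        if model:
            Xs = X[:, model]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
        else:
            rss = np.sum((y - y.mean()) ** 2)
        k = len(model)
        # BIC penalty plus an extra term in log(p) for the model space,
        # so the criterion accounts for the size of the current model.
        return n * np.log(rss / n) + k * np.log(n) + 2.0 * gamma * k * np.log(p)

    current = ebic(active)
    while len(active) < max_steps:
        # Try each remaining covariate and keep the best one-step extension.
        scores = [(ebic(active + [j]), j) for j in range(p) if j not in active]
        best_score, best_j = min(scores)
        if best_score >= current:
            break  # stopping rule: no candidate improves the criterion
        active.append(best_j)
        current = best_score
    return active
```

In the paper's setting, the single column `j` would be replaced by the block of spline basis columns for covariate `j`, and the Gaussian criterion by the maximized log-likelihood of the generalized model.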
