Semi-automated simultaneous predictor selection for regression-SARIMA models

Aaron Lowther (1), Paul Fearnhead (1), Matthew A. Nunes (2), Kasper Løvborg Jensen (3)
(1) Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK
(2) School of Mathematics, University of Bath, Bath BA2 7AY, UK
(3) BT Applied Research, BT Plc, London EC1A 7AJ, UK

Abstract

Deciding which predictors to use plays an integral role in deriving statistical models in a wide range of applications. Motivated by the challenges of predicting events across a telecommunications network, we propose a semi-automated, joint model-fitting and predictor selection procedure for linear regression models. Our approach can model and account for serial correlation in the regression residuals, produces sparse and interpretable models, and can be used to jointly select models for a group of related responses. This is achieved by fitting linear models under constraints on the number of nonzero coefficients, using a generalisation of a recently developed mixed integer quadratic optimisation approach. The resulting models achieve better predictive performance on the motivating telecommunications data than methods currently used by industry.
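To make the selection constraint concrete, the following is a minimal sketch of the kind of cardinality-constrained least squares problem referred to above, written as a mixed integer quadratic program in the style of best subset selection. The symbols k (the sparsity budget) and M (a big-M bound on coefficient magnitudes) are introduced here purely for illustration; the sketch omits the paper's generalisation to serially correlated (SARIMA) residuals and to joint selection across related responses.

% Minimal sketch (assumed notation, not the paper's exact formulation):
% least squares under a bound k on the number of nonzero coefficients,
% expressed as a mixed integer quadratic program with binary indicators z_j
% and a big-M bound M on each coefficient's magnitude.
\begin{align*}
  \min_{\beta \in \mathbb{R}^{p},\; z \in \{0,1\}^{p}} \quad
    & \tfrac{1}{2} \lVert y - X\beta \rVert_2^2 \\
  \text{subject to} \quad
    & -M z_j \le \beta_j \le M z_j, \qquad j = 1, \dots, p, \\
    & \sum_{j=1}^{p} z_j \le k.
\end{align*}

Here z_j = 1 indicates that predictor j may take a nonzero coefficient, so at most k predictors enter the model; varying k and scoring the fitted models (for example by an information criterion or held-out prediction error) traces out the usual best subset path.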

Keywords

