Model Selection and Estimation in Regression with Grouped Variables

Ming Yuan1, Yi Lin2
1Georgia Institute of Technology, Atlanta, USA
2University of Wisconsin-Madison, USA

Abstract

We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations, with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
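To make the idea of selecting whole factors rather than individual variables concrete, the following is a minimal sketch of a group-wise lasso fit by block coordinate descent. It is an illustration under stated assumptions, not the authors' implementation: it assumes the objective 0.5 * ||y - Xb||^2 + lam * sum_j sqrt(p_j) * ||b_j||, where p_j is the size of group j, and it assumes the columns within each group are orthonormal. The function names `group_soft_threshold` and `group_lasso` are hypothetical.

```python
import numpy as np


def group_soft_threshold(s, threshold):
    """Shrink the vector s toward zero as a block.

    Returns the zero vector when the Euclidean norm of s is at most
    `threshold`, so a whole group is kept or dropped together.
    """
    norm = np.linalg.norm(s)
    if norm <= threshold:
        return np.zeros_like(s)
    return (1.0 - threshold / norm) * s


def group_lasso(X, y, groups, lam, n_iter=200):
    """Block coordinate descent for the assumed objective

        0.5 * ||y - X b||^2 + lam * sum_j sqrt(p_j) * ||b_j||.

    Assumption: for every index list `idx` in `groups`, the columns
    X[:, idx] are orthonormal, so each block update has a closed form.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for idx in groups:
            # Partial residual with the current group's contribution added back.
            r = y - X @ beta + X[:, idx] @ beta[idx]
            s = X[:, idx].T @ r
            # The threshold scales with the square root of the group size.
            beta[idx] = group_soft_threshold(s, lam * np.sqrt(len(idx)))
    return beta
```

Calling `group_lasso(X, y, groups=[[0, 1], [2, 3]], lam=0.5)` on a design whose groups have been orthonormalized (for example with `np.linalg.qr` applied to each group's columns) returns a coefficient vector in which each group is either entirely zero or entirely non-zero, which is the factor-level selection behaviour the summary describes.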
