Selection of optimal regression models via cross‐validation

Journal of Chemometrics - Volume 2, Issue 1 - Pages 39-48 - 1988
David W. Osten
3M, 3M Center, Building 518-1, St Paul, MN 55144-1000, U.S.A.

Abstract

A general problem arising in the development of regression models is the selection of the optimal model. Whenever a feature selection procedure (such as step forward, backward elimination, best subset or all possible combinations) or a data compression approach (such as principal components or partial least‐squares regression) is used, the question of how many regression terms to include in the final model must be addressed. This work describes the evaluation of four different criteria for selection of the optimal predictive regression model using cross‐validation. The results obtained illustrate the problems which can arise in the analysis of small or inadequately sampled data sets. The common approach, selecting the model which yields the absolute minimum in the predictive residual error sum of squares (PRESS), was found to have particularly poor statistical properties. A very simple change to a criterion based on the first local minimum in PRESS provides a significant improvement in the cross‐validation result. A criterion based on testing the significance of incremental changes in PRESS with an F‐test may provide more robust performance than the local minimum in PRESS method.
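
The difference between the selection criteria named in the abstract can be illustrated with a minimal Python sketch. It assumes the PRESS values have already been computed by cross-validation for models with 1, 2, 3, ... terms. The function names, the example PRESS values and the form of the F-ratio rule (comparing PRESS of a smaller model against the minimum PRESS, with a critical value supplied by the caller) are illustrative assumptions, not taken from the paper.

```python
def global_min_press(press):
    """Index (0-based) of the absolute minimum of PRESS."""
    return min(range(len(press)), key=press.__getitem__)


def first_local_min_press(press):
    """Index of the first model after which PRESS stops decreasing."""
    for k in range(len(press) - 1):
        if press[k + 1] >= press[k]:
            return k
    return len(press) - 1


def f_ratio_choice(press, f_crit):
    """Smallest model whose PRESS is not significantly above the minimum.

    Assumption: "significant" means PRESS[k] / PRESS[min] >= f_crit, where
    f_crit is a critical F value chosen by the caller. The minimum PRESS
    is assumed to be strictly positive.
    """
    k_min = global_min_press(press)
    for k in range(k_min + 1):
        if press[k] / press[k_min] < f_crit:
            return k
    return k_min


# Hypothetical PRESS values for models with 1..5 terms.
press = [10.0, 6.0, 6.5, 5.8, 7.0]

# Indices are 0-based, so add 1 to obtain the number of terms.
n_global = global_min_press(press) + 1
n_local = first_local_min_press(press) + 1
n_ftest = f_ratio_choice(press, f_crit=1.5) + 1
print(n_global, n_local, n_ftest)
```

With these made-up values the absolute minimum selects a four-term model, while the first local minimum and the F-ratio rule both stop at two terms, because the small further drop in PRESS at four terms is not treated as meaningful.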
