Why do we still use stepwise modelling in ecology and behaviour?

Journal of Animal Ecology - Volume 75, Issue 5 - Pages 1182-1189 - 2006
Mark J. Whittingham (1,2), Philip A. Stephens (1,2), Richard B. Bradbury (3), Robert P. Freckleton (4)
(1) Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK
(2) Division of Biology, School of Biology and Psychology, Ridley Building, University of Newcastle, Newcastle Upon Tyne, NE1 7RU, UK
(3) Royal Society for the Protection of Birds, The Lodge, Sandy, Bedfordshire SG19 2DL, UK
(4) Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK

Summary

The biases and shortcomings of stepwise multiple regression are well established within the statistical literature. However, an examination of papers published in 2004 in three leading ecological and behavioural journals suggested that the use of this technique remains widespread: of 65 papers that used a multiple regression approach, 57% used a stepwise procedure.

The principal drawbacks of stepwise multiple regression include bias in parameter estimation, inconsistencies among model selection algorithms, an inherent (but often overlooked) problem of multiple hypothesis testing, and an inappropriate focus or reliance on a single best model. We discuss each of these issues with examples.
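The multiple-testing problem mentioned above can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each of k candidate predictors is tested independently at the same significance level alpha and that none of them has a real effect. Tests within a real stepwise procedure are not independent, so the numbers are only a rough guide. The function name `familywise_error` is hypothetical and does not come from the paper.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    which is a simplification of what happens inside a stepwise procedure.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Chance that all k tests correctly stay non-significant is (1 - alpha)^k.
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # Illustrative numbers of candidate predictors (not taken from the paper).
    for k in (1, 5, 10, 20):
        print(k, round(familywise_error(0.05, k), 3))
```

Under these assumptions, ten candidate predictors tested at alpha = 0.05 give roughly a 40% chance that at least one enters the model spuriously.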

We use a worked example of data on yellowhammer distribution collected over 4 years to highlight the pitfalls of stepwise regression. We show that stepwise regression allows models containing significant predictors to be obtained from each year's data. In spite of the significance of the selected models, they vary substantially between years and suggest patterns that are at odds with those determined by analysing the full, 4‐year data set.

An information-theoretic (IT) analysis of the yellowhammer data set illustrates why the varying outcomes of stepwise analyses arise. In particular, the IT approach identifies large numbers of competing models that could describe the data equally well, showing that no one model should be relied upon for inference.
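One common way to quantify how many models describe the data about equally well in an IT analysis is to convert AIC values into Akaike weights. The sketch below is a minimal illustration and not the analysis reported in the paper: the AIC values are made up, and the function name `akaike_weights` is an assumption.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike weights that sum to 1.

    Each weight is exp(-0.5 * delta_i) normalised over all models, where
    delta_i is the model's AIC minus the smallest AIC in the set.
    """
    if not aic_values:
        raise ValueError("need at least one AIC value")
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative_likelihoods)
    return [rl / total for rl in relative_likelihoods]


if __name__ == "__main__":
    # Hypothetical AIC values for five candidate models (not from the paper).
    hypothetical_aics = [100.0, 100.8, 101.5, 102.0, 110.0]
    for aic, weight in zip(hypothetical_aics, akaike_weights(hypothetical_aics)):
        print(f"AIC={aic:6.1f}  weight={weight:.3f}")
```

When several models have similar AIC values, as in this made-up set, no single model carries most of the weight, which is the situation the summary describes.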
