Modeling and variable selection in epidemiologic analysis.

American Journal of Public Health - Volume 79, Number 3 - Pages 340-349 - 1989
Sander Greenland¹
¹Division of Epidemiology, University of California, School of Public Health, Los Angeles 90024.

Abstract

This paper provides an overview of problems in multivariate modeling of epidemiologic data, and examines some proposed solutions. Special attention is given to the task of model selection, which involves selection of the model form, selection of the variables to enter the model, and selection of the form of these variables in the model. Several conclusions are drawn, among them: a) model and variable forms should be selected based on regression diagnostic procedures, in addition to goodness-of-fit tests; b) variable-selection algorithms in current packaged programs, such as conventional stepwise regression, can easily lead to invalid estimates and tests of effect; and c) variable selection is better approached by direct estimation of the degree of confounding produced by each variable than by significance-testing algorithms. As a general rule, before using a model to estimate effects, one should evaluate the assumptions implied by the model against both the data and prior information.
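Conclusion (c) can be illustrated with a small change-in-estimate calculation. The sketch below is not taken from the paper: it assumes a single binary exposure, a binary outcome, and one categorical candidate confounder, and it uses the Mantel-Haenszel summary odds ratio as the adjusted estimate. The function names and the counts are hypothetical, chosen only so the arithmetic is easy to follow. The idea is to compare the crude odds ratio with the covariate-adjusted odds ratio and judge the covariate by how far the estimate moves, rather than by a significance test of the covariate's coefficient.

```python
from math import log

# Each table is (a, b, c, d):
#   a = exposed cases,    b = unexposed cases,
#   c = exposed noncases, d = unexposed noncases.

def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of a covariate."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def collapse(strata):
    """Sum the stratum tables cell by cell, ignoring the covariate."""
    return tuple(sum(cell) for cell in zip(*strata))

def change_in_estimate(strata):
    """Return (crude OR, adjusted OR, absolute log ratio of the two)."""
    crude = odds_ratio(*collapse(strata))
    adjusted = mantel_haenszel_or(strata)
    return crude, adjusted, abs(log(crude / adjusted))

# Hypothetical counts, one table per level of the candidate confounder.
# Within each stratum the exposure-disease odds ratio is exactly 1.
strata = [
    (40, 10, 40, 10),
    (5, 20, 20, 80),
]

crude, adjusted, shift = change_in_estimate(strata)
print(f"crude OR = {crude:.2f}")
print(f"adjusted OR = {adjusted:.2f}")
print(f"|log(crude/adjusted)| = {shift:.3f}")
```

With these illustrative counts the crude odds ratio is 2.25 while the adjusted odds ratio is 1.00, so ignoring the covariate would manufacture an apparent effect. How large a shift should count as important is a subject-matter judgment that the analyst has to make; the sketch deliberately leaves that threshold out.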
