Modeling and variable selection in epidemiologic analysis.
Abstract
This paper provides an overview of problems in multivariate modeling of epidemiologic data, and examines some proposed solutions. Special attention is given to the task of model selection, which involves selection of the model form, selection of the variables to enter the model, and selection of the form of these variables in the model. Several conclusions are drawn, among them: a) model and variable forms should be selected based on regression diagnostic procedures, in addition to goodness-of-fit tests; b) variable-selection algorithms in current packaged programs, such as conventional stepwise regression, can easily lead to invalid estimates and tests of effect; and c) variable selection is better approached by direct estimation of the degree of confounding produced by each variable than by significance-testing algorithms. As a general rule, before using a model to estimate effects, one should evaluate the assumptions implied by the model against both the data and prior information.
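Conclusion (c) above can be illustrated with a minimal sketch of a change-in-estimate check, which measures how much the exposure effect moves when a candidate covariate is controlled instead of testing that covariate for significance. The sketch is not taken from the paper: the stratified counts are invented, the function names are hypothetical, the Mantel-Haenszel summary odds ratio is only one convenient choice of adjusted estimate, and the 10% cutoff is an assumed rule of thumb.

```python
from math import log

# Each stratum of the candidate confounder is a 2x2 table (a, b, c, d):
#   a = exposed cases,   b = exposed noncases,
#   c = unexposed cases, d = unexposed noncases.
# The counts below are invented for illustration only.
STRATA = [
    (40, 10, 20, 5),   # covariate level 1: stratum-specific OR = 1.0
    (5, 20, 10, 40),   # covariate level 2: stratum-specific OR = 1.0
]


def crude_odds_ratio(strata):
    """Odds ratio after collapsing over the covariate (no adjustment)."""
    a, b, c, d = (sum(cell) for cell in zip(*strata))
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel summary odds ratio, adjusted for the covariate."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def confounding_check(strata, threshold=0.10):
    """Compare crude and adjusted estimates on the log scale.

    `threshold` is an assumed cutoff (a 10% change in the odds ratio),
    not a value given in the source.
    """
    crude = crude_odds_ratio(strata)
    adjusted = mantel_haenszel_odds_ratio(strata)
    change = abs(log(crude / adjusted))
    return crude, adjusted, change > log(1 + threshold)


crude, adjusted, keep_covariate = confounding_check(STRATA)
print(f"crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}, "
      f"control for covariate: {keep_covariate}")
```

In this invented example the crude odds ratio is 2.25 while the adjusted odds ratio is 1.0, so the covariate would be retained because it changes the effect estimate substantially, whatever a significance test of its coefficient might show.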