A demonstration of modeling count data with an application to physical activity
Abstract
Count outcomes, such as days of physical activity or servings of fruits and vegetables, often have strongly right-skewed distributions with a preponderance of zeros, which poses analytical challenges. This paper demonstrates how such outcomes may be analyzed with several modifications to Poisson regression. Five regression models, 1) Poisson, 2) overdispersed Poisson, 3) negative binomial, 4) zero-inflated Poisson (ZIP), and 5) zero-inflated negative binomial (ZINB), are fitted to data assessing predictors of vigorous physical activity (VPA) among Latina women. The models are described, and analytical and graphical approaches to model selection are discussed. Poisson regression fit poorly in these data, in which 82% of the subjects reported no days of VPA. The fit improved considerably with the negative binomial and ZIP models, and there was little difference in fit between the ZIP and ZINB models. Overall, the ZIP model fit best. Reporting no days of VPA was associated with poorer self-reported health and less assimilation to Anglo culture, and marginally associated with increasing BMI. The intensity portion of the model suggested that more days of VPA were associated with more education and marginally associated with increasing age. These underutilized models provide useful approaches for handling count outcomes.
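To make the contrast between the Poisson and ZIP models concrete, the following is a minimal sketch in Python, not the authors' analysis. It fits intercept-only versions of both models (no covariates) and compares them by AIC. The sample `counts` is hypothetical and was constructed only so that 82% of the values are zero, matching the proportion reported above. The function names are illustrative, and the ZIP fit assumes that the mean of the positive counts exceeds 1 and that the data really do contain excess zeros.

```python
import math

def poisson_logpmf(y, lam):
    """Log of the Poisson probability P(Y = y) with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def zip_logpmf(y, pi, lam):
    """Log of the ZIP probability: a structural zero with probability pi,
    otherwise a Poisson(lam) count (which may itself be zero)."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) + poisson_logpmf(y, lam)

def fit_zip(counts):
    """Intercept-only ZIP maximum likelihood estimates.
    lam solves lam / (1 - exp(-lam)) = mean of the positive counts
    (the zero-truncated Poisson estimate), found by bisection;
    then pi = 1 - overall mean / lam.
    Assumes mean of positives > 1 and excess zeros (so that pi >= 0)."""
    positives = [y for y in counts if y > 0]
    mean_pos = sum(positives) / len(positives)
    lo, hi = 1e-9, mean_pos  # the root lies strictly between these bounds
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean_pos:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    pi = 1 - (sum(counts) / len(counts)) / lam
    return pi, lam

def aic(loglik, n_params):
    """Akaike information criterion: smaller values indicate better fit."""
    return 2 * n_params - 2 * loglik

# Hypothetical sample of 100 subjects: days of VPA per week, 82% zeros.
counts = [0] * 82 + [1] * 3 + [2] * 4 + [3] * 5 + [4] * 3 + [5] * 2 + [7] * 1

# Poisson: the maximum likelihood estimate of the mean is the sample mean.
lam_pois = sum(counts) / len(counts)
ll_pois = sum(poisson_logpmf(y, lam_pois) for y in counts)
aic_pois = aic(ll_pois, 1)

# ZIP: two parameters (pi and lam).
pi_zip, lam_zip = fit_zip(counts)
ll_zip = sum(zip_logpmf(y, pi_zip, lam_zip) for y in counts)
aic_zip = aic(ll_zip, 2)

# Predicted probability of a zero under each model.
p0_pois = math.exp(-lam_pois)
p0_zip = pi_zip + (1 - pi_zip) * math.exp(-lam_zip)

print(f"Poisson: P(0) = {p0_pois:.3f}, AIC = {aic_pois:.1f}")
print(f"ZIP:     P(0) = {p0_zip:.3f}, AIC = {aic_zip:.1f}")
```

On this hypothetical sample, the Poisson model predicts far fewer zeros than the 82% observed, while the intercept-only ZIP model reproduces the observed zero proportion and has the smaller AIC. A full analysis with covariates in both the zero and count portions would normally rely on a statistical package rather than hand-written likelihoods.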