Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

James Hibbert1, Angela D. Liese1, Andrew Lawson2, Dwayne E. Porter3, Robin Puett4, Debra Standiford5, Lenna Liu6, Dana Dabelea7
1Department of Epidemiology and Biostatistics and Center for Research in Nutrition and Health Disparities, Arnold School of Public Health, University of South Carolina, Columbia, USA
2Medical University of South Carolina College of Medicine, Charleston, USA
3Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, USA
4South Carolina Cancer Prevention and Control Program, University of South Carolina, Columbia, USA
5Children's Hospital Medical Center, Cincinnati, USA
6University of Washington Child Health Institute, Seattle, USA
7University of Colorado School of Public Health, Denver, USA

Abstract

Background: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution).

Methods: We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic.

Results: At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003).

Conclusion: Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies focusing on spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should consider carefully the study aims.
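The contrast between fixed and random allocation described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ZIP code, tract names, and weights are hypothetical, and it assumes that fixed allocation assigns every case to the tract with the largest weight within the ZIP code, while random allocation draws a tract with probability proportional to its weight.

```python
import random
from collections import Counter

# Hypothetical weights: share of a ZIP code's population in each census tract.
# Land area, population under age 20, or race/ethnicity counts could be used
# as weights in the same way.
ZIP_TRACT_WEIGHTS = {
    "29201": {"tract_A": 0.5, "tract_B": 0.3, "tract_C": 0.2},
}


def fixed_allocation(zip_code, weights=ZIP_TRACT_WEIGHTS):
    """Deterministic (assumed rule): assign the case to the highest-weight tract."""
    tracts = weights[zip_code]
    return max(tracts, key=tracts.get)


def random_allocation(zip_code, rng, weights=ZIP_TRACT_WEIGHTS):
    """Stochastic: draw one tract with probability proportional to its weight."""
    tracts = weights[zip_code]
    return rng.choices(list(tracts), weights=list(tracts.values()), k=1)[0]


# Allocate 1,000 hypothetical cases that are known only by ZIP code.
rng = random.Random(0)  # seeded so the sketch is reproducible
cases = ["29201"] * 1000
fixed_counts = Counter(fixed_allocation(z) for z in cases)
random_counts = Counter(random_allocation(z, rng) for z in cases)
```

Under these assumptions, the fixed rule places all cases in a single tract, which mirrors the artificial clustering noted in the Conclusion, whereas the random rule spreads cases across tracts roughly in proportion to the weights.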
