Geostatistical analysis of disease data: estimation of cancer mortality risk from empirical frequencies using Poisson kriging
Abstract
Cancer mortality maps are used by public health officials to identify areas of excess and to guide surveillance and control activities. The quality of decision-making thus relies on an accurate quantification of risks from observed rates, which can be very unreliable when computed from sparsely populated geographical units or recorded for minority populations. This paper presents a geostatistical methodology that accounts for spatially varying population sizes and spatial patterns in the processing of cancer mortality data. Simulation studies are conducted to compare the performance of Poisson kriging with that of a few simple smoothers (i.e. population-weighted estimators and empirical Bayes smoothers) under different scenarios for the disease frequency, the population size, and the spatial pattern of risk. A public-domain executable with example datasets is provided.
The analysis of age-adjusted mortality rates for breast and cervical cancers illustrated some key features of commonly used smoothing techniques. Because it assigns a small weight (the kernel weight) to the rate observed over the entity being smoothed, the population-weighted average leads to risk maps that show little variability. The other techniques assign larger, mutually similar kernel weights, but each uses a different piece of auxiliary information in the prediction: the global mean for the global empirical Bayes smoother, the local mean for the local empirical Bayes smoother, and a spatial combination of surrounding rates for the geostatistical estimator. Simulation studies indicated that Poisson kriging outperforms the other approaches for most scenarios, with a clear benefit when the risk values are spatially correlated. Global empirical Bayes smoothers provide more accurate predictions under the least frequent scenario of spatially random risk.
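To make the notion of a kernel weight concrete, the following is a minimal sketch of a global empirical Bayes smoother using the method-of-moments form commonly attributed to Marshall (1991). It is an illustration under stated assumptions, not the code of the public-domain executable: the function name `global_eb_smoother`, its arguments, and the example counts are all hypothetical. Each observed rate is shrunk toward the global mean, and the weight kept on the unit's own rate grows with its population size.

```python
from typing import List, Sequence, Tuple


def global_eb_smoother(
    deaths: Sequence[float], population: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Shrink observed rates toward the global mean (hypothetical helper).

    Returns (smoothed_rates, kernel_weights), where each kernel weight is
    the weight assigned to the unit's own observed rate.
    """
    n_units = len(deaths)
    rates = [d / n for d, n in zip(deaths, population)]

    total_pop = sum(population)
    # Global (population-weighted) mean rate.
    m = sum(deaths) / total_pop
    # Population-weighted variance of the observed rates.
    s2 = sum(n * (r - m) ** 2 for n, r in zip(population, rates)) / total_pop
    # Moment estimate of the variance of the underlying risk,
    # truncated at zero when the observed variance is too small.
    mean_pop = total_pop / n_units
    a = max(s2 - m / mean_pop, 0.0)

    smoothed, weights = [], []
    for n, r in zip(population, rates):
        denom = a + m / n
        # Weight on the unit's own rate; zero if no variability is estimated.
        c = a / denom if denom > 0 else 0.0
        weights.append(c)
        smoothed.append(m + c * (r - m))
    return smoothed, weights


if __name__ == "__main__":
    # Made-up counts for three units of very different population size.
    example_deaths = [2, 30, 250]
    example_population = [1000, 20000, 100000]
    est, w = global_eb_smoother(example_deaths, example_population)
    for n, e, c in zip(example_population, est, w):
        print(f"population={n:>7}  smoothed rate={e:.5f}  kernel weight={c:.3f}")
```

In this sketch the rate of the smallest unit is pulled almost entirely toward the global mean, while the largest unit keeps most of its observed rate. A local empirical Bayes smoother would apply the same shrinkage with the mean and variance computed over a neighbourhood of each unit rather than over the whole map.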
The approach presented in this paper enables researchers to incorporate the pattern of spatial dependence of mortality rates into the mapping of risk values and the quantification of the associated uncertainty, while being easier to implement than a full Bayesian model. The availability of a public-domain executable makes the geostatistical analysis of health data, and its comparison to traditional smoothers, more accessible to common users. In future papers this methodology will be generalized to the simulation of the spatial distribution of risk values and the propagation of the uncertainty attached to predicted risks in local cluster analysis.