Modeling causes of death: an integrated approach using CODEm
Abstract
Data on causes of death by age and sex are a critical input into health decision-making. Priority setting in public health should be informed not only by the current magnitude of health problems but also by their trends. However, cause of death data are often unavailable or subject to substantial problems of comparability. We propose five general principles for cause of death model development, validation, and reporting.
We detail a specific implementation of these principles, embodied in an analytical tool, the Cause of Death Ensemble model (CODEm), which explores a large variety of possible models to estimate trends in causes of death. Possible models are identified using a covariate selection algorithm that yields many plausible combinations of covariates, which are then run through four model classes. The model classes include mixed effects linear models and spatial-temporal Gaussian Process Regression models for cause fractions and death rates. All models for each cause of death are then assessed using out-of-sample predictive validity and combined into an ensemble with optimal out-of-sample predictive performance.
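The pipeline described above can be illustrated with a minimal Python sketch. It is not the CODEm implementation: the function names, the exhaustive enumeration of covariate subsets, and the geometric rank-based weighting with a `decay` parameter are all assumptions made for illustration, and the component model predictions are supplied as plain lists rather than fitted.

```python
from itertools import combinations, product
import math

# Four model classes named after the description in the abstract (labels are hypothetical).
MODEL_CLASSES = [
    "mixed_effects_cause_fraction",
    "mixed_effects_death_rate",
    "st_gpr_cause_fraction",
    "st_gpr_death_rate",
]


def candidate_models(covariates, model_classes=MODEL_CLASSES, max_covariates=2):
    """Pair every model class with every covariate subset up to a size limit.

    Assumption: a real covariate selection step would keep only plausible
    subsets; here all subsets are kept for simplicity.
    """
    covariate_sets = [
        combo
        for size in range(1, max_covariates + 1)
        for combo in combinations(covariates, size)
    ]
    return list(product(model_classes, covariate_sets))


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def ensemble_weights(holdout_predictions, holdout_observed, decay=2.0):
    """Rank models by out-of-sample RMSE and weight them by rank.

    Assumption: weights shrink geometrically with rank (best model first).
    """
    errors = {
        name: rmse(pred, holdout_observed)
        for name, pred in holdout_predictions.items()
    }
    ranked = sorted(errors, key=errors.get)
    raw = {name: decay ** -rank for rank, name in enumerate(ranked)}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def ensemble_predict(predictions, weights):
    """Weighted average of component model predictions, point by point."""
    length = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * predictions[name][i] for name in weights)
        for i in range(length)
    ]


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    holdout = {
        "model_a": [1.0, 2.0, 3.0],
        "model_b": [2.0, 3.0, 4.0],
        "model_c": [3.0, 4.0, 5.0],
    }
    weights = ensemble_weights(holdout, observed)
    print(weights)
    print(ensemble_predict(holdout, weights))
```

The key design point is that ranking and weighting use only data withheld from model fitting, so a component model cannot earn weight by overfitting the data it was trained on.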
Ensemble models for cause of death estimation outperform any single component model on three tests: root mean square error, frequency of predicting the correct temporal trend, and achieving 95% coverage of the prediction interval. We present detailed results for CODEm applied to maternal mortality and summary results for several other causes of death, including cardiovascular disease and several cancers.
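The three performance measures named above could be computed along the following lines. This is a hedged sketch: the function names are hypothetical, and the trend test is assumed here to mean the share of consecutive time steps in which the predicted and observed values move in the same direction, which may differ from the exact definition used in the study.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predictions and held-out observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def _sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def trend_accuracy(predicted, observed):
    """Share of consecutive steps where prediction and observation change
    in the same direction (assumed definition of a correct trend)."""
    matches = [
        _sign(predicted[i + 1] - predicted[i]) == _sign(observed[i + 1] - observed[i])
        for i in range(len(observed) - 1)
    ]
    return sum(matches) / len(matches)


def interval_coverage(lower, upper, observed):
    """Share of observations inside their prediction interval.

    For a well-calibrated 95% interval this should be close to 0.95.
    """
    inside = [lo <= obs <= hi for lo, hi, obs in zip(lower, upper, observed)]
    return sum(inside) / len(inside)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 2.0]
    predicted = [1.0, 3.0, 4.0, 5.0]
    lower = [0.0, 1.0, 2.0, 3.0]
    upper = [2.0, 3.0, 4.0, 5.0]
    print(rmse(predicted, observed))
    print(trend_accuracy(predicted, observed))
    print(interval_coverage(lower, upper, observed))
```

Coverage well below 0.95 suggests intervals that are too narrow, while coverage near 1.0 suggests intervals that are too wide to be informative.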
CODEm produces better estimates of cause of death trends than previous methods and is less susceptible to bias in model specification. We demonstrate the utility of CODEm for the estimation of several major causes of death.