Characterizing bias due to differential exposure ascertainment in electronic health record data
Abstract
Data derived from electronic health records (EHR) are heterogeneous, with the availability of specific measures depending on the type and timing of patients’ healthcare interactions. This creates a challenge for research using EHR-derived exposures because gold-standard exposure data, determined by a definitive assessment, may be available for only a subset of the population. Alternative approaches to exposure ascertainment in this case include restricting the analytic sample to patients with gold-standard exposure data available (exclusion); using gold-standard data when available and a proxy exposure measure otherwise (best available); or using a proxy exposure measure for everyone (common data). Exclusion may induce selection bias in estimates of the exposure-outcome association, while incorporating information from a proxy exposure via either the best available or common data approach may result in information bias due to measurement error. The objective of this paper was to explore the bias and efficiency of these three analytic approaches across a broad range of scenarios motivated by a study of the association between chronic hyperglycemia and 5-year mortality in an EHR-derived cohort of colon cancer survivors. We found that the best available approach tended to mitigate the inefficiency and selection bias resulting from exclusion while suffering from less information bias than the common data approach. However, bias in all three approaches can be severe, particularly when both selection bias and information bias are present. When the risk of either of these biases is judged to be more than moderate, EHR-based analyses may lead to erroneous conclusions.
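The three ascertainment approaches described above can be made concrete with a small simulation. The sketch below is illustrative only and is not the study's simulation design: the exposure prevalence, outcome risks, proxy sensitivity and specificity, and gold-standard availability probabilities are all assumed values, and the function names `log_odds_ratio` and `simulate` are hypothetical. It compares crude log odds ratios from a 2x2 table, with no covariate adjustment.

```python
import numpy as np


def log_odds_ratio(exposure, outcome):
    """Crude log odds ratio from the 2x2 table of two binary arrays."""
    e = np.asarray(exposure, dtype=bool)
    y = np.asarray(outcome, dtype=bool)
    a = np.sum(e & y)      # exposed, event
    b = np.sum(e & ~y)     # exposed, no event
    c = np.sum(~e & y)     # unexposed, event
    d = np.sum(~e & ~y)    # unexposed, no event
    return float(np.log((a * d) / (b * c)))


def simulate(n=200_000, seed=0):
    """Compare exclusion, best available, and common data on simulated data.

    All numeric settings below are assumptions chosen for illustration.
    """
    rng = np.random.default_rng(seed)

    # True binary exposure (e.g. chronic hyperglycemia), assumed prevalence 30%.
    x = rng.random(n) < 0.30

    # Binary outcome (e.g. 5-year mortality): risk 0.20 if unexposed and
    # 1/3 if exposed, which corresponds to a true odds ratio of 2.
    y = rng.random(n) < np.where(x, 1 / 3, 0.20)

    # Proxy exposure with nondifferential error:
    # assumed sensitivity 0.70 and specificity 0.90.
    w = np.where(x, rng.random(n) < 0.70, rng.random(n) < 0.10)

    # Gold-standard availability depends jointly on exposure and outcome,
    # so restricting to patients with gold-standard data is a selected sample.
    p_avail = np.where(x & y, 0.90, np.where(x, 0.50, 0.30))
    r = rng.random(n) < p_avail

    # Best available: gold standard where present, proxy otherwise.
    best = np.where(r, x, w)

    return {
        "true": float(np.log(2.0)),
        "full_gold_standard": log_odds_ratio(x, y),  # unattainable benchmark
        "exclusion": log_odds_ratio(x[r], y[r]),
        "best_available": log_odds_ratio(best, y),
        "common_data": log_odds_ratio(w, y),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>20s}: {value:.3f}")
```

Under these assumed settings, exclusion overstates the association because gold-standard availability depends on both exposure and outcome, and the common data approach is attenuated toward the null by nondifferential proxy error. Other parameter choices can change the ranking of the three approaches, so this single configuration should not be read as a general result.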