CHEMIST: an R package for causal inference with high-dimensional error-prone covariates and misclassified treatments
Japanese Journal of Statistics and Data Science - Pages 1-17 - 2023
Abstract
In this paper, we study causal inference for complex and noisy data. We call this setting CHEMIST, which stands for Causal inference with High-dimensional Error-prone covariates and MISclassified Treatments. To tackle these challenges when estimating the average treatment effect (ATE), we develop the FATE method, whose name reflects Feature screening, Adaptive lasso, Treatment adjustment, and Error elimination in covariates; it handles both variable selection and measurement error correction. Once the data have been reduced to the informative covariates and corrected for measurement error, we estimate the ATE. To make our strategy publicly available, we develop a new R package, CHEMIST, which provides functions for estimating the ATE. The flexibility of its arguments lets users examine different scenarios. We introduce the FATE method and its implementation in the R package CHEMIST, and we demonstrate applications to two real data sets.
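To make the target quantity concrete, the following is a minimal sketch of one standard ATE estimator, inverse probability weighting (IPW), applied to data that are assumed to be already cleaned. It is not the FATE method and it does not use the CHEMIST package: the abstract does not say which estimator FATE applies at this step, and the function name `ipw_ate` and the toy numbers are hypothetical. The sketch also assumes the propensity scores are known, whereas in practice they would be estimated from the selected, error-corrected covariates.

```python
from typing import Sequence


def ipw_ate(y: Sequence[float], t: Sequence[int], e: Sequence[float]) -> float:
    """Inverse-probability-weighted estimate of the average treatment effect.

    y: observed outcomes
    t: binary treatment indicators (1 = treated, 0 = control)
    e: propensity scores P(T = 1 | X), assumed known and strictly in (0, 1)

    Hypothetical helper for illustration only; not part of CHEMIST.
    """
    n = len(y)
    if n == 0 or len(t) != n or len(e) != n:
        raise ValueError("y, t and e must be non-empty and of equal length")
    if any(ti not in (0, 1) for ti in t):
        raise ValueError("treatment indicators must be 0 or 1")
    if any(not 0.0 < ei < 1.0 for ei in e):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Weighted mean outcome under treatment: (1/n) * sum(T_i * Y_i / e_i)
    mean_treated = sum(ti * yi / ei for yi, ti, ei in zip(y, t, e)) / n
    # Weighted mean outcome under control: (1/n) * sum((1 - T_i) * Y_i / (1 - e_i))
    mean_control = sum((1 - ti) * yi / (1 - ei) for yi, ti, ei in zip(y, t, e)) / n
    return mean_treated - mean_control


# Toy data (made up): four subjects, each treated with probability 0.5.
outcomes = [3.0, 1.0, 4.0, 2.0]
treatments = [1, 0, 1, 0]
propensities = [0.5, 0.5, 0.5, 0.5]
estimate = ipw_ate(outcomes, treatments, propensities)
print(estimate)
```

With constant propensity scores of 0.5, the estimate reduces to the difference between the treated and control sample means. Misclassified treatments or error-prone covariates would distort the weights, which is the kind of bias the abstract says FATE is designed to correct before the ATE is estimated.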