Learning (predictive) risk scores in the presence of censoring due to interventions
Abstract
A large and diverse set of measurements is regularly collected during a patient’s hospital stay to monitor their health status. Tools that integrate these measurements into severity scores that accurately track changes in illness severity can improve clinicians’ ability to provide timely interventions. Existing approaches for creating such scores either (1) rely on experts to fully specify the severity score, (2) infer a score using detailed models of disease progression, or (3) train a predictive score, using supervised learning, by regressing against a surrogate marker of severity such as the presence of downstream adverse events. The first approach does not extend to diseases for which an accurate score cannot be elicited from experts. The second assumes that the progression of disease can be accurately modeled, which limits its application to populations with simple, well-understood disease dynamics. The third approach, which is also the most commonly used, often produces scores that suffer from bias due to treatment-related censoring (Paxton et al. in AMIA annual symposium proceedings, American Medical Informatics Association, p 1109, 2013). Specifically, since the downstream outcomes used for training are observed only noisily and are influenced by treatment administration patterns, these scores do not generalize well when treatment administration patterns change. We propose a novel ranking-based framework for disease severity score learning (DSSL). DSSL exploits the following key observation: while it is challenging for experts to quantify the disease severity at any given time, it is often easy to compare the disease severity at two different times. Extending existing ranking algorithms, DSSL learns a function that maps a vector of a patient’s measurements to a scalar severity score subject to two constraints. First, the resulting score should be consistent with the expert’s ranking of the disease severity state. Second, changes in score between consecutive periods should be smooth. We apply DSSL to the problem of learning a sepsis severity score using a large, real-world electronic health record dataset. The learned scores significantly outperform state-of-the-art clinical scores in ranking patient states by severity and in early detection of downstream adverse events. We also show that the learned disease severity trajectories are consistent with clinical expectations of disease evolution. Further, we simulate datasets containing different treatment administration patterns and show that DSSL generalizes better to changes in treatment patterns than the three approaches above.
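The two constraints described in the abstract (agreement with expert comparisons, and smooth changes between consecutive periods) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear score, a squared hinge loss on ranked pairs, a squared-difference smoothness penalty, and plain gradient descent. The function names, hyperparameters, and synthetic data below are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_severity_score(X, ranked_pairs, smooth_pairs,
                       lam_smooth=0.1, lam_l2=0.01, lr=0.05, n_iter=2000):
    """Fit a linear score s(x) = w . x (an assumed form, for illustration).

    X            : (n, d) array of measurement vectors.
    ranked_pairs : (hi, lo) index pairs; X[hi] is judged more severe than X[lo].
    smooth_pairs : (a, b) index pairs of consecutive periods for one patient.
    """
    ranked = np.asarray(ranked_pairs)
    smooth = np.asarray(smooth_pairs)
    # Feature differences for each ordered pair and each consecutive pair.
    d_rank = X[ranked[:, 0]] - X[ranked[:, 1]]
    d_smooth = X[smooth[:, 0]] - X[smooth[:, 1]]

    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Squared hinge: penalize pairs whose score gap is below a margin of 1.
        violation = np.maximum(1.0 - d_rank @ w, 0.0)
        grad_rank = -2.0 * (violation @ d_rank) / len(d_rank)
        # Smoothness: penalize squared score changes between consecutive periods.
        grad_smooth = 2.0 * lam_smooth * ((d_smooth @ w) @ d_smooth) / len(d_smooth)
        # L2 regularization keeps the weights bounded.
        w -= lr * (grad_rank + grad_smooth + 2.0 * lam_l2 * w)
    return w


def pairwise_accuracy(w, X, ranked_pairs):
    """Fraction of ranked pairs that the learned score orders correctly."""
    ranked = np.asarray(ranked_pairs)
    return float(np.mean(X[ranked[:, 0]] @ w > X[ranked[:, 1]] @ w))


# Synthetic demo: 5 hypothetical patients, 8 periods each, with a latent
# severity that rises over time and is reflected (noisily) in feature 0.
rng = np.random.default_rng(0)
n_patients, n_steps, n_features = 5, 8, 3
latent = np.tile(np.linspace(0.0, 2.0, n_steps), n_patients)
X = rng.normal(scale=0.1, size=(n_patients * n_steps, n_features))
X[:, 0] += latent

# Stand-in for expert comparisons: clearly separated latent severities.
n = len(latent)
ranked_pairs = [(i, j) for i in range(n) for j in range(n)
                if latent[i] - latent[j] > 0.5]
# Consecutive periods within each patient's stay.
smooth_pairs = [(p * n_steps + t, p * n_steps + t + 1)
                for p in range(n_patients) for t in range(n_steps - 1)]

w = fit_severity_score(X, ranked_pairs, smooth_pairs)
print("weights:", w)
print("pairwise accuracy:", pairwise_accuracy(w, X, ranked_pairs))
```

Raising `lam_smooth` in this sketch trades ranking accuracy for flatter score trajectories. A nonlinear scorer (for example, boosted trees, as used in several of the ranking papers cited below) would replace the linear map but keep the same two-term objective.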
References
AHRQ. (2015). Guideline syntheses. http://www.guideline.gov/syntheses/index.aspx.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In Proceedings of the 22nd international conference on machine learning, ACM, (pp. 89–96).
Burges, C. J. (2010). From RankNet to LambdaRank to LambdaMART: An overview. Technical report, Microsoft Research.
Burges, C. J., Ragno, R., & Le, Q. V. (2006). Learning to rank with nonsmooth cost functions. In Advances in neural information processing systems, (pp. 193–200).
Chapelle, O., & Keerthi, S. S. (2010). Efficient algorithms for ranking with SVMs. Information Retrieval, 13(3), 201–215.
Chu, W., & Keerthi, S. S. (2007). Support vector ordinal regression. Neural Computation, 19(3), 792–815.
Clermont, G., Angus, D. C., DiRusso, S. M., Griffin, M., & Linde-Zwirble, W. T. (2001). Predicting hospital mortality for patients in the intensive care unit: A comparison of artificial neural networks with logistic regression models. Critical Care Medicine, 29(2), 291–296.
Dellinger, R. P., Levy, M. M., Rhodes, A., Annane, D., Gerlach, H., Opal, S. M., et al. (2013). Surviving sepsis campaign: International guidelines for management of severe sepsis and septic shock, 2012. Intensive Care Medicine, 39(2), 165–228.
Dyagilev, K., & Saria, S. (2015). Learning a severity score for sepsis: A novel approach based on clinical comparisons. In AMIA annual symposium proceedings, American Medical Informatics Association.
Fine, M. J., Auble, T. E., Yealy, D. M., Hanusa, B. H., Weissfeld, L. A., Singer, D. E., et al. (1997). A prediction rule to identify low-risk patients with community-acquired pneumonia. New England Journal of Medicine, 336(4), 243–250.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
Ghanem-Zoubi, N. O., Vardi, M., Laor, A., Weber, G., & Bitterman, H. (2011). Assessment of disease-severity scoring systems for patients with sepsis in general internal medicine departments. Critical Care, 15(2), R95.
Henry, K. E., Hager, D. N., Pronovost, P. J., & Saria, S. (2015). A targeted real-time early warning score (TREWScore) for septic shock. Science Translational Medicine, 7, 299ra122.
Herbrich, R., Graepel, T., & Obermayer, K. (2000). Large margin rank boundaries for ordinal regression. In Advances in large margin classifiers, (pp. 115–132). Cambridge: The MIT Press.
Ho, J. C., Lee, C. H., & Ghosh, J. (2012). Imputation-enhanced prediction of septic shock in ICU patients. In Proceedings of the ACM SIGKDD workshop on health informatics (HI-KDD12).
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.
Hug, C. (2009). Detecting hazardous intensive care patient episodes using real-time mortality models. PhD thesis.
Jackson, C. H., Sharples, L. D., Thompson, S. G., Duffy, S. W., & Couto, E. (2003). Multistate Markov models for disease progression with classification error. Journal of the Royal Statistical Society: Series D (The Statistician), 52(2), 193–209.
Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, (pp. 133–142).
Keegan, M. T., Gajic, O., & Afessa, B. (2011). Severity of illness scoring systems in the intensive care unit. Critical Care Medicine, 39(1), 163–169.
Knaus, W. A., Draper, E. A., Wagner, D. P., & Zimmerman, J. E. (1985). APACHE II: A severity of disease classification system. Critical Care Medicine, 13(10), 818–829.
Kumar, G., Kumar, N., Taneja, A., Kaleekal, T., Tarima, S., McGinley, E., et al. (2011). Nationwide trends of severe sepsis in the 21st century (2000–2007). CHEST Journal, 140(5), 1223–1231.
Kuo, T. M., Lee, C. P., & Lin, C. J. (2014). Large-scale kernel RankSVM. In Proceedings of the 2014 SIAM international conference on data mining, SIAM.
Marshall, J. C., Cook, D. J., Christou, N. V., Bernard, G. R., Sprung, C. L., & Sibbald, W. J. (1995). Multiple organ dysfunction score: A reliable descriptor of a complex clinical outcome. Critical Care Medicine, 23(10), 1638–1652.
Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Boosting algorithms as gradient descent in function space. Advances in Neural Information Processing Systems, 12, 512–518.
Matveeva, I., Burges, C., Burkard, T., Laucius, A., & Wong, L. (2006). High accuracy retrieval with multiple nested ranker. In Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval (pp. 437–444), ACM.
Medsger, T., Bombardieri, S., Czirjak, L., Scorza, R., Rossa, A., & Bencivelli, W. (2003). Assessment of disease severity and prognosis. Clinical and Experimental Rheumatology, 21(3; SUPP/29), S42–S46.
Minne, L., Abu-Hanna, A., de Jonge, E., et al. (2008). Evaluation of SOFA-based models for predicting mortality in the ICU: A systematic review. Critical Care, 12(6), R161.
Mohan, A., Chen, Z., & Weinberger, K. Q. (2011). Web-search ranking with initialized gradient boosted regression trees. In Yahoo! learning to rank challenge, Citeseer, (pp. 77–89).
Mould, D. (2012). Models for disease progression: New approaches and uses. Clinical Pharmacology & Therapeutics, 92(1), 125–131.
Paxton, C., Niculescu-Mizil, A., & Saria, S. (2013). Developing predictive models using electronic medical records: Challenges and pitfalls. In AMIA annual symposium proceedings, American Medical Informatics Association, vol. 2013, p. 1109.
Pirracchio, R., Petersen, M. L., Carone, M., Rigon, M. R., Chevret, S., & van der Laan, M. J. (2015). Mortality prediction in intensive care units with the super ICU learner algorithm (SICULA): A population-based study. The Lancet Respiratory Medicine, 3(1), 42–52.
Qin, T., Zhang, X. D., Wang, D. S., Liu, T. Y., Lai, W., & Li, H. (2007). Ranking with multiple hyperplanes. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, ACM (pp. 279–286).
Saeed, M., Lieu, C., Raber, G., & Mark, R. (2002). MIMIC II: A massive temporal ICU patient database to support research in intelligent patient monitoring. In Computers in Cardiology, 2002, IEEE, (pp. 641–644).
Saeed, M., Villarroel, M., Reisner, A. T., Clifford, G., Lehman, L. W., Moody, G., et al. (2011). Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): A public-access intensive care unit database. Critical Care Medicine, 39(5), 952.
Saria, S., Koller, D., & Penn, A. (2010a). Learning individual and population level traits from clinical temporal data. In Predictive models in personalized medicine workshop, neural information processing systems.
Saria, S., Rajani, A. K., Gould, J., Koller, D., & Penn, A. A. (2010b). Integration of early physiological responses predicts later illness severity in preterm infants. Science Translational Medicine, 2(48), 48ra65.
Sebat, F., Musthafa, A. A., Johnson, D., Kramer, A. A., Shoffner, D., Eliason, M., et al. (2007). Effect of a rapid response system for patients in shock on time to treatment and mortality during 5 years. Critical Care Medicine, 35(11), 2568–2575.
Tsai, M. F., Liu, T. Y., Qin, T., Chen, H. H., & Ma, W. Y. (2007). FRank: A ranking method with fidelity loss. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, ACM (pp. 383–390).
Vincent, J. L., Moreno, R., Takala, J., Willatts, S., De Mendonça, A., Bruining, H., et al. (1996). The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Medicine, 22(7), 707–710.
Wang, X., Sontag, D., & Wang, F. (2014). Unsupervised learning of disease progression models. In Proceedings of the twentieth ACM SIGKDD international conference on knowledge discovery and data mining, ACM (pp. 85–94).
Wiens, J., Horvitz, E., & Guttag, J. V. (2012). Patient risk stratification for hospital-associated C. diff as a time-series classification task. Advances in Neural Information Processing Systems, 25, 467–475.
Zheng, Z., Zha, H., Zhang, T., Chapelle, O., Chen, K., & Sun, G. (2008). A general boosting method and its application to learning ranking functions for web search. In Advances in Neural Information Processing Systems, vol. 20, pp. 1697–1704.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.