Identifying subtypes of heart failure from three electronic health record sources with machine learning: an external, prognostic, and genetic validation study

The Lancet Digital Health - Volume 5 - Pages e370-e379 - 2023
Amitava Banerjee1,2,3,4,5, Ashkan Dashtban1, Suliang Chen1, Laura Pasea1, Johan H Thygesen1, Ghazaleh Fatemifar1, Benoit Tyl6, Tomasz Dyszynski7, Folkert W Asselbergs1,2,5,8, Lars H Lund9,10, Tom Lumbers1,2,3,4,5, Spiros Denaxas1,2, Harry Hemingway1,2,5
1Institute of Health Informatics, University College London, London, UK
2Health Data Research UK, London, UK
3Barts Health NHS Trust, London, UK
4Department of Cardiology, University College London Hospitals NHS Trust, London, UK
5NIHR Biomedical Research Centre, University College London Hospitals NHS Trust, London, UK
6Medical Affairs, Pharmaceuticals, Bayer HealthCare, Paris, France
7Medical Affairs & Pharmacovigilance, Bayer AG, Berlin, Germany
8Amsterdam University Medical Centers, Department of Cardiology, University of Amsterdam, Amsterdam, Netherlands
9Division of Cardiology, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
10Heart and Vascular Theme, Karolinska University Hospital, Stockholm, Sweden

References

Ponikowski, 2016, Eur Heart J, 37, 2129, 10.1093/eurheartj/ehw128
Mordi, 2019, Differential association of genetic risk of coronary artery disease with development of heart failure with reduced versus preserved ejection fraction, Circulation, 139, 986, 10.1161/CIRCULATIONAHA.118.038602
Solomon, 2016, The future of clinical trials in cardiovascular medicine, Circulation, 133, 2662, 10.1161/CIRCULATIONAHA.115.020723
Seidelmann, 2018, Genetic variants in SGLT1, glucose tolerance, and cardiometabolic risk, J Am Coll Cardiol, 72, 1763, 10.1016/j.jacc.2018.07.061
Yancy, 2017, J Am Coll Cardiol, 70, 776, 10.1016/j.jacc.2017.04.025
Chawla, 2014, Proposal for a functional classification system of heart failure in patients with end-stage renal disease: proceedings of the acute dialysis quality initiative (ADQI) XI workgroup, J Am Coll Cardiol, 63, 1246, 10.1016/j.jacc.2014.01.020
Arnett, 2019, ACC/AHA guideline on the primary prevention of cardiovascular disease, Circulation, 2019
Banerjee, 2022, A population-based study of 92 clinically recognized risk factors for heart failure: co-occurrence, prognosis and preventive potential, Eur J Heart Fail, 24, 466, 10.1002/ejhf.2417
Ahmad, 2018, Machine learning methods improve prognostication, identify clinically distinct phenotypes, and detect heterogeneity in response to therapy in a large cohort of heart failure patients, J Am Heart Assoc, 7, 10.1161/JAHA.117.008081
Banerjee, 2021, Machine learning for subtype definition and risk prediction in heart failure, acute coronary syndromes and atrial fibrillation: systematic review of validity and clinical utility, BMC Med, 19, 85, 10.1186/s12916-021-01940-7
Banerjee, 2020, Adherence and persistence to direct oral anticoagulants in atrial fibrillation: a population-based study, Heart, 106, 119, 10.1136/heartjnl-2019-315307
Denaxas, 2019, UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER, J Am Med Inform Assoc, 26, 1545, 10.1093/jamia/ocz105
Biobank
Cai, 2012, An algorithm to identify medical practices common to both the General Practice Research Database and The Health Improvement Network database, Pharmacoepidemiol Drug Saf, 21, 770, 10.1002/pds.3277
Carbonari, 2015, Use of demographic and pharmacy data to identify patients included within both the Clinical Practice Research Datalink (CPRD) and The Health Improvement Network (THIN), Pharmacoepidemiol Drug Saf, 24, 999, 10.1002/pds.3844
Koudstaal, 2017, Prognostic burden of heart failure recorded in primary care, acute hospital admissions, or both: a population-based linked electronic health record cohort study in 2.1 million people, Eur J Heart Fail, 19, 1119, 10.1002/ejhf.709
Steele, 2018, Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease, PLoS One, 13, 10.1371/journal.pone.0202344
Josse, 2016, missMDA: a package for handling missing values in multivariate data analysis, J Stat Softw, 70, 1, 10.18637/jss.v070.i01
Saraswat, 2014, Feature selection and classification of leukocytes using random forest, Med Biol Eng Comput, 52, 1041, 10.1007/s11517-014-1200-8
Fujita, 2014, A non-parametric method to estimate the number of clusters, Comput Stat Data Anal, 73, 27, 10.1016/j.csda.2013.11.012
Gates, 2019, Element-centric clustering comparison unifies overlaps and hierarchy, Sci Rep, 9, 10.1038/s41598-019-44892-y
Lambert, 2021, The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation, Nat Genet, 53, 420, 10.1038/s41588-021-00783-5
Shah, 2020, Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure, Nat Commun, 11, 163, 10.1038/s41467-019-13690-5
Bhambhani, 2018, Predictors and outcomes of heart failure with mid-range ejection fraction, Eur J Heart Fail, 20, 651, 10.1002/ejhf.1091
Santhanakrishnan, 2016, Atrial fibrillation begets heart failure and vice versa: temporal associations and differences in preserved versus reduced ejection fraction, Circulation, 133, 484, 10.1161/CIRCULATIONAHA.115.018614
Savji, 2018, The association of obesity and cardiometabolic traits with incident HFpEF and HFrEF, JACC Heart Fail, 6, 701, 10.1016/j.jchf.2018.05.018
Yoon, 2018, Personalized survival predictions via Trees of Predictors: an application to cardiac transplantation, PLoS One, 13, 10.1371/journal.pone.0194985
Cruz Rivera, 2020, Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension, Nat Med, 26, 1351, 10.1038/s41591-020-1037-7
Debray, 2023, Transparent reporting of multivariable prediction models developed or validated using clustered data: TRIPOD-Cluster checklist, BMJ, 380