Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets

Springer Science and Business Media LLC - Volume 13 - Pages 1-13 - 2023
Nhung Nghiem1, June Atkinson1, Binh P. Nguyen2, An Tran-Duy3, Nick Wilson1
1Department of Public Health, University of Otago, Wellington, New Zealand
2School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
3Centre for Health Policy, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia

Abstract

To optimise the planning of public health services, the impact of high-cost users needs to be considered. However, most existing statistical models of cost leave out many clinical and social variables from administrative data that are associated with elevated health care resource use and that are becoming increasingly available. This study aimed to use machine learning approaches and big data to predict high-cost users among people with cardiovascular disease (CVD). We used nationally representative linked datasets in New Zealand to predict prevalent CVD cases with the highest costs, defined as those in the top quintiles by cost. We compared the performance of four popular machine learning models (L1-regularised logistic regression, classification trees, k-nearest neighbours (KNN) and random forest) with that of traditional regression models. The machine learning models were far more accurate in predicting high health-cost users than the logistic models. The F1 score (the harmonic mean of sensitivity and positive predictive value) of the machine learning models ranged from 30.6% to 41.2% (compared with 8.6–9.1% for the logistic models). Previous health costs, income, age, chronic health conditions, deprivation, and receipt of a social security benefit were among the most important predictors of high-cost users with CVD. This study provides additional evidence that machine learning can be used as a tool, together with big data, in health economics to identify new risk factors and to predict high-cost users with CVD. As such, machine learning may assist with health services planning and preventive measures to improve population health while potentially saving healthcare costs.
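
The F1 score reported in the abstract combines sensitivity and positive predictive value as their harmonic mean. The following is a minimal illustrative sketch in Python, not the authors' code: the function names, the toy cost values, and the exact rule used to flag the top quintile are all assumptions made for illustration.

```python
def top_quintile_labels(costs):
    """Flag individuals whose cost falls in the top 20% (hypothetical cut-off rule)."""
    ranked = sorted(costs)
    cutoff = ranked[int(0.8 * len(ranked))]  # smallest cost inside the top fifth
    return [1 if c >= cutoff else 0 for c in costs]


def f1_from_counts(tp, fp, fn):
    """Harmonic mean of sensitivity (recall) and positive predictive value (precision)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Toy data (invented): annual health costs for ten people.
costs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
actual = top_quintile_labels(costs)

# Hypothetical model output: flags the 8th and 9th person as high-cost.
predicted = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Confusion counts for the "high-cost" class.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f1_from_counts(tp, fp, fn))  # sensitivity 0.5 and PPV 0.5 give F1 = 0.5
```

Because high-cost users are by construction a minority class (roughly one in five here), a model can reach high overall accuracy while missing most of them; F1 penalises that, which is presumably why the abstract reports it alongside the model comparison.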

Keywords

#machine learning #cardiovascular disease #health cost prediction #big data #high-cost health service users
