Supervised Learning Models for the Preliminary Detection of COVID-19 in Patients Using Demographic and Epidemiological Parameters

Information (Switzerland), Volume 13, Issue 7, Page 330
Aditya Pradhan¹, Srikanth Prabhu¹, Krishnaraj Chadaga¹, Saptarshi Sengupta², Gopal Nath³
¹Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576014, India
²Department of Computer Science, San Jose State University, 1 Washington Square, San Jose, CA 95192, USA
³Department of Mathematics and Statistics, Murray State University, Murray, KY 42071, USA

Abstract

The World Health Organization declared the COVID-19 outbreak a public health emergency of international concern on 30 January 2020 and characterized it as a global pandemic in March 2020. The pandemic has had catastrophic consequences for the world economy and for people's well-being, and it has placed tremendous strain on already overstretched healthcare systems worldwide, particularly in developing countries. Over 11 billion vaccine doses have been administered worldwide, but the benefits of vaccination will take time to become apparent. At present, the only practical approaches to diagnosing COVID-19 are reverse transcription polymerase chain reaction (RT-PCR) and rapid antigen tests (RATs), both of which are known to give unreliable results at times. Timely diagnosis and implementation of precautionary measures are likely to improve survival outcomes and reduce fatality rates. In this study, we propose an innovative, non-clinical approach to predicting COVID-19: supervised machine learning models that identify at-risk patients from their demographic and epidemiological characteristics and underlying comorbidities. Medical records of patients in Mexico admitted between 23 January 2020 and 26 March 2022 were used for this purpose. Among the supervised machine learning approaches tested, the XGBoost model performed best, with an accuracy of 92%. The approach offers a simple, non-invasive, inexpensive, instant and accurate way to identify those at risk of contracting the virus. However, it is too early to conclude that this method can serve as an alternative to the clinical diagnosis of COVID-19.
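The workflow summarized above (train a supervised classifier on patients' demographic and comorbidity flags, then report accuracy on held-out records) can be illustrated with a minimal sketch. It rests on several assumptions: it uses plain logistic regression fitted by batch gradient descent as a simpler stand-in (it is not XGBoost and not the authors' pipeline), the feature names are hypothetical, and the labels come from a made-up toy rule rather than the Mexican records.

```python
import itertools
import math

# Hypothetical binary features, loosely modelled on the kind of demographic and
# comorbidity flags the abstract mentions; NOT the study's actual schema.
FEATURES = ["age_over_60", "diabetes", "hypertension", "obesity"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, X):
    """Probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def log_loss(w, b, X, y):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    eps = 1e-12
    probs = predict_proba(w, b, X)
    return -sum(
        yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1 - p, eps))
        for p, yi in zip(probs, y)
    ) / len(y)


def fit_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log loss, starting from zero weights."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        errors = [p - yi for p, yi in zip(predict_proba(w, b, X), y)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n_features)]
        grad_b = sum(errors) / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X, threshold=0.5):
    """Hard 0/1 labels from predicted probabilities."""
    return [1 if p >= threshold else 0 for p in predict_proba(w, b, X)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic toy data: every combination of the four flags, labelled positive
# when at least two flags are present (a made-up rule, for illustration only).
X_all = [list(row) for row in itertools.product([0, 1], repeat=len(FEATURES))]
y_all = [1 if sum(row) >= 2 else 0 for row in X_all]

# Deterministic hold-out split: every fourth record goes to the test set.
X_train = [x for i, x in enumerate(X_all) if i % 4 != 0]
y_train = [t for i, t in enumerate(y_all) if i % 4 != 0]
X_test = [x for i, x in enumerate(X_all) if i % 4 == 0]
y_test = [t for i, t in enumerate(y_all) if i % 4 == 0]

w, b = fit_logistic_regression(X_train, y_train)
test_acc = accuracy(y_test, predict(w, b, X_test))
print(f"weights={[round(v, 2) for v in w]}, bias={b:.2f}, test accuracy={test_acc:.2f}")
```

Swapping in a gradient-boosted tree model such as XGBoost would change only the fitting step; the split, train and evaluate structure and the accuracy computation would stay the same.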
