Detecting diseases in medical prescriptions using data mining methods

Sana Nazari Nezhad1, Mohammad Hadi Zahedi1, Elham Farahani2
1Department of Industrial Engineering, K. N. Toosi University of Technology, Tehran, Iran
2Sharif University of Technology, Tehran, Iran

Abstract

Every year, the health of millions of people around the world is compromised by misdiagnosis, which can sometimes lead to death. Misdiagnosis also imposes large financial costs on patients, insurance companies, and governments. Furthermore, the professional lives of many physicians are adversely affected by unintended errors in prescribing medication or diagnosing a disease. Our aim in this paper is to use data mining methods to extract knowledge from a dataset of medical prescriptions that can help improve the diagnostic process. In this study, the disease and its category were predicted using four single classification algorithms: decision tree, random forest, naive Bayes, and K-nearest neighbors. Then, to improve the performance of these algorithms, we used an ensemble learning methodology to build our proposed model. In the final step, a number of experiments were performed to compare the performance of the different data mining techniques. The final model proposed in this study achieves an accuracy of 62.86% and a kappa score of 0.620 for disease prediction, and an accuracy of 74.39% and a kappa score of 0.720 for prediction of the disease category, which is better than the performance reported by other studies in this field. In general, the results of this study can be used to help maintain the health of patients and to prevent the waste of the financial resources of patients, insurance companies, and governments. In addition, they can aid physicians and support their careers by providing timely information on diagnostic errors. Finally, these results can serve as a basis for future research in this field.
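The abstract reports two evaluation measures, accuracy and the kappa score, for a model that combines several single classifiers. The sketch below illustrates how such measures could be computed for a combined prediction. It is not the authors' implementation: the abstract does not say which ensemble scheme was used, so plain majority (hard) voting is assumed here, and the function names, labels, and toy predictions are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard voting.

    Assumption: simple majority voting; ties go to the label that
    appears first among the tied votes for that sample.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (the accuracy) and p_e is the agreement expected by chance from the
    label frequencies of the true and predicted labels.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when only one label occurs on both sides;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: true diagnoses and three classifiers' outputs.
    y_true = ["flu", "flu", "cold", "cold"]
    model_a = ["flu", "flu", "cold", "flu"]
    model_b = ["flu", "cold", "cold", "cold"]
    model_c = ["flu", "flu", "flu", "cold"]

    y_ensemble = majority_vote([model_a, model_b, model_c])
    print("ensemble accuracy:", accuracy(y_true, y_ensemble))
    print("ensemble kappa:", cohen_kappa(y_true, y_ensemble))
    print("model_c kappa:", cohen_kappa(y_true, model_c))
```

Reporting kappa alongside accuracy is useful when there are many classes or the classes are imbalanced, because kappa discounts the agreement a classifier would reach by chance alone.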
