Heart disease identification from patients’ social posts, machine learning solution on Spark

Future Generation Computer Systems - Volume 111 - Pages 714-722 - 2020
Hager Saleh1, Eman M. G. Younis1, Abdeltawab Hendawi2,3, Abdelmgeid A. Ali4
1Information Systems Department, Faculty of Computers and Information, Minia University, Egypt
2Department of Computer Science and Statistics, University of Rhode Island, USA
3Faculty of Computers and Artificial Intelligence, Cairo University, Egypt
4Computer Science Department, Faculty of Computers and Information, Minia University, Egypt
