Predicting factors for survival of breast cancer patients using machine learning techniques

Mogana Darshini Ganggayah1, Nur Aishah Mohd Taib2, Yip Cheng Har2, Pietro Lió3, Sarinder Kaur Dhillon1
1Data Science and Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia
2Department of Surgery, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
3Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, England

Abstract

Keywords

