Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values

Computers in Biology and Medicine - Volume 59 - Pages 125-133 - 2015
Pedro J. García-Laencina1, Pedro Henriques Abreu2,3, Miguel Henriques Abreu4, Noémia Afonso4
1Centro Universitario de la Defensa de San Javier (University Centre of Defence at the Spanish Air Force Academy), MDE-UPCT, Calle Coronel Lopez Peña, s/n, 30720 Santiago de la Ribera, Murcia, Spain
2Centre for Informatics and Systems, University of Coimbra, Pólo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
3Department of Informatics Engineering, Faculty of Sciences and Technology, University of Coimbra, Pólo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
4Portuguese Institute of Oncology of Porto, Rua Dr. Antonio Bernardino de Almeida, 4200-072 Porto, Portugal


References

Siegel, 2014, Cancer statistics, 2014, CA Cancer J. Clin., 64, 9, 10.3322/caac.21208

Clark, 1989, Prediction of relapse or survival in patients with node-negative breast cancer by DNA flow cytometry, New Engl. J. Med., 320, 627, 10.1056/NEJM198903093201003

Delen, 2005, Predicting breast cancer survivability, Artif. Intell. Med., 34, 113, 10.1016/j.artmed.2004.07.002

H. Miao, M. Hartman, N. Bhoo-Pathy, S.-C. Lee, N.A. Taib, E.-Y. Tan, P. Chan, K.G.M. Moons, H.S. Wong, J. Goh, S.M. Rahim, C.H. Yip, H.M. Verkooijen, Predicting survival of de novo metastatic breast cancer in Asian women: systematic review and validation study, PLoS One 9 (4) (2014). http://dx.doi.org/10.1371/journal.pone.0093755

Burton, 2004, Missing covariate data within cancer prognostic studies, Br. J. Cancer, 91, 4, 10.1038/sj.bjc.6601907

Jerez, 2010, Missing data imputation using statistical and machine learning methods in a real breast cancer problem, Artif. Intell. Med., 50, 105, 10.1016/j.artmed.2010.05.002

Abreu, 2014, Overall survival prediction for women breast cancer using ensemble methods and incomplete clinical data, vol. 41, 1366

Abreu, 2014, Personalizing breast cancer patients with heterogeneous data, vol. 42, 39

2010

Little, 1999, Methods for handling missing values in clinical trials, J. Rheumatol., 26, 1654

P.D. Allison, Missing Data, Sage University Papers Series on Quantitative Applications in the Social Sciences, Thousand Oaks, CA, USA, 2001.

García-Laencina, 2010, Pattern classification with missing data, Neural Comput. Appl., 19, 263, 10.1007/s00521-009-0295-6

Little, 2002

Cismondi, 2013, Missing data in medical databases, Artif. Intell. Med., 58, 63, 10.1016/j.artmed.2013.01.003

Cruz, 2006, Applications of machine learning in cancer prediction and prognosis, Cancer Informatics, 59

Polat, 2007, Breast cancer diagnosis using least square support vector machine, Digit. Signal Process., 17, 694, 10.1016/j.dsp.2006.10.008

Sahan, 2007, A new hybrid method based on fuzzy-artificial immune system and k-nn algorithm for breast cancer diagnosis, Comput. Biol. Med., 37, 415, 10.1016/j.compbiomed.2006.05.003

Daemen, 2012, Improved modeling of clinical data with kernel methods, Artif. Intell. Med., 54, 103, 10.1016/j.artmed.2011.11.001

Wang, 2014, A hybrid classifier combining SMOTE with PSO to estimate 5-year survivability of breast cancer patients, Appl. Soft Comput., 20, 15, 10.1016/j.asoc.2013.09.014

Dempster, 1977, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B, 39, 1, 10.1111/j.2517-6161.1977.tb01600.x

Z. Ghahramani, M.I. Jordan, Supervised learning from incomplete data via an EM approach, in: J.D. Cowan, G. Tesauro, J. Alspector (Eds.), Advances in Neural Information Processing Systems, vol. 6, Denver, CO, 1993, pp. 120–127.

Bishop, 2006

Zio, 2007, Imputation through finite Gaussian mixture models, Comput. Stat. Data Anal., 51, 5305, 10.1016/j.csda.2006.10.002

Train, 2008, EM algorithms for nonparametric estimation of mixing distributions, J. Choice Model., 1, 40, 10.1016/S1755-5345(13)70022-8

Aha, 1997

Mitchell, 1997

Batista, 2003, An analysis of four missing data treatment methods for supervised learning, Appl. Artif. Intell., 17, 519, 10.1080/713827181

Troyanskaya, 2001, Missing value estimation methods for DNA microarrays, Bioinformatics, 17, 520, 10.1093/bioinformatics/17.6.520

García-Laencina, 2009, K nearest neighbours with mutual information for simultaneous classification and missing data imputation, Neurocomputing, 72, 1483, 10.1016/j.neucom.2008.11.026

Dudani, 1976, The distance-weighted k-nearest-neighbor rule, IEEE Trans. Syst. Man Cybern., 6, 325, 10.1109/TSMC.1976.5408784

Quinlan, 1993

Kantardzic, 2011

R. Latkowski, High computational complexity of the decision tree induction with many missing attribute values, in: Proceedings of Concurrency, Specification and Programming. CS&P, vol. 22, 2003, pp. 318–325.

Vapnik, 1998

Cristianini, 2000

Scholkopf, 2001

Suykens, 1999, Least squares support vector machine classifiers, Neural Process. Lett., 9, 293, 10.1023/A:1018628609742

Park, 2013, Robust predictive model for evaluating breast cancer survivability, Eng. Appl. Artif. Intell., 26, 2194, 10.1016/j.engappai.2013.06.013

Prentice, 1978, Regression analysis of grouped survival data with application to breast cancer data, Biometrics, 34, 57, 10.2307/2529588

Burke, 1997, Artificial neural networks improve the accuracy of cancer survival prediction, Cancer, 79, 857, 10.1002/(SICI)1097-0142(19970215)79:4<857::AID-CNCR24>3.0.CO;2-Y

Markey, 2006, Impact of missing data in evaluating artificial neural networks trained on complete data, Comput. Biol. Med., 36, 516, 10.1016/j.compbiomed.2005.02.001

Dorri, 2012, Missing value imputation in DNA microarrays based on conjugate gradient method, Comput. Biol. Med., 42, 222, 10.1016/j.compbiomed.2011.11.011

Abawajy, 2013, Predicting cardiac autonomic neuropathy category for diabetic data with missing values, Comput. Biol. Med., 43, 1328, 10.1016/j.compbiomed.2013.07.002

Herring, 2004, Non-ignorable missing covariate data in survival analysis, J. R. Stat. Soc. Ser. C (Appl. Stat.), 53, 293, 10.1046/j.1467-9876.2003.05168.x

J.M. Jerez, I. Molina, J.L. Subirats, L. Franco, Missing data imputation in breast cancer prognosis, in: BioMed'06: Proceedings of the 24th IASTED International Conference on Biomedical Engineering, 2006, pp. 323–328.