A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients

Journal of Biomedical Informatics - Volume 58 - Pages 49-59 - 2015
Miriam Seoane Santos1,2, Pedro Henriques Abreu1,2, Pedro J. García-Laencina3, Adélia Simão4, Armando Carvalho4
1Centre for Informatics and Systems, University of Coimbra, Pólo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
2Department of Informatics Engineering, Faculty of Sciences and Technology, University of Coimbra, Pólo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
3Centro Universitario de la Defensa de San Javier (University Centre of Defence at the Spanish Air Force Academy), MDE-UPCT, Calle Coronel López Peña, s/n, 30720 Santiago de la Ribera, Murcia, Spain
4Internal Medicine Service, Hospital and University Centre of Coimbra, EPE, Rua Fonseca Pinto, 3000-075 Coimbra, Portugal


References

World Health Organization, Globocan 2012: estimated cancer incidence, mortality and prevalence worldwide in 2012. <http://globocan.iarc.fr/>.

World Health Organization, Cancer fact sheet, 2014. <http://www.who.int/mediacentre/factsheets/fs297>.

Anon., European association for the study of the liver, European organisation for research and treatment of cancer, EASL-EORTC clinical practice guidelines: management of hepatocellular carcinoma, J. Hepatol. 56 (4) (2012) 908–943.

Marinho, 2007, Rising costs and hospital admissions for hepatocellular carcinoma in Portugal (1993–2005), World J. Gastroenterol., 13, 1522, 10.3748/wjg.v13.i10.1522

Liga Portuguesa Contra o Cancro, Cancro do fígado pode aumentar 70 por cento até, 2015. <http://www.ligacontracancro.pt/noticias/detalhes.php?id=115>.

Burke, 1997, Artificial neural networks improve the accuracy of cancer survival prediction, Cancer, 79, 857, 10.1002/(SICI)1097-0142(19970215)79:4<857::AID-CNCR24>3.0.CO;2-Y

Thongkam, 2009, Toward breast cancer survivability prediction models through improving training space, Expert Syst. Appl., 36, 12200, 10.1016/j.eswa.2009.04.067

Esfandiari, 2014, Knowledge discovery in medicine: current issue and future trend, Expert Syst. Appl., 41, 4434, 10.1016/j.eswa.2014.01.011

Abreu, 2014, Overall survival prediction for women breast cancer using ensemble methods and incomplete clinical data, vol. 41, 1366

Abreu, 2014, Personalizing breast cancer patients with heterogeneous data, vol. 42, 39

Yuan, 1998, Neural-network design for small training sets of high dimension, IEEE Trans. Neural Netw., 9, 266, 10.1109/72.661122

Andonie, 2010, Extreme data mining: Inference from small datasets, Int. J. Comput. Commun. Control, 5, 280, 10.15837/ijccc.2010.3.2481

Harrell, 1996, Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors, Stat. Med., 15, 361, 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4

García-Laencina, 2010, Pattern classification with missing data: a review, Neural Comput. Appl., 19, 263, 10.1007/s00521-009-0295-6

Qi, 2013, On an ensemble algorithm for clustering cancer patient data, BMC Syst. Biol., 7, S9, 10.1186/1752-0509-7-S4-S9

Chawla, 2002, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res., 16, 321, 10.1613/jair.953

Forner, 2012, Hepatocellular carcinoma, Lancet, 379, 1245, 10.1016/S0140-6736(11)61347-0

Durand, 2005, Assessment of the prognosis of cirrhosis: Child-Pugh versus MELD, J. Hepatol., 42, S100, 10.1016/j.jhep.2004.11.015

Cruz, 2006, Applications of machine learning in cancer prediction and prognosis, Cancer Informat., 2, 59, 10.1177/117693510600200030

Wasyluk, 2010, Founding of database for cirrhotic patients for early detection of hepatocellular carcinoma, Hepatology, 6, 13

Ho, 2012, Disease-free survival after hepatic resection in hepatocellular carcinoma patients: a prediction approach using artificial neural network, PLoS ONE, 7, e29179, 10.1371/journal.pone.0029179

H.C. Chiu, T.W. Ho, K.T. Lee, H.Y. Chen, W.H. Ho, Mortality predicted accuracy for hepatocellular carcinoma patients with hepatic resection using artificial neural network, Sci. World J. 2013 (2013) 201976.

Shi, 2012, Comparison of artificial neural network and logistic regression models for predicting in-hospital mortality after primary liver cancer surgery, PLoS ONE, 7, e35781, 10.1371/journal.pone.0035781

Little, 2002

Cismondi, 2013, Missing data in medical databases: impute, delete or classify?, Artif. Intell. Med., 58, 63, 10.1016/j.artmed.2013.01.003

García-Laencina, 2009, K nearest neighbours with mutual information for simultaneous classification and missing data imputation, Neurocomputing, 72, 1483, 10.1016/j.neucom.2008.11.026

García-Laencina, 2013, Classifying patterns with missing values using multi-task learning perceptrons, Expert Syst. Appl., 40, 1333, 10.1016/j.eswa.2012.08.057

García-Laencina, 2015, Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values, Comput. Biol. Med., 59, 125, 10.1016/j.compbiomed.2015.02.006

Little, 1999, Methods for handling missing values in clinical trials, J. Rheumatol., 26, 1654

Troyanskaya, 2001, Missing value estimation methods for DNA microarrays, Bioinformatics, 17, 520, 10.1093/bioinformatics/17.6.520

Jerez, 2010, Missing data imputation using statistical and machine learning methods in a real breast cancer problem, Artif. Intell. Med., 50, 105, 10.1016/j.artmed.2010.05.002

Batista, 2003, An analysis of four missing data treatment methods for supervised learning, Appl. Artif. Intell., 17, 519, 10.1080/713827181

Suarez-Alvarez, 2012, Statistical approach to normalization of feature vectors and clustering of mixed datasets, Proc. Roy. Soc. London A: Math. Phys. Eng. Sci., 468, 2630, 10.1098/rspa.2011.0704

Tibshirani, 2001, Estimating the number of clusters in a data set via the gap statistic, J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.), 63, 411, 10.1111/1467-9868.00293

Jain, 2010, Data clustering: 50 years beyond k-means, Pattern Recogn. Lett., 31, 651, 10.1016/j.patrec.2009.09.011

Chauhan, 2010, Data clustering method for discovering clusters in spatial cancer databases, Int. J. Comput. Appl., 10, 9

Winkler, 2013, An integrated clustering and classification approach for the analysis of tumor patient data, vol. 8111, 388

He, 2009, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng., 21, 1263, 10.1109/TKDE.2008.239

Bishop, 2006

D. Arthur, S. Vassilvitskii, K-means++: the advantages of careful seeding, in: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, 2007, pp. 1027–1035.

Dudoit, 2003, Bagging to improve the accuracy of a clustering procedure, Bioinformatics, 19, 1090, 10.1093/bioinformatics/btg038

Vega-Pons, 2011, A survey of clustering ensembles, Int. J. Pattern Recogn. Artif. Intell., 25, 337, 10.1142/S0218001411008683

Yang, 2014, Exploring the diversity in cluster ensemble generation: random sampling and random projection, Expert Syst. Appl., 41, 4844, 10.1016/j.eswa.2014.01.028

Yu, 2014, Probabilistic cluster structure ensemble, Inform. Sci., 267, 16, 10.1016/j.ins.2014.01.030

de Vries, 1986, Stratified random sampling, 31

Huang, 2005, Using AUC and accuracy in evaluating learning algorithms, IEEE Trans. Knowl. Data Eng., 17, 290

Demšar, 2006, Statistical comparisons of classifiers over multiple data sets, J. Machine Learning Res., 7, 1