A hybrid model for class noise detection using k-means and classification filtering algorithms

Zahra Nematzadeh¹, Roliana Ibrahim¹, Ali Selamat²
¹School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia (UTM), Johor Bahru, Malaysia
²School of Computing, Faculty of Engineering, UTM and Media and Games Center of Excellence (MagicX), Universiti Teknologi Malaysia, 81310, Johor Bahru, Johor, Malaysia

Abstract

Keywords


References

Zhu X, Wu X (2004) Class noise vs. attribute noise: a quantitative study of their impacts. Artif Intell Rev 22(3):177–210

Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869

Miranda AL, Garcia LPF, Carvalho AC, Lorena AC (2009) Use of classification algorithms in noise detection and elimination. In: International conference on hybrid artificial intelligence systems. Springer, pp 417–424

Sluban B, Lavrač N (2015) Relating ensemble diversity and performance: a study in class noise detection. Neurocomputing 160(1):120–131

Lowongtrakool C, Hiransakolwong N (2012) Noise filtering in unsupervised clustering using computation intelligence. Int J Math Anal 6(59):2911–2920

Srimani PPK, Koti MS (2012) Outlier mining in medical databases by using statistical methods. Int J Eng Sci Technol 4(01):239–246

Catal C, Alan O, Balkan K (2011) Class noise detection based on software metrics and ROC curves. Inf Sci 181(21):4867–4877

Sluban B, Gamberger D, Lavrač N (2010) Advances in class noise detection. Front Artif Intell Appl 215(1):1105–1106

Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126

Van Hulse JD, Khoshgoftaar TM, Huang H (2006) The pairwise attribute noise detection algorithm. Knowl Inf Syst 11(2):171–190

Xiong H, Pandey G, Steinbach M, Kumar V (2006) Enhancing data analysis with noise removal. IEEE Trans Knowl Data Eng 18(3):304–319

Zeidat N, Wang S, Eick CF (2005) Dataset editing techniques: a comparative study. University of Houston, Houston

Smith MR, Martinez T, Giraud-Carrier C (2014) An instance level analysis of data complexity. Mach Learn 95(2):225–256

Thongkam J, Xu G, Zhang Y, Huang F (2008) Support vector machine for outlier detection in breast cancer survivability prediction. In: Advanced web and network technologies, and applications. Springer, pp 99–109

Jeatrakul P, Wong KW, Fung CC (2010) Data cleaning for classification using misclassification analysis. J Adv Comput Intell Intell Inform 14(3):297–302

Angelova A, Abu-Mostafa Y, Perona P (2005) Pruning training sets for learning of object categories. In: IEEE computer society conference on computer vision and pattern recognition, CVPR 2005, pp 494–501

Segata N, Blanzieri E, Delany SJ, Cunningham P (2010) Noise reduction for instance-based learning with a local maximal margin approach. J Intell Inf Syst 35(2):301–331

Segata N, Blanzieri E (2009) A scalable noise reduction technique for large case-based systems. In: International conference on case-based reasoning. Springer, Berlin, pp 328–342

Zeng X, Martinez T (2003) A noise filtering method using neural networks. In: IEEE international workshop on soft computing techniques in instrumentation, measurement and related applications, 2003, SCIMA 2003, pp 26–31

Sánchez JS, Barandela R, Marqués AI et al (2003) Analysis of new techniques to obtain quality training sets. Pattern Recogn Lett 24(7):1015–1022

Sabzevari M, Martínez-Muñoz G, Suárez A (2018) A two-stage ensemble method for the detection of class-label noise. Neurocomputing 275:2374–2383

Fränti P, Sieranoja S (2019) How much can k-means be improved by using better initialization and repeats? Pattern Recogn 93:95–112

He Z, Yu C (2019) Clustering stability-based evolutionary k-means. Soft Comput 23(1):305–321

Nematzadeh Z, Ibrahim R, Selamat A (2015) A method for class noise detection based on k-means and SVM algorithms. In: Intelligent software methodologies, tools and techniques. Springer, pp 308–318

Singh K, Malik D, Sharma N (2011) Evolving limitations in k-means algorithm in data mining and their removal. Int J Comput Eng Manag 12(1):105–109

Garcia LPF, Lorena AC, Carvalho ACPLF (2012) A study on class noise detection and elimination. In: 2012 Brazilian symposium on neural networks. Curitiba- PR. 20–25 Oct, pp 13–18

Farid DM, Harbi N, Rahman MZ (2010) Combining Naive Bayes and decision tree for adaptive intrusion detection. arXiv preprint arXiv:1005.4496

Meyer D (2004) Support vector machines: the interface to libsvm in package e1071

Li D-f, Hu W-c, Xiong W, Yang J-b (2008) Fuzzy relevance vector machine for learning from unbalanced data and noise. Pattern Recogn Lett 29(9):1175–1181

Wald R, Khoshgoftaar TM, Shanab AA (2014) The effect of noise level and distribution on classification of easy gene microarray data. In: Proceedings of the 2014 IEEE 15th international conference on information reuse and integration, pp 297–302

Dehariya S, Singh D (2013) An ensemble method based on particle of swarm for the reduction of noise, outlier and core point. Int J Adv Comput Res 3(1):1–5

Depeursinge A, Iavindrasana J, Hidki A et al (2010) Comparative performance analysis of state-of-the-art classification algorithms applied to lung tissue categorization. J Digit Imaging 23(1):18–30

Folleco A, Khoshgoftaar TM, Hulse JV, Bullard L (2008) Software quality modeling: the impact of class noise on the random forest classifier. In: 2008 IEEE congress on evolutionary computation (IEEE world congress on computational intelligence). IEEE, pp 3853–3859

Van Hulse J, Khoshgoftaar T (2009) Knowledge discovery from imbalanced and noisy data. Data Knowl Eng 68(12):1513–1542

Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27

Daza L, Acuna E (2007) An algorithm for detecting noise on supervised classification. In: Proceedings of WCECS-07, the 1st world conference on engineering and computer science, pp 701–706

Pechenizkiy M, Tsymbal A, Puuronen S et al (2006) Class noise and supervised learning in medical domains: the effect of feature extraction. In: 19th IEEE symposium on computer-based medical systems (CBMS’06), pp 708–713

Lan M, Tan CL, Su J, Lu Y (2009) Supervised and traditional term weighting methods for automatic text categorization. IEEE Trans Pattern Anal Mach Intell 31(4):721–735

Li Y (2003) Classification in the presence of class noise. Pattern Recogn 5:1–30

Li R-L, Hu Y-F (2003) Noise reduction to text categorization based on density for KNN. In: Proceedings of the 2003 international conference on machine learning and cybernetics (IEEE Cat. No. 03EX693), vol 5. IEEE, pp 3119–3124

Folorunsho O (2013) Comparative study of different data mining techniques performance in knowledge discovery from medical database. Int J Adv Res Comput Sci Softw Eng 3(3):11–15

Kordos M, Rusiecki A (2013) Improving MLP neural network performance by noise reduction. In: International conference on theory and practice of natural computing. Springer, Berlin, pp 133–144

Webb AR (2003) Statistical pattern recognition. Wiley, New York

Juang L-H, Wu M-N (2010) MRI brain lesion image detection based on color-converted k-means clustering segmentation. Measurement 43(7):941–949

Frank A, Asuncion A (2011) UCI machine learning repository, 2010. http://archive.ics.uci.edu/ml

Smith MR, Martinez T (2013) An extensive evaluation of filtering misclassified instances in supervised classification tasks. arXiv preprint arXiv:1312.3970

Nematzadeh Z, Ibrahim R, Selamat A, Nazerian V (2020) The synergistic combination of fuzzy C-means and ensemble filtering for class noise detection. Eng Comput 37(7):2337–2355