Classification with label noise: a Markov chain sampling framework

Data Mining and Knowledge Discovery - Volume 33 - Pages 1468-1504 - 2018
Zijin Zhao1, Lingyang Chu1, Dacheng Tao2, Jian Pei1
1School of Computing Science, Simon Fraser University, Burnaby, Canada
2The Faculty of Engineering and Information Technologies, The UBTECH Sydney Artificial Intelligence Centre and the School of Information Technologies, The University of Sydney, Darlington, Australia

Abstract

The effectiveness of classification methods relies largely on the correctness of instance labels. In real applications, however, instance labels are often unreliable due to the presence of label noise. Training effective classifiers in the presence of label noise is a challenging task that arises in many real-world applications. In this paper, we propose a Markov chain sampling (MCS) framework that accurately identifies mislabeled instances and robustly learns effective classifiers. MCS builds a Markov chain where each state uniquely represents a set of randomly sampled instances. We show that the Markov chain has a unique stationary distribution, which puts much larger probability weights on states dominated by correctly labeled instances than on states dominated by mislabeled instances. We propose a Markov Chain Monte Carlo sampling algorithm to approximate the stationary distribution, which is then used to compute the mislabeling probability of each instance and to train noise-resistant classifiers. The MCS framework is highly compatible with a wide spectrum of classifiers that produce probabilistic classification results. Extensive experiments on both real and synthetic data sets demonstrate the superior effectiveness and efficiency of the proposed MCS framework.
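To make the abstract's idea concrete, below is a minimal, illustrative sketch of a Markov chain over instance subsets. It is not the authors' exact algorithm: the swap proposal, the accuracy-based state score, the logistic-regression base classifier, and the `temperature` parameter are all assumptions chosen for illustration. The sketch encodes each state as a random subset of instances, scores a state by how well a classifier trained on it predicts the observed labels of the remaining instances, runs a Metropolis-style chain that favors higher-scoring states, and estimates each instance's mislabeling probability as the fraction of visited states whose classifier contradicts its label.

```python
# Illustrative sketch of Markov chain sampling over instance subsets
# (assumed design, not the paper's exact algorithm).
import numpy as np
from sklearn.linear_model import LogisticRegression  # any probabilistic classifier works


def mcs_mislabel_scores(X, y, subset_size=50, n_steps=2000,
                        burn_in=200, temperature=0.05, seed=0):
    """Return a per-instance mislabeling score in [0, 1]."""
    rng = np.random.default_rng(seed)
    n = len(y)

    def evaluate(subset):
        """Fit on the subset; score = label-agreement rate on the rest."""
        if len(np.unique(y[subset])) < 2:            # degenerate one-class subset
            return 0.0, None, None
        clf = LogisticRegression(max_iter=1000).fit(X[subset], y[subset])
        rest = np.setdiff1d(np.arange(n), subset)
        pred = clf.predict(X[rest])
        return np.mean(pred == y[rest]), rest, pred != y[rest]

    state = rng.choice(n, size=subset_size, replace=False)
    score, rest, mismatch = evaluate(state)

    disagree = np.zeros(n)   # how often each instance's label was contradicted
    counted = np.zeros(n)    # how often each instance was evaluated

    for step in range(n_steps):
        # Proposal: swap one in-state instance for one out-of-state instance.
        proposal = state.copy()
        proposal[rng.integers(subset_size)] = rng.choice(
            np.setdiff1d(np.arange(n), state))
        new_score, new_rest, new_mismatch = evaluate(proposal)

        # Metropolis acceptance: higher-scoring states are visited more often,
        # so subsets dominated by correctly labeled instances dominate the chain.
        if np.log(rng.random()) < (new_score - score) / temperature:
            state, score, rest, mismatch = proposal, new_score, new_rest, new_mismatch

        if step >= burn_in and mismatch is not None:
            disagree[rest] += mismatch
            counted[rest] += 1

    return disagree / np.maximum(counted, 1)
```

In this toy form, instances with scores near 1 are flagged as likely mislabeled and can be down-weighted or removed before training the final classifier; the actual MCS framework derives these probabilities from the chain's stationary distribution rather than raw visit counts.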
