Classification with label noise: a Markov chain sampling framework
Abstract
The effectiveness of classification methods relies largely on the correctness of instance labels. In real applications, however, instance labels are often unreliable due to the presence of label noise, and training effective classifiers under label noise is a challenging task with many real-world applications. In this paper, we propose a Markov chain sampling (MCS) framework that accurately identifies mislabeled instances and robustly learns effective classifiers. MCS builds a Markov chain in which each state uniquely represents a set of randomly sampled instances. We show that this Markov chain has a unique stationary distribution, which places much larger probability weight on states dominated by correctly labeled instances than on states dominated by mislabeled instances. We propose a Markov chain Monte Carlo sampling algorithm to approximate the stationary distribution, which is then used to compute a mislabeling probability for each instance and to train noise-resistant classifiers. The MCS framework is highly compatible with a wide spectrum of classifiers that produce probabilistic classification results. Extensive experiments on both real and synthetic data sets demonstrate the superior effectiveness and efficiency of the proposed MCS framework.
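To make the abstract's mechanism concrete, below is a minimal illustrative sketch of the general idea, not the authors' algorithm: a Metropolis-Hastings chain whose states are instance subsets, with a stationary distribution proportional to a state score. The base classifier (LogisticRegression), the scoring rule, the swap proposal, and the final frequency-to-probability heuristic are all assumptions made for illustration.

```python
# Illustrative sketch only -- NOT the paper's MCS algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

def state_score(X, y, idx):
    """Score a state (a subset of instances) by how much probability a
    classifier trained on it assigns to the subset's own labels."""
    if len(np.unique(y[idx])) < 2:
        return 1e-9  # degenerate state: cannot fit a classifier
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    proba = clf.predict_proba(X[idx])
    cols = np.searchsorted(clf.classes_, y[idx])  # map label -> column
    return proba[np.arange(len(idx)), cols].mean()

def mcs_mislabel_prob(X, y, subset_size=30, n_steps=2000, seed=0):
    """Metropolis-Hastings over instance subsets. States dominated by
    correctly labeled instances score higher and are visited more often;
    instances that rarely appear in visited states are flagged as
    likely mislabeled."""
    rng = np.random.default_rng(seed)
    n = len(y)
    state = rng.choice(n, size=subset_size, replace=False)
    score = state_score(X, y, state)
    counts = np.zeros(n)
    for _ in range(n_steps):
        proposal = state.copy()
        # Symmetric proposal: swap one in-state instance for an outsider.
        proposal[rng.integers(subset_size)] = rng.choice(
            np.setdiff1d(np.arange(n), state))
        new_score = state_score(X, y, proposal)
        if rng.random() < min(1.0, new_score / score):  # MH acceptance
            state, score = proposal, new_score
        counts[state] += 1
    freq = counts / counts.max()
    return 1.0 - freq  # high value = likely mislabeled (heuristic)
```

Because the swap proposal is symmetric, accepting with the score ratio targets a stationary distribution proportional to the state score, mirroring the abstract's claim that high-probability states are dominated by correctly labeled instances. On a toy binary data set with a fraction of flipped labels, the flipped instances should receive noticeably higher values from mcs_mislabel_prob than clean ones; subset_size and n_steps trade accuracy for runtime.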