Multiclass classification with potential function rules: Margin distribution and generalization

Pattern Recognition - Volume 45 - Pages 540-551 - 2012
Fei Teng1, Yixin Chen1, Xin Dang2
1Department of Computer and Information Science, The University of Mississippi, University, MS 38677, USA
2Department of Mathematics, The University of Mississippi, University, MS 38677, USA
