Consistency of the estimator of binary response models based on AUC maximization

Journal of the Italian Statistical Society - Volume 22 - Pages 381-390 - 2013
Igor Fedotenkov
Department of Economics, University of Verona, Verona, Italy

Abstract

This paper examines the asymptotic properties of a binary response model estimator based on maximization of the area under the receiver operating characteristic curve (AUC). Under certain assumptions, AUC maximization consistently estimates a binary response model up to normalizations. Because the AUC is equivalent to the Mann-Whitney U statistic and the Wilcoxon rank-sum statistic, maximizing the area under the ROC curve is equivalent to maximizing those statistics. Compared to parametric methods, such as logit and probit, AUC maximization relaxes assumptions about the error distribution but imposes some restrictions on the distribution of the explanatory variables. These restrictions are easy to check because the explanatory variables are observed.
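
The equivalence between the AUC and the Mann-Whitney U statistic, and the idea of estimating coefficients "up to normalizations", can be illustrated with a small sketch. It is not the paper's procedure: the abstract does not specify an optimization routine, so the grid search, the unit-norm normalization, the two-regressor design, the Student-t errors, and all function names (`auc_pairwise`, `auc_rank_sum`, `fit_auc_direction`) are assumptions made for illustration only.

```python
import numpy as np


def auc_pairwise(scores, y):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = scores[y == 1]
    neg = scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (pos.size * neg.size)


def auc_rank_sum(scores, y):
    """AUC from the Mann-Whitney U statistic, U / (n1 * n0).

    Uses plain ordinal ranks, so it assumes the scores contain no ties.
    """
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    n1 = int(np.sum(y == 1))
    n0 = scores.size - n1
    u = ranks[y == 1].sum() - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


def fit_auc_direction(x, y, n_grid=360):
    """Illustrative grid search over unit-norm coefficients in two dimensions.

    The AUC depends only on the ranking of x @ beta, so positive rescaling of
    beta leaves it unchanged; the search therefore runs over directions
    (cos t, sin t) only. This is a hypothetical stand-in for a real optimizer.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    aucs = np.array(
        [auc_pairwise(x @ np.array([np.cos(t), np.sin(t)]), y) for t in angles]
    )
    best = int(np.argmax(aucs))
    beta_hat = np.array([np.cos(angles[best]), np.sin(angles[best])])
    return beta_hat, float(aucs[best])


# Simulated data (assumed design): heavy-tailed errors, not logistic or normal.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
beta_true = np.array([0.6, 0.8])  # already has unit norm
eps = rng.standard_t(df=3, size=n)
y = (x @ beta_true + eps > 0).astype(int)

beta_hat, auc_hat = fit_auc_direction(x, y)
scores_hat = x @ beta_hat
print("estimated direction:", beta_hat)
print("AUC (pairwise):", auc_pairwise(scores_hat, y))
print("AUC (rank sum):", auc_rank_sum(scores_hat, y))
```

Both AUC functions return the same value on tie-free scores, which is the equivalence the abstract refers to. Only the direction of the coefficient vector is recovered, since any positive rescaling of `beta_hat` produces the same ranking and hence the same AUC.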
