Consistency of the estimator of binary response models based on AUC maximization
Abstract
This paper examines the asymptotic properties of a binary response model estimator based on maximization of the area under the receiver operating characteristic curve (AUC). Under certain assumptions, AUC maximization consistently estimates a binary response model up to normalizations. Because the AUC is equivalent to the Mann-Whitney U statistic and the Wilcoxon rank-sum statistic, maximizing the area under the ROC curve is equivalent to maximizing these statistics. Compared with parametric methods such as logit and probit, AUC maximization relaxes the assumptions on the error distribution but imposes some restrictions on the distribution of the explanatory variables. These restrictions can be easily checked, since the explanatory variables are observable.
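The equivalence between the AUC and the Mann-Whitney U statistic mentioned above can be illustrated numerically. The following is a minimal sketch, not the estimator studied in the paper: the function names (auc_pairwise, average_ranks, auc_mann_whitney, linear_index), the toy data X and y, and the coefficient vector beta are all hypothetical and chosen only for illustration. Ties are assumed to count as one half. The sketch also shows that rescaling the coefficients by a positive constant leaves the AUC unchanged, which is one reason coefficients can be recovered only up to a normalization.

```python
def auc_pairwise(scores, labels):
    """AUC as the share of (positive, negative) pairs in which the positive
    observation has the higher score; ties count one half (assumption)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc_mann_whitney(scores, labels):
    """AUC from the rank-sum form: U = R_pos - n_pos (n_pos + 1) / 2,
    then AUC = U / (n_pos * n_neg)."""
    ranks = average_ranks(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def linear_index(X, beta):
    """Linear index x'beta for each row of X (hypothetical helper)."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


# Hypothetical toy data: six observations, two explanatory variables.
X = [[1, 1], [3, -1], [-1, 0], [2, 1], [-2, -1], [0, -1]]
y = [1, 0, 0, 1, 0, 1]
beta = [2, 1]

scores = linear_index(X, beta)
auc_a = auc_pairwise(scores, y)
auc_b = auc_mann_whitney(scores, y)

# Multiplying beta by a positive constant preserves the ordering of scores.
auc_scaled = auc_pairwise(linear_index(X, [3 * b for b in beta]), y)

print(auc_a, auc_b, auc_scaled)
```

The pairwise form makes the objective explicit as a count of correctly ordered pairs, while the rank-sum form gives the same value from a single sort. An estimator in the spirit of the abstract would search over beta for the largest value of this objective, subject to a normalization such as fixing one coefficient; that search is not shown here.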