Consumer credit risk: Individual probability estimates using machine learning

Expert Systems with Applications, Volume 40, Issue 13, Pages 5125-5131, 2013
Jochen Kruppa [1], Alexandra Schwarz [2], Gerhard Arminger [2], Andreas Ziegler [1]
[1] Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Maria-Goeppert-Str. 1, 23562 Lübeck, Germany
[2] Schumpeter School of Business and Economics, University of Wuppertal, Gaußstraße 20, 42097 Wuppertal, Germany
