Enhancing accuracy and interpretability of ensemble strategies in credit risk assessment. A correlated-adjusted decision forest proposal

Expert Systems with Applications - Volume 42 - Pages 5737-5753 - 2015
Raquel Florez-Lopez (1), Juan Manuel Ramon-Jeronimo (1)
(1) University Pablo Olavide of Seville, Department of Financial Economics and Accounting, Utrera Road, km. 1, 41013 Seville, Spain
