Statistical and machine learning models in credit scoring: A systematic literature survey

Applied Soft Computing - Volume 91 - Page 106263 - 2020
Xolani Dastile1, Turgay Çelik1,2, Moshe Moses Potsane1
1School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa
2Wits Institute of Data Science, University of the Witwatersrand, Johannesburg, South Africa

Abstract

Keywords


References

Thomas, 2002

Thomas, 2000, A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers, Int. J. Forecast., 16, 149, 10.1016/S0169-2070(00)00034-0

Siddiqi, 2005

Myung, 2003, Tutorial on maximum likelihood estimation, J. Math. Psych., 47, 90, 10.1016/S0022-2496(02)00028-7

Baesens, 2003, Using neural network rule extraction and decision tables for credit-risk evaluation, Manage. Sci., 49, 312, 10.1287/mnsc.49.3.312.12739

Alaka, 2018, Systematic review of bankruptcy prediction models: Towards a framework for tool selection, Expert Syst. Appl., 94, 164, 10.1016/j.eswa.2017.10.040

Schlosser, 2007, 1

Bellovary, 2007, A review of bankruptcy prediction studies: 1930 to present, J. Financial Educ., 33, 1

Abdou, 2011, Credit scoring, statistical techniques and evaluation criteria: A review of the literature, Int. J. Intell. Syst. Account. Financ. Manage., 18, 59, 10.1002/isaf.325

Lin, 2012, Machine learning in financial crisis prediction: A survey, IEEE Trans. Syst. Man Cybern. C, 42, 421, 10.1109/TSMCC.2011.2170420

Wang, 2015, A survey of applying machine learning techniques for credit rating: existing models and open issues, 122

Louzada, 2016, Classification methods applied to credit scoring: Systematic review and overall comparison, Surv. Oper. Res. Manag. Sci., 21, 117

Devi, 2018

Liang, 2015, The effect of feature selection on financial distress prediction, Knowl.-Based Syst., 73, 289, 10.1016/j.knosys.2014.10.010

Brown, 2012, An experimental comparison of classification algorithms for imbalanced credit scoring data sets, Expert Syst. Appl., 39, 3446, 10.1016/j.eswa.2011.09.033

Bijak, 2012, Does segmentation always improve model performance in credit scoring?, Expert Syst. Appl., 39, 2433, 10.1016/j.eswa.2011.08.093

Chen, 2010, Combination of feature selection approaches with SVM in credit scoring, Expert Syst. Appl., 37, 4902, 10.1016/j.eswa.2009.12.025

W. Chen, L. Shi, Credit scoring with F-score based on support vector machine, in: Proceedings 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer, MEC, 2013, pp. 1512–1516.

Chen, 2017, The study of credit scoring model based on group lasso, Procedia Comput. Sci., 122, 677, 10.1016/j.procs.2017.11.423

Chi, 2012, A hybrid approach to integrate genetic algorithm into dual scoring model in enhancing the performance of credit scoring model, Expert Syst. Appl., 39, 2650, 10.1016/j.eswa.2011.08.120

Back, 1996, Neural networks and genetic algorithms for bankruptcy predictions, Expert Syst. Appl., 11, 407, 10.1016/S0957-4174(96)00055-3

Oreski, 2014, Genetic algorithm-based heuristic for feature selection in credit risk assessment, Expert Syst. Appl., 41, 2052, 10.1016/j.eswa.2013.09.004

Song, 2017, Feature selection based on FDA and F-score for multi-class classification, Expert Syst. Appl., 81, 22, 10.1016/j.eswa.2017.02.049

Pawlak, 1997, Rough set approach to knowledge-based decision support, European J. Oper. Res., 99, 48, 10.1016/S0377-2217(96)00382-7

Wang, 2010, Rough set and tabu search based feature selection for credit scoring, Procedia Comput. Sci., 1, 2425, 10.1016/j.procs.2010.04.273

Zhang, 2016, A survey on rough set theory and its applications, CAAI Trans. Intell. Technol., 1, 323, 10.1016/j.trit.2016.11.001

Tsai, 2009, Feature selection in bankruptcy prediction, Knowl.-Based Syst., 22, 120, 10.1016/j.knosys.2008.08.002

Mitchell, 1996

Kozeny, 2015, Genetic algorithms for credit scoring: Alternative fitness function performance comparison, Expert Syst. Appl., 42, 2998, 10.1016/j.eswa.2014.11.028

Crepinsek, 2013, Exploration and exploitation in evolutionary algorithms: A survey, ACM Comput. Surv., 45, 35:1, 10.1145/2480741.2480752

Liu, 2009, To explore or to exploit: An entropy-driven approach for evolutionary algorithms, KES J., 13, 185, 10.3233/KES-2009-0184

Cadenas, 2013, Feature subset selection filter–wrapper based on low quality data, Expert Syst. Appl., 40, 6241, 10.1016/j.eswa.2013.05.051

Tibshirani, 2011, Regression shrinkage and selection via the lasso: a retrospective, J. R. Stat. Soc. Ser. B Stat. Methodol., 73, 273, 10.1111/j.1467-9868.2011.00771.x

Zheng, 2018

S. Sehgal, H. Singh, M. Agarwal, V. Bhasker, Shantanu, Data analysis using principal component analysis, in: 2014 International Conference on Medical Imaging, M-Health and Emerging Communication Systems, MedCom, 2014, pp. 45–48.

Fisher, 1936, The use of multiple measurements in taxonomic problems, Ann. Eugen., 7, 179, 10.1111/j.1469-1809.1936.tb02137.x

Martinez, 2001, PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell., 23, 228, 10.1109/34.908974

Rao, 1948, The utilization of multiple measurements in problems of biological classification, J. R. Stat. Soc., 159, 10.1111/j.2517-6161.1948.tb00008.x

Duda, 2001

Reynolds, 2015, Gaussian mixture models, 827

Dempster, 1977, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B Stat. Methodol., 39, 1, 10.1111/j.2517-6161.1977.tb01600.x

Henley, 1996, A k-nearest-neighbour classifier for assessing consumer credit risk, J. R. Stat. Soc., 45, 77

Cortes, 1995, Support-vector networks, Mach. Learn., 20, 273, 10.1007/BF00994018

Schölkopf, 2000, The kernel trick for distances, 283

Mitchell, 1997

Barboza, 2017, Machine learning models and bankruptcy prediction, Expert Syst. Appl., 83, 405, 10.1016/j.eswa.2017.04.006

Tsai, 2014, A comparative study of classifier ensembles for bankruptcy prediction, Appl. Soft Comput., 24, 977, 10.1016/j.asoc.2014.08.047

Tsai, 2008, Using neural network ensembles for bankruptcy prediction and credit scoring, Expert Syst. Appl., 34, 2639, 10.1016/j.eswa.2007.05.019

M.D. Odom, R. Sharda, A neural network model for bankruptcy prediction, in: 1990 IJCNN International Joint Conference on Neural Networks, vol. 2, 1990, pp. 163–168.

Breiman, 2001, Random forests, Mach. Learn., 45, 5, 10.1023/A:1010933404324

Freund, 1997, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. System Sci., 55, 119, 10.1006/jcss.1997.1504

Chen, 2016, XGBoost: A scalable tree boosting system, 785

Nobre, 2019, Combining principal component analysis, discrete wavelet transform and xgboost to trade in the financial markets, Expert Syst. Appl., 125, 10.1016/j.eswa.2019.01.083

Xia, 2017, A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring, Expert Syst. Appl., 78, 10.1016/j.eswa.2017.02.017

Breiman, 1996, Bagging predictors, Mach. Learn., 24, 123, 10.1007/BF00058655

Yu, 2018, A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data, Appl. Soft Comput., 69, 192, 10.1016/j.asoc.2018.04.049

Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Comput., 18, 1527, 10.1162/neco.2006.18.7.1527

LeCun, 1989, Backpropagation applied to handwritten zip code recognition, Neural Comput., 1, 541, 10.1162/neco.1989.1.4.541

Seo, 2019, Hierarchical convolutional neural networks for fashion image classification, Expert Syst. Appl., 116, 328, 10.1016/j.eswa.2018.09.022

Lecun, 1995

Ting, 2019, Convolutional neural network improvement for breast cancer classification, Expert Syst. Appl., 120, 103, 10.1016/j.eswa.2018.11.008

Sezer, 2018, Algorithmic financial trading with deep convolutional neural networks: time series to image conversion approach, Appl. Soft Comput., 70, 525, 10.1016/j.asoc.2018.04.024

Zhao, 2017, Convolutional neural networks for time series classification, J. Syst. Eng. Electron., 28, 162, 10.21629/JSEE.2017.01.18

Chollet, 2017

Goodfellow, 2016

Bishop, 1995

Tomczak, 2015, Classification restricted Boltzmann machine for comprehensible credit scoring model, Expert Syst. Appl., 42, 1789, 10.1016/j.eswa.2014.10.016

Douzas, 2017, Self-organizing map oversampling (SOMO) for imbalanced data set learning, Expert Syst. Appl., 82, 40, 10.1016/j.eswa.2017.03.073

Chawla, 2002, SMOTE: Synthetic minority over-sampling technique, J. Artificial Intelligence Res., 16, 321, 10.1613/jair.953

Saia, 2018

Saia, 2016

Setiono, 1996, Symbolic representation of neural networks, Computer, 29, 71, 10.1109/2.485895

Craven, 1995, Extracting tree-structured representations of trained networks, 24

Ribeiro, 2016, “Why should I trust you?”: Explaining the predictions of any classifier, CoRR, abs/1602.04938

Eisenbeis, 1978, Problems in applying discriminant analysis in credit scoring models, J. Bank. Financ., 2, 205, 10.1016/0378-4266(78)90012-2

John, 1995, Estimating continuous distributions in Bayesian classifiers, 338

Yu, 2011, Credit risk evaluation using a weighted least squares SVM classifier with design of experiment for parameter selection, Expert Syst. Appl., 38, 15392, 10.1016/j.eswa.2011.06.023

Y. Jiang, Credit scoring model based on the decision tree and the simulated annealing algorithm, in: 2009 WRI World Congress on Computer Science and Information Engineering, Vol. 4, 2009, pp. 18–22.

Setiono, 1997, Neurolinear: From neural networks to oblique decision rules, Neurocomputing, 17, 1, 10.1016/S0925-2312(97)00038-6

R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, N. Elhadad, Intelligible models for healthCare: Predicting pneumonia risk and hospital 30-day readmission, in: KDD ’15, 2015.

Lundberg, 2017, A unified approach to interpreting model predictions, CoRR, abs/1705.07874

Friedman, 2000, Greedy function approximation: A gradient boosting machine, Ann. Statist., 29, 1189, 10.1214/aos/1013203451

Luo, 2017, A deep learning approach for credit scoring using credit default swaps, Eng. Appl. Artif. Intell., 65, 465, 10.1016/j.engappai.2016.12.002

S. Ramasamy, K. Rajaraman, A hybrid meta-cognitive restricted Boltzmann machine classifier for credit scoring, in: TENCON 2017 - 2017 IEEE Region 10 Conference, 2017, pp. 2313–2318.

K. Tran, T. Duong, Q. Ho, Credit scoring model: A combination of genetic programming and deep learning, in: 2016 Future Technologies Conference, FTC, 2016, pp. 145–149.

S.H. Yeh, C.J. Wang, M.F. Tsai, Deep belief networks for predicting corporate defaults, in: 2015 24th Wireless and Optical Communication Conference, WOCC, 2015, pp. 159–163.

V. Neagoe, A. Ciotec, G. Cucu, Deep convolutional neural networks versus multilayer perceptron for financial prediction, in: 2018 International Conference on Communications, COMM, 2018, pp. 201–206.

Hamori, 2018, Ensemble learning or deep learning? Application to default risk analysis, J. Risk Financial Manag., 11, 10.3390/jrfm11010012

Shorten, 2019, A survey on image data augmentation for deep learning, J. Big Data, 6, 60, 10.1186/s40537-019-0197-0

Gómez-Ríos, 2018, Towards highly accurate coral texture images classification using deep convolutional neural networks and data augmentation, CoRR, abs/1804.00516

Kvamme, 2018, Predicting mortgage default using convolutional neural networks, Expert Syst. Appl., 102, 10.1016/j.eswa.2018.02.029

Krizhevsky, 2012, Imagenet classification with deep convolutional neural networks, Neural Inf. Process. Syst., 25

Perez, 2017, The effectiveness of data augmentation in image classification using deep learning, CoRR

Salamon, 2016, Deep convolutional neural networks and data augmentation for environmental sound classification, CoRR

Frid-Adar, 2018, Synthetic data augmentation using GAN for improved liver lesion classification, CoRR

B. Zhu, W. Yang, H. Wang, Y. Yuan, A hybrid deep learning model for consumer credit scoring, in: 2018 International Conference on Artificial Intelligence and Big Data, ICAIBD, 2018, pp. 205–208.

M.F. Kiani, F. Mahmoudi, A new hybrid method for credit scoring based on clustering and support vector machine (ClsSVM), in: 2010 2nd IEEE International Conference on Information and Financial Engineering, 2010, pp. 585–589.

Zhang, 2010, Vertical bagging decision trees model for credit scoring, Expert Syst. Appl., 37, 7838, 10.1016/j.eswa.2010.04.054

Farquad, 2011, Credit scoring using PCA-SVM hybrid model, 249

Ping, 2011, Neighborhood rough set and SVM based hybrid credit scoring classifier, Expert Syst. Appl., 38, 11300, 10.1016/j.eswa.2011.02.179

Wang, 2012, Rough set and scatter search metaheuristic based feature selection for credit scoring, Expert Syst. Appl., 39, 6123, 10.1016/j.eswa.2011.11.011

Han, 2013, Orthogonal support vector machine for credit scoring, Eng. Appl. Artif. Intell., 26, 848, 10.1016/j.engappai.2012.10.005

Shi, 2013, Credit scoring by feature-weighted support vector machines, J. Zhejiang Univ. Sci. C, 14, 197, 10.1631/jzus.C1200205

Q. Li, J. Zhang, Y. Wang, K. Kang, Credit risk classification using discriminative restricted boltzmann machines, in: 2014 IEEE 17th International Conference on Computational Science and Engineering, 2014, pp. 1697–1700.

Maldonado, 2017, Cost-based feature selection for support vector machines: An application in credit scoring, European J. Oper. Res., 261, 656, 10.1016/j.ejor.2017.02.037

H. Sutrisno, S. Halim, Credit scoring refinement using optimized logistic regression, in: 2017 International Conference on Soft Computing, Intelligent System and Information Technology, ICSIIT, 2017, pp. 26–31.

Mancisidor, 2018

X. Zhang, Y. Yang, Z. Zhou, A novel credit scoring model based on optimized random forest, in: 2018 IEEE 8th Annual Computing and Communication Workshop and Conference, CCWC, 2018, pp. 60–65.

Jadhav, 2018, Information gain directed genetic algorithm wrapper feature selection for credit rating, Appl. Soft Comput., 69, 541, 10.1016/j.asoc.2018.04.033

Dong, 2010, Credit scorecard based on logistic regression with random coefficients, Procedia Comput. Sci., 1, 2463, 10.1016/j.procs.2010.04.278

Twala, 2010, Multiple classifier application to credit risk assessment, Expert Syst. Appl., 37, 3326, 10.1016/j.eswa.2009.10.018

Hsieh, 2010, A data driven ensemble classifier for credit scoring analysis, Expert Syst. Appl., 37, 534, 10.1016/j.eswa.2009.05.059

Wang, 2011, A comparative assessment of ensemble learning for credit scoring, Expert Syst. Appl., 38, 223, 10.1016/j.eswa.2010.06.048

Q. Wang, K.K. Lai, D. Niu, Green credit scoring system and its risk assessemt model with support vector machine, in: 2011 Fourth International Joint Conference on Computational Sciences and Optimization, 2011, pp. 284–287.

Ribeiro, 2011, Deep belief networks for financial prediction, 766

Yap, 2011, Using data mining to improve assessment of credit worthiness via credit scoring models, Expert Syst. Appl., 38, 13274, 10.1016/j.eswa.2011.04.147

Louzada, 2011, Poly-bagging predictors for classification modelling for credit scoring, Expert Syst. Appl., 38, 12717, 10.1016/j.eswa.2011.04.059

Marqués, 2012, Exploring the behaviour of base classifiers in credit scoring ensembles, Expert Syst. Appl., 39, 10244, 10.1016/j.eswa.2012.02.092

Marqués, 2012, Two-level classifier ensembles for credit risk assessment, Expert Syst. Appl., 39, 10916, 10.1016/j.eswa.2012.03.033

B. Tang, S. Qiu, A new credit scoring method based on improved fuzzy support vector machine, in: 2012 IEEE International Conference on Computer Science and Automation Engineering, CSAE, Vol. 3, 2012, pp. 73–75.

Louzada, 2012, On the impact of disproportional samples in credit scoring models: An application to a Brazilian bank data, Expert Syst. Appl., 39, 8071, 10.1016/j.eswa.2012.01.134

Abellán, 2014, Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring, Expert Syst. Appl., 41, 3825, 10.1016/j.eswa.2013.12.003

Harris, 2015, Credit scoring using the clustered support vector machine, Expert Syst. Appl., 42, 741, 10.1016/j.eswa.2014.08.029

B. Yi, J. Zhu, Credit scoring with an improved fuzzy support vector machine based on grey incidence analysis, in: 2015 IEEE International Conference on Grey Systems and Intelligent Services, GSIS, 2015, pp. 173–178.

Jones, 2015, An empirical evaluation of the performance of binary classifiers in the prediction of credit ratings changes, J. Bank. Finance, 56, 72, 10.1016/j.jbankfin.2015.02.006

J. Chen, L. Xu, A method of improving credit evaluation with support vector machines, in: 2015 11th International Conference on Natural Computation, ICNC, 2015, pp. 615–619.

Zhao, 2015, Investigation and improvement of multi-layer perceptron neural networks for credit scoring, Expert Syst. Appl., 42, 3508, 10.1016/j.eswa.2014.12.006

Florez-Lopez, 2015, Enhancing accuracy and interpretability of ensemble strategies in credit risk assessment. A correlated-adjusted decision forest proposal, Expert Syst. Appl., 42, 5737, 10.1016/j.eswa.2015.02.042

M. Aláraj, M. Abbod, A systematic credit scoring model based on heterogeneous classifier ensembles, in: 2015 International Symposium on Innovations in Intelligent SysTems and Applications, INISTA, 2015, pp. 1–7.

Aláraj, 2016, Classifiers consensus system approach for credit scoring, Knowl.-Based Syst., 104, 89, 10.1016/j.knosys.2016.04.013

Yu, 2016, A novel multistage deep belief network based extreme learning machine ensemble learning paradigm for credit risk assessment, Flex. Serv. Manuf. J., 28, 576, 10.1007/s10696-015-9226-2

Xiao, 2016, Ensemble classification based on supervised clustering for credit scoring, Appl. Soft Comput., 43, 73, 10.1016/j.asoc.2016.02.022

Bequé, 2017, Extreme learning machines for credit scoring: An empirical evaluation, Expert Syst. Appl., 86, 42, 10.1016/j.eswa.2017.05.050

A. Lawi, F. Aziz, S. Syarif, Ensemble gradientboost for increasing classification accuracy of credit scoring, in: 2017 4th International Conference on Computer Applications and Information Processing Technology, CAIPT, 2017, pp. 1–4.

Y. Li, X. Lin, X. Wang, F. Shen, Z. Gong, Credit risk assessment algorithm using deep neural networks with clustering and merging, in: 2017 13th International Conference on Computational Intelligence and Security, CIS, 2017, pp. 173–176.

Li, 2017, Reject inference in credit scoring using semi-supervised support vector machines, Expert Syst. Appl., 74, 105, 10.1016/j.eswa.2017.01.011

O.J. Okesola, K.O. Okokpujie, A.A. Adewale, S.N. John, O. Omoruyi, An improved bank credit scoring model: A Naïve Bayesian approach, in: 2017 International Conference on Computational Science and Computational Intelligence, CSCI, 2017, pp. 228–233.

H. Chen, M. Jiang, X. Wang, Bayesian ensemble assessment for credit scoring, in: 2017 4th International Conference on Industrial Economics System and Industrial Security Engineering, IEIS, 2017, pp. 1–5.

Abellán, 2017, A comparative study on base classifiers in ensemble methods for credit scoring, Expert Syst. Appl., 73, 1, 10.1016/j.eswa.2016.12.020

Vanderheyden, 2018

Martey Addo, 2018, Credit risk analysis using machine and deep learning models, Risks, 6, 38, 10.3390/risks6020038

Xia, 2018, A novel heterogeneous ensemble credit scoring model based on bstacking approach, Expert Syst. Appl., 93, 182, 10.1016/j.eswa.2017.10.022

Chang, 2018, Application of extreme gradient boosting trees in the construction of credit risk assessment models for financial institutions, Appl. Soft Comput., 73, 10.1016/j.asoc.2018.09.029

Li, 2018, Heterogeneous ensemble for default prediction of peer-to-peer lending in China, IEEE Access, 6, 54396, 10.1109/ACCESS.2018.2810864

Cao, 2018, Performance evaluation of machine learning approaches for credit scoring, Int. J. Econ. Finance Manag. Sci., 6, 255

Basel Committee on Banking Supervision, 2006, Basel II: International convergence of capital measurement and capital standards: A revised framework - comprehensive version, Bank for International Settlements, BIS