Integration of unsupervised and supervised machine learning algorithms for credit risk assessment

Expert Systems with Applications - Volume 128 - Pages 301-315 - 2019
Wang Bao1, Ning Lianju1, Kong Yue2
1School of Economics and Management, Beijing University of Posts and Telecommunications, P.O. Box 164, 10 Xitucheng Road, Haidian District, Beijing 100876, PR China
2State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, P.O. Box 53, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing 100029, PR China
