A Bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment

Applied Soft Computing - Volume 86 - Article 105936 - 2020
Nisha Arora, Pankaj Deep Kaur
Department of Computer Science and Engineering, Guru Nanak Dev University, Regional Campus, Jalandhar, India
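The title combines Bolasso feature selection with a random forest classifier. To illustrate the selection stage only, here is a minimal pure-Python sketch of Bolasso in the sense of Bach (2008). It fits a Lasso on many bootstrap resamples and keeps only the features whose coefficients are nonzero in every replicate. The sketch assumes a squared-loss Lasso fitted by coordinate descent directly on the 0/1 default label. That is a simplification, because the paper's actual loss, regularization strength and number of bootstrap rounds are not given here. The function names and defaults (`bolasso_support`, `lasso_cd`, `alpha`, `n_boot=32`, `threshold`) are hypothetical.

```python
import random

# Hypothetical sketch of Bolasso-style selection, not the paper's code.
# Assumption: squared-loss Lasso fitted directly on a 0/1 class label.


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by coordinate-descent Lasso."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def center_scale(X, y):
    """Center and scale each column of X to unit variance; center y."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(var ** 0.5 if var > 0 else 1.0)  # constant column stays all zeros
    Xs = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    return Xs, [v - y_mean for v in y]


def lasso_cd(X, y, alpha, max_sweeps=100, tol=1e-6):
    """Minimize (1/(2n))||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.
    Expects centered/scaled X and centered y, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw with w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def bolasso_support(X, y, alpha, n_boot=32, threshold=1.0, seed=0):
    """Return indices of features with a nonzero Lasso coefficient in at least
    threshold * n_boot bootstrap replicates (threshold=1.0 is the strict intersection)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb, yb = center_scale([X[i] for i in idx], [y[i] for i in idx])
        w = lasso_cd(Xb, yb, alpha)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [j for j in range(p) if counts[j] >= threshold * n_boot]


if __name__ == "__main__":
    # Toy data: the 0/1 label depends only on features 0 and 1; features 2-5 are noise.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(300)]
    y = [1.0 if r[0] + r[1] + 0.3 * rng.gauss(0, 1) > 0 else 0.0 for r in X]
    print(bolasso_support(X, y, alpha=0.1))
```

The returned column indices would then be used to subset the data before fitting the random forest stage, for example with scikit-learn's `RandomForestClassifier`. That stage is not sketched here. Setting `threshold` below 1.0 gives a softer variant that keeps features selected in most, rather than all, bootstrap replicates.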


References

https://www.capitaline.com.

https://www.federalreserve.gov/releases/chargeoff/delallsa.htm.

Oreski, 2013, Genetic algorithm-based heuristic for feature selection in credit risk assessment, Expert Syst. Appl.

Dahiya, 2017, A feature selection enabled hybrid-bagging algorithm for credit risk evaluation, Expert Syst., 34, 10.1111/exsy.12217

D. Wang, Z. Zhang, A hybrid system with filter approach and multiple population genetic algorithm for feature selection in credit scoring, J. Comput. Appl. Math. http://dx.doi.org/10.1016/j.cam.2017.04.036.

Chandrashekar, 2014

Dehuri, 2013, Revisiting evolutionary algorithms in feature selection and nonfuzzy/fuzzy rule based classification, WIREs Data Mining Knowl. Discov., 3, 83, 10.1002/widm.1087

Cai, 2018, Feature selection in machine learning: A new perspective, Neurocomputing, 10.1016/j.neucom.2017.11.077

Yue Zhang, Weihong Guo, Soumya Ray, On the consistency of Feature Selection with Lasso for Non-Linear Targets, in: Proceedings of The 33rd International Conference on Machine Learning, PMLR, vol. 48, 2016, pp. 183–191.

Huang, 2018, Enterprise credit risk evaluation based on neural network algorithm, Cogn. Syst. Res., 10.1016/j.cogsys.2018.07.023

Pandey, 2017, Credit risk analysis using machine learning classifiers

Lessmann, 2015, Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research, European J. Oper. Res., 247, 124, 10.1016/j.ejor.2015.05.030

Lin, 2012, Machine learning in financial crisis prediction: A survey, IEEE Trans. Syst. Man Cybern. C, 42, 421, 10.1109/TSMCC.2011.2170420

Kruppa, 2013, Consumer credit risk: Individual probability estimates using machine learning, Expert Syst. Appl., 40, 5125, 10.1016/j.eswa.2013.03.019

Malekipirbazari, 2015, Risk assessment in social lending via random forests, Expert Syst. Appl., 42, 4621, 10.1016/j.eswa.2015.02.001

Shi, 2011, Credit assessment with random forests, 24

Behr, 2016, Default patterns in seven EU countries: A random forest approach, Int. J. Econ. Bus., 24, 181, 10.1080/13571516.2016.1252532

Bingamawa, 2016

Antonakis, 2009, Assessing Naïve Bayes as a method for screening credit applicants, J. Appl. Stat., 5, 537, 10.1080/02664760802554263

Yeh, 2009, The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients, Expert Syst. Appl., 36, 2473, 10.1016/j.eswa.2007.12.020

Danenas, 2011, Credit risk evaluation model development using support vector based classifiers, Procedia Comput. Sci., 4, 1699, 10.1016/j.procs.2011.04.184

Danenas, 2015, Selection of support vector machines based classifiers for credit risk domain, Expert Syst. Appl., 42, 3194, 10.1016/j.eswa.2014.12.001

Sivasankar, 2017, A study of dimensionality reduction techniques with machine learning methods for credit risk prediction, vol. 556

Jiang, 2018, Stationary Mahalanobis kernel SVM for credit risk evaluation, Appl. Soft Comput., 71, 10.1016/j.asoc.2018.07.005

Henley, 1997, Construction of a k-nearest-neighbour credit-scoring system, IMA J. Manag. Math., 8, 305, 10.1093/imaman/8.4.305

Baesens, 2003, Benchmarking state-of-the-art classification algorithms for credit scoring, J. Oper. Res. Soc., 54, 627, 10.1057/palgrave.jors.2601545

Li, 2009, The hybrid credit scoring model based on KNN classifier, 330

Hand, 2003, Choosing k for two-class nearest neighbor classifiers with unbalanced classes, Pattern Recognit. Lett., 24, 1555, 10.1016/S0167-8655(02)00394-X

Islam, 2007, Investigating the performance of naive-Bayes classifiers and k-nearest neighbor classifiers

Liu, 2005, Data mining feature selection for credit scoring models, J. Oper. Res. Soc., 56, 1099, 10.1057/palgrave.jors.2601976

Jadhav, 2018, Information gain directed genetic algorithm wrapper feature selection for credit rating, Appl. Soft Comput., 69, 541, 10.1016/j.asoc.2018.04.033

X. Zhang, Y. Yang, Z. Zhou, A novel credit scoring model based on optimized random forest, in: 2018 IEEE 8th Annual Computing and Communication Workshop and Conference, CCWC, Las Vegas, NV, 2018, pp. 60–65. http://dx.doi.org/10.1109/CCWC.2018.8301707.

Xiao-Ying Liu, Yong Liang, et al., A hybrid genetic algorithm with wrapper-embedded approaches for feature selection, IEEE Access. http://dx.doi.org/10.1109/ACCESS.2018.2818682.

Khaire, 2019, Stability of feature selection algorithm: A review, J. King Saud Univ. - Comput. Inf. Sci.

Somol, 2010, Evaluating stability and comparing output of feature selectors that optimize feature subset Cardinality, IEEE Trans. Pattern Anal. Mach. Intell., 32, 1921, 10.1109/TPAMI.2010.34

L.I. Kuncheva, A stability index for feature selection, in: Proc. 25th IASTED Int’l Multi-Conf. Artificial Intelligence and Applications, 2007, pp. 421–427.

I. Kamkar, S.K. Gupta, D. Phung, S. Venkatesh, Exploiting feature relationships towards stable feature selection, in: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paris, 2015, pp. 1-10. http://dx.doi.org/10.1109/DSAA.2015.7344859.

Pes, 2019, Ensemble feature selection for high-dimensional data: A stability analysis across multiple domains, Neural Comput. Appl., 10.1007/s00521-019-04082-3

Turney, 1995, Technical note: Bias and the quantification of stability, Mach. Learn., 20, 23, 10.1007/BF00993473

Z. He, W. Yu, Stable feature selection for biomarker discovery, Comput. Biol. Chem. http://dx.doi.org/10.1016/j.compbiolchem.2010.07.002.

Abeel, 2010, Robust biomarker identification for cancer diagnosis with ensemble feature selection methods, Bioinformatics, 26, 392, 10.1093/bioinformatics/btp630

Kamkar, 2015, 9457

Li, 2015, FREL: A stable feature selection algorithm, IEEE Trans. Neural Netw. Learn. Syst., 26, 1388, 10.1109/TNNLS.2014.2341627

Han, 2010, A variance reduction framework for stable feature selection

Tibshirani, 1996, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B, 58, 267, 10.1111/j.2517-6161.1996.tb02080.x

Lin, 2014, A new idea of study on the influence factors of companies’ debt costs in the big data era, Procedia Comput. Sci., 31, 532, 10.1016/j.procs.2014.05.299

Fang, 2014, Individual credit risk prediction model: Application of lasso-logistic model, J. Quant. Tech. Econ.

Hongmei Chen, Yaoxin Xiang, The study of credit scoring model based on group lasso, in: 5th International Conference on Information Technology and Quantitative Management, ITQM 2017, Procedia Comput. Sci., 2017.

Kamkar, 2015, Stable feature selection for clinical prediction: Exploiting ICD tree structure using tree-lasso, J. Biomed. Inform., 10.1016/j.jbi.2014.11.013

Zhang, 2016, High-order covariate interacted lasso for feature selection, Pattern Recognit. Lett.

Bach, 2008

Liu, 1995, Chi2: Feature selection and discretization of the numeric attributes, 388

Trabelsi, 2017, A new feature selection method for nominal classifier based on formal concept analysis, Procedia Comput. Sci., 6

McHugh, 2013, The chi-square test of independence, Biochem. Med., 143

H. Dağ, K.E. Sayin, I. Yenidoğan, S. Albayrak, C. Acar, Comparison of feature selection algorithms for medical data, in: 2012 International Symposium on Innovations in Intelligent Systems and Applications, Trabzon, 2012, pp. 1–5. http://dx.doi.org/10.1109/INISTA.2012.6247011.

Robnik-Šikonja, 2003, Mach. Learn., 53, 23, 10.1023/A:1025667309714

Cortes, 1995, Mach. Learn., 20, 273

Carrizosa, 2013, Supervised classification and mathematical optimization, Comput. Oper. Res., 40, 150, 10.1016/j.cor.2012.05.015

G.H. John, P. Langley, Estimating continuous distributions in Bayesian classifiers, in: Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 1995, pp. 338–345.

Masoumeh Zareapoor, Pourya Shamsolmoali, Application of credit card fraud detection: Based on bagging ensemble classifier, in: International Conference on Computer, Communication and Convergence, ICCC 2015, Procedia Comput. Sci. http://dx.doi.org/10.1016/j.procs.2015.04.201.

Mase, 2008, Credit-rating of companies

Breiman, 2001, Random forests, Mach. Learn., 45, 5, 10.1023/A:1010933404324

https://www.lendingclub.com/info/download-data.action.

https://www.kaggle.com/zaurbegiev/my-dataset.

https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data).

R Core Team, 2018