Default prediction in P2P lending from high-dimensional data based on machine learning

Jing Zhou (1), Wei Li (2, 3, 4), Jiaxin Wang (3), Shuai Ding (3, 4), Chengyi Xia (5, 6)
(1) Business School, Cardiff University, Cardiff CF10 3EU, UK
(2) School of Finance, Anhui University of Finance and Economics, Bengbu 233030, Anhui, China
(3) School of Management, Hefei University of Technology, Hefei 230009, Anhui, China
(4) Key Laboratory of Process Optimization and Intelligent Decision-Making (Ministry of Education), Hefei University of Technology, Hefei 230009, Anhui, China
(5) Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin 300384, China
(6) Key Laboratory of Computer Vision and System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China
