A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring

Expert Systems with Applications - Volume 78 - Pages 225-241 - 2017
Yufei Xia1, Chuanzhe Liu1, Yuying Li2, Nana Liu1
1School of Management, China University of Mining and Technology, Xuzhou, Jiangsu 221116, PR China
2School of Foreign Studies, China University of Mining and Technology, Xuzhou, Jiangsu 221116, PR China

Abstract

Keywords


References

Ala'raj, 2016, Classifiers consensus system approach for credit scoring, Knowledge-Based Systems, 104, 89, 10.1016/j.knosys.2016.04.013

Ala'raj, 2016, A new hybrid ensemble credit scoring model based on classifiers consensus system approach, Expert Systems with Applications, 64, 36, 10.1016/j.eswa.2016.07.017

Altman, 1968, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance, 23, 589, 10.1111/j.1540-6261.1968.tb00843.x

Baesens, 2003, Benchmarking state-of-the-art classification algorithms for credit scoring, Journal of the Operational Research Society, 54, 627, 10.1057/palgrave.jors.2601545

Bauer, 1999, An empirical comparison of voting classification algorithms: Bagging, boosting, and variants, Machine Learning, 36, 105, 10.1023/A:1007515423169

Bergstra, 2012, Random search for hyper-parameter optimization, Journal of Machine Learning Research, 13, 281

Bergstra, 2013, Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms, 13, 10.25080/Majora-8b375195-003

Bergstra, 2011, Algorithms for hyper-parameter optimization, 2546

Bishop, 2006

Breiman, 1996, Bagging predictors, Machine Learning, 24, 123, 10.1007/BF00058655

Breiman, 2001, Random forests, Machine Learning, 45, 5, 10.1023/A:1010933404324

Breiman, 1984

Brillante, 2015, Investigating the use of gradient boosting machine, random forest and their ensemble to predict skin flavonoid content from berry physical–mechanical characteristics in wine grapes, Computers and Electronics in Agriculture, 117, 186, 10.1016/j.compag.2015.07.017

Brown, 2012, An experimental comparison of classification algorithms for imbalanced credit scoring data sets, Expert Systems with Applications, 39, 3446, 10.1016/j.eswa.2011.09.033

Chang, 2011, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST), 2, 27

Chen, 2015, Measuring the curse of dimensionality and its effects on particle swarm optimization and differential evolution, Applied Intelligence, 42, 514, 10.1007/s10489-014-0613-2

Chen, 2016, XGBoost: A scalable tree boosting system, arXiv preprint arXiv:1603.02754

Chen, 2015, xgboost: EXtreme Gradient Boosting, R package version 0.4-2

Chen, 2016, Group social capital and lending outcomes in the financial credit market: An empirical study of online peer-to-peer lending, Electronic Commerce Research and Applications, 15, 1, 10.1016/j.elerap.2015.11.003

Cortes, 1995, Support-vector networks, Machine Learning, 20, 273, 10.1007/BF00994018

Demšar, 2006, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, 7, 1

Dietterich, 2000, An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, Machine Learning, 40, 139, 10.1023/A:1007607513941

Duin, 2000, Experiments with classifier combining rules, 16

Elith, 2008, A working guide to boosted regression trees, Journal of Animal Ecology, 77, 802, 10.1111/j.1365-2656.2008.01390.x

Finlay, 2011, Multiple classifier architectures and their application to credit risk assessment, European Journal of Operational Research, 210, 368, 10.1016/j.ejor.2010.09.029

Florez-Lopez, 2015, Enhancing accuracy and interpretability of ensemble strategies in credit risk assessment. A correlated-adjusted decision forest proposal, Expert Systems with Applications, 42, 5737, 10.1016/j.eswa.2015.02.042

Freund, 1995, A desicion-theoretic generalization of on-line learning and an application to boosting, 23

Friedman, 2001, Greedy function approximation: A gradient boosting machine, Annals of Statistics, 29, 1189, 10.1214/aos/1013203451

Friedman, 2002, Stochastic gradient boosting, Computational Statistics & Data Analysis, 38, 367, 10.1016/S0167-9473(01)00065-2

Guelman, 2012, Gradient boosting trees for auto insurance loss cost modeling and prediction, Expert Systems with Applications, 39, 3659, 10.1016/j.eswa.2011.09.058

Guo, 2016, Instance-based credit risk assessment for investment decisions in P2P lending, European Journal of Operational Research, 249, 417, 10.1016/j.ejor.2015.05.050

Hand, 2006, Classifier technology and the illusion of progress, Statistical Science, 21, 1, 10.1214/088342306000000060

Hand, 2009, Measuring classifier performance: A coherent alternative to the area under the ROC curve, Machine Learning, 77, 103, 10.1007/s10994-009-5119-5

Hand, 1997, Statistical classification methods in consumer credit scoring: A review, Journal of the Royal Statistical Society: Series A (Statistics in Society), 160, 523, 10.1111/j.1467-985X.1997.00078.x

Harris, 2015, Credit scoring using the clustered support vector machine, Expert Systems with Applications, 42, 741, 10.1016/j.eswa.2014.08.029

Hastie, 2009, Boosting and additive trees, 337

Huang, 2007, Credit scoring with a data mining approach based on support vector machines, Expert Systems with Applications, 33, 847, 10.1016/j.eswa.2006.07.007

Hutter, 2011, Sequential model-based optimization for general algorithm configuration, 507

Johnson, 2014, Learning nonlinear functions using regularized greedy forest, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 942, 10.1109/TPAMI.2013.159

Lee, 2006, Mining the customer credit using classification and regression tree and multivariate adaptive regression splines, Computational Statistics & Data Analysis, 50, 1113, 10.1016/j.csda.2004.11.006

Lessmann, 2015, Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research, European Journal of Operational Research, 247, 124, 10.1016/j.ejor.2015.05.030

Lin, 2008, Particle swarm optimization for parameter determination and feature selection of support vector machines, Expert Systems with Applications, 35, 1817, 10.1016/j.eswa.2007.08.088

Min, 2005, Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters, Expert Systems with Applications, 28, 603, 10.1016/j.eswa.2004.12.008

Nanni, 2009, An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring, Expert Systems with Applications, 36, 3028, 10.1016/j.eswa.2008.01.018

Nascimento, 2014, Integrating complementary techniques for promoting diversity in classifier ensembles: A systematic study, Neurocomputing, 138, 347, 10.1016/j.neucom.2014.01.027

Nie, 2011, Credit card churn forecasting by logistic regression and decision tree, Expert Systems with Applications, 38, 15273, 10.1016/j.eswa.2011.06.028

Pötzsch, 2010, The role of soft information in trust building: Evidence from online social lending, 381

Paleologo, 2010, Subagging for credit scoring models, European Journal of Operational Research, 201, 490, 10.1016/j.ejor.2009.03.008

Pedregosa, 2011, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, 2825

Serrano-Cinca, 2016, The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending, Decision Support Systems, 89, 113, 10.1016/j.dss.2016.06.014

Simon, 2008

Snoek, 2012, Practical Bayesian optimization of machine learning algorithms, 2951

Sutton, 2005, Classification and regression trees, bagging, and boosting, Handbook of Statistics, 24, 303, 10.1016/S0169-7161(04)24011-1

Thornton, 2013, Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, 847

Tsai, 2014, A comparative study of classifier ensembles for bankruptcy prediction, Applied Soft Computing, 24, 977, 10.1016/j.asoc.2014.08.047

Twala, 2010, Multiple classifier application to credit risk assessment, Expert Systems with Applications, 37, 3326, 10.1016/j.eswa.2009.10.018

Wang, 2011, A comparative assessment of ensemble learning for credit scoring, Expert Systems with Applications, 38, 223, 10.1016/j.eswa.2010.06.048

Wang, 2012, Two credit scoring models based on dual strategy ensemble trees, Knowledge-Based Systems, 26, 61, 10.1016/j.knosys.2011.06.020

West, 2000, Neural network credit scoring models, Computers & Operations Research, 27, 1131, 10.1016/S0305-0548(99)00149-5

Wiginton, 1980, A note on the comparison of logit and discriminant models of consumer credit behavior, Journal of Financial and Quantitative Analysis, 15, 757, 10.2307/2330408

Wolpert, 1997, No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation, 1, 67, 10.1109/4235.585893

Wu, 2012, Credit risk assessment and decision making by a fusion approach, Knowledge-Based Systems, 35, 102, 10.1016/j.knosys.2012.04.025

Yeh, 2012, A hybrid KMV model, random forests and rough set theory approach for credit rating, Knowledge-Based Systems, 33, 166, 10.1016/j.knosys.2012.04.004

Zhang, 2015, A gradient boosting method to improve travel time prediction, Transportation Research Part C: Emerging Technologies, 58, 308, 10.1016/j.trc.2015.02.019

Zhang, 2016, Research on Credit Scoring by Fusing Social Media Information in Online Peer-to-Peer Lending, Procedia Computer Science, 91, 168, 10.1016/j.procs.2016.07.055

Zięba, 2016, Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction, Expert Systems with Applications, 58, 93, 10.1016/j.eswa.2016.04.001