A transfer cost-sensitive boosting approach for cross-project defect prediction

Software Quality Journal - Volume 25, Issue 1 - Pages 235-272 - 2017
Duksan Ryu, Jong-In Jang, Jongmoon Baik
School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

References

Arcuri, A., & Briand, L. (2011). A practical guide for using statistical tests to assess randomized algorithms in software engineering. In 33rd International Conference on Software Engineering (ICSE) (pp. 1–10). doi: 10.1145/1985793.1985795

Arisholm, E., Briand, L. C., & Johannessen, E. B. (2010). A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. Journal of Systems and Software, 83(1), 2–17. doi: 10.1016/j.jss.2009.06.055

Bansiya, J., & Davis, C. G. (2002). A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering, 28(1), 4–17. doi: 10.1109/32.979986

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.

Chen, L., Fang, B., Shang, Z., & Tang, Y. (2015). Negative samples reduction in cross-company software defects prediction. Information and Software Technology, 62, 67–77. doi: 10.1016/j.infsof.2015.01.014

Chidamber, S. R., & Kemerer, C. F. (1994). A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20(6), 476–493. doi: 10.1109/32.295895

D’Ambros, M., Lanza, M., & Robbes, R. (2011). Evaluating defect prediction approaches: A benchmark and an extensive comparison. Empirical Software Engineering. doi: 10.1007/s10664-011-9173-9

Dai, W., Yang, Q., Xue, G., & Yu, Y. (2007). Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning (pp. 193–200). http://dl.acm.org/citation.cfm?id=1273521 . Accessed February 25, 2014.

Dejaeger, K. (2013). Toward comprehensible software fault prediction models using Bayesian network classifiers. IEEE Transactions on Software Engineering, 39(2), 237–257. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6175912 . Accessed February 25, 2014.

Eaton, E., & desJardins, M. (2011). Selective transfer between learning tasks using task-based boosting. AAAI (pp. 337–342). http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/viewFile/3752/3915 . Accessed June 11, 2014.

Elish, K. O., & Elish, M. O. (2008). Predicting defect-prone software modules using support vector machines. Journal of Systems and Software, 81(5), 649–660. doi: 10.1016/j.jss.2007.07.040

Fan, W., Stolfo, S., Zhang, J., & Chan, P. (1999). AdaCost: Misclassification cost-sensitive boosting. ICML.

Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. doi: 10.1006/jcss.1997.1504

Grbac, T., Mausa, G., & Basic, B. (2013). Stability of software defect prediction in relation to levels of data imbalance. SQAMIA. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.402.8978&rep=rep1&type=pdf . Accessed November 13, 2014.

Hall, T., Beecham, S., Bowes, D., Gray, D., & Counsell, S. (2012). A systematic literature review on fault prediction performance in software engineering. IEEE Transactions on Software Engineering, 38(6), 1276–1304. doi: 10.1109/TSE.2011.103

Hall, M., Frank, E., & Holmes, G. (2009). The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18. http://dl.acm.org/citation.cfm?id=1656278 . Accessed November 13, 2014.

He, Z., Shu, F., Yang, Y., Li, M., & Wang, Q. (2011). An investigation on the feasibility of cross-project defect prediction. Automated Software Engineering. doi: 10.1007/s10515-011-0090-3

Henderson-Sellers, B. (1995). Object-oriented metrics: Measures of complexity. Prentice-Hall.

Jureczko, M., & Madeyski, L. (2010). Towards identifying software project clusters with regard to defect prediction. In Proceedings of the 6th International Conference on Predictive Models in Software Engineering (PROMISE ’10) (p. 1). doi: 10.1145/1868328.1868342

Jureczko, M., & Spinellis, D. (2010). Using object-oriented design metrics to predict software defects. In Models and methods of system dependability (pp. 69–81). Oficyna Wydawnicza Politechniki Wrocławskiej.

Ma, Y., Luo, G., Zeng, X., & Chen, A. (2012). Transfer learning for cross-company software defect prediction. Information and Software Technology, 54(3), 248–256. doi: 10.1016/j.infsof.2011.09.007

Martin, R. (1994). OO design quality metrics: An analysis of dependencies, 12, 151–170.

McCabe, T. J. (1976). A complexity measure. IEEE Transactions on Software Engineering, SE-2(4), 308–320. doi: 10.1109/TSE.1976.233837

Tang, M.-H., Kao, M.-H., & Chen, M.-H. (1999). An empirical study on object-oriented metrics. In Proceedings Sixth International Software Metrics Symposium (Cat. No. PR00403) (pp. 242–249). IEEE Computer Society. doi: 10.1109/METRIC.1999.809745

Menzies, T., Caglayan, B., He, Z., Kocaguneli, E., Krall, J., Peters, F., & Turhan, B. (2012). The PROMISE Repository of empirical software engineering data. http://openscience.us/repo/

Menzies, T., Dekhtyar, A., Distefano, J., & Greenwald, J. (2007). Problems with precision: A response to “Comments on ‘data mining static code attributes to learn defect predictors’”. IEEE Transactions on Software Engineering. doi: 10.1109/TSE.2007.70721

Menzies, T., Milton, Z., Turhan, B., Cukic, B., Jiang, Y., & Bener, A. (2010). Defect prediction from static code features: Current results, limitations, new approaches. Automated Software Engineering, 17(4), 375–407. doi: 10.1007/s10515-010-0069-5

Nam, J., Pan, S. J., & Kim, S. (2013). Transfer defect learning. In 35th International Conference on Software Engineering (ICSE) (pp. 382–391). doi: 10.1109/ICSE.2013.6606584

Ryu, D., Choi, O., & Baik, J. (2014). Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empirical Software Engineering. doi: 10.1007/s10664-014-9346-4

Shi, X., Fan, W., & Ren, J. (2008). Actively transfer domain knowledge. In Machine Learning and Knowledge Discovery in Databases (pp. 342–357). http://link.springer.com/chapter/10.1007/978-3-540-87481-2_23 . Accessed November 29, 2014.

Singh, Y., Kaur, A., & Malhotra, R. (2009). Empirical validation of object-oriented metrics for predicting fault proneness models. Software Quality Journal, 18(1), 3–35. doi: 10.1007/s11219-009-9079-6

Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Addison-Wesley.

Tomek, I. (1976). Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, 769–772. http://ci.nii.ac.jp/naid/80013575533/ . Accessed January 26, 2015.

Turhan, B., Menzies, T., Bener, A. B., & Di Stefano, J. (2009). On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering, 14(5), 540–578. doi: 10.1007/s10664-008-9103-7

Turhan, B., Tosun Mısırlı, A., & Bener, A. (2013). Empirical evaluation of the effects of mixed project data on learning defect predictors. Information and Software Technology, 55(6), 1101–1118. doi: 10.1016/j.infsof.2012.10.003

Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics. doi: 10.3102/10769986025002101

Wang, S., Chen, H., & Yao, X. (2010). Negative correlation learning for classification ensembles. In The 2010 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). doi: 10.1109/IJCNN.2010.5596702

Wang, B. X., & Japkowicz, N. (2009). Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems, 25(1), 1–20. doi: 10.1007/s10115-009-0198-y

Wang, S., & Yao, X. (2013). Using class imbalance learning for software defect prediction. IEEE Transactions on Reliability, 62(2), 434–443. doi: 10.1109/TR.2013.2259203

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83. http://www.jstor.org/stable/3001968 . Accessed October 14, 2014.

Yao, Y., & Doretto, G. (2010). Boosting for transfer learning with multiple sources. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, 1855–1862. doi: 10.1109/CVPR.2010.5539857

Zimmermann, T., Nagappan, N., Gall, H., Giger, E., & Murphy, B. (2009). Cross-project defect prediction. In Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering (p. 91). doi: 10.1145/1595696.1595713