Label propagation based semi-supervised learning for software defect prediction

Automated Software Engineering - Tập 24 Số 1 - Trang 47-69 - 2017
Zhiwu Zhang1, Xiao‐Yuan Jing1, Tiejian Wang2
1School of Computer, Nanjing University of Posts and Telecommunications, Nanjing, 210003, People’s Republic of China
2State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, 430072, People’s Republic of China

Tóm tắt

Từ khóa


Tài liệu tham khảo

Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM Sigkdd Explorations Newsletter. 6(1), 20–29 (2004)

Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7(11), 2399–2434 (2006)

Catal, C., Diri, B.: A systematic review of software fault prediction studies. Expert Syst. Appl. 36(4), 7346–7354 (2009a)

Catal, C., Diri, B.: Unlabelled extra data do not always mean extra performance for semi-supervised fault prediction. Expert Syst. 26(5), 458–471 (2009b)

Catal, C.: A comparison of semi-supervised classification approaches for software defect prediction. J. Intell. Syst. 23(1), 75–82 (2014)

Chan, Y., Walmsley, R.P.: Learning and understanding the Kruskal-Wallis one-way analysis-of-variance-by-ranks test for differences among three or more independent groups. Phys. Ther. 77(12), 1755–1761 (1997)

Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pp. 57–64 (2005)

Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artifici. Intell. Res. 16, 321–357 (2002)

Culp, M., Michailidis, G.: Graph-based semisupervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 30(1), 174–179 (2008)

Fenton, N., Ohlsson, N.: Quantitative analysis of faults and failures in a complex software system. IEEE Trans. Softw. Eng. 26(8), 797–814 (2000)

Gao, K., Khoshgoftaar, T. M.: Software defect prediction for high-dimensional and class-imbalanced data. In: Proceedings of the 23rd International Conference on Software Engineering and Knowledge Engineering, pp. 89–94 (2011)

Gao, K., Khoshgoftaar, T.M., Wald, R.: The use of under- and oversampling within ensemble feature selection and classification for software quality prediction. Int. J. Reliab. Qual. Saf. Eng. 21(1), 145004 (2014)

Goldman, S., Zhou, Y.: Enhancing supervised learning with unlabeled data. In: Proceedings of the 17th International Conference on Machine Learning, pp. 327–334 (2000)

Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in neural information processing systems, pp. 529–536 (2004)

Gray, D., Bowes, D., Davey, N., Sun, Y., Christianson, B.: The misuse of the NASA metrics data program data sets for automated software defect prediction. In: Proceedings of 15th Annual Conference on Evaluation and Assessment in Software Engineering, pp. 96–103 (2011)

Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38(6), 1276–1304 (2012)

He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Advances in Neural Information Processing Systems, pp. 507–514 (2005)

Jiang, Y., Li, M., Zhou, Z.H.: Software defect detection with ROCUS. J. Comput. Sci. Technol. 26(2), 328–342 (2011)

Jing, X. Y., Ying, S., Zhang, Z. W., Wu, S. S., Liu, J.: Dictionary learning based software defect prediction. In: Proceedings of the 36th International Conference on Software Engineering, pp. 414-423 (2014a)

Jing, X. Y., Zhang, Z. W., Ying, S., Wang, F., Zhu, Y. P.: Software defect prediction based on collaborative representation classification. In: Companion Proceedings of the 36th International Conference on Software Engineering, pp. 632–633 (2014b)

Joachims, T.: Transductive inference for text classification using support vector machines. In: Proceedings of the 16th International Conference on Machine Learning, pp 200–209 (1999)

Khoshgoftaar, T. M., Gao, K., Seliya, N.: Attribute selection and imbalanced data: problems in software defect prediction. In: Proceedings of the 22nd IEEE International Conference on Tools with Artificial Intelligence, pp. 137–144 (2010)

Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the 14th International Conference on Machine Learning, pp 179–186 (1997)

Laradji, I.H., Alshayeb, M., Ghouti, L.: Software defect prediction using ensemble learning on selected features. Inf. Softw. Technol. 58, 388–402 (2015)

Li, M., Zhang, H., Wu, R., Zhou, Z.H.: Sample-based software defect prediction with active and semi-supervised learning. Autom. Softw. Eng. 19(2), 201–230 (2012)

Li, S., Fu, Y.: Low-rank coding with b-matching constraint for semi-supervised classification. In: Proceedings of the 23th International Joint Conference on Artificial Intelligence, pp. 1472–1478 (2013)

Lu, H., Cukic, B., Culp, M.: An iterative semi-supervised approach to software fault prediction. In: Proceedings of the 7th International Conference on Predictive Models in Software Engineering (Article 15) (2011)

Lu, H., Cukic, B., Culp, M.: Software defect prediction using semi-supervised learning with dimension reduction. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp. 314–317 (2012)

Lyu, M. R.: Software reliability engineering: a roadmap. In: 2007 Future of Software Engineering, pp. 153–170 (2007)

McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. 4, 308–320 (1976)

Menzies, T., Greenwald, J., Frank, A.: Data mining static code attributes to learn defect predictors. IEEE Trans. Softw. Eng. 33(1), 2–13 (2007)

Miller, D. J., Uyar, H. S.: A mixture of experts classifier with learning based on both labelled and unlabelled data. In: Advances in neural information processing systems, pp. 571–577 (1997)

Nam, J., Pan, S. J., Kim, S.: Transfer defect learning. In: Proceedings of the 35th International Conference on Software Engineering, pp. 382–391 (2013)

Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Mach. Learn. 39(2–3), 103–134 (2000)

Pelayo, L, Dick, S.: Applying novel resampling strategies to software defect prediction. In: Proceedings of the 2007 Annual Meeting of the North American Fuzzy Information Processing Society, pp. 69–72 (2007)

Seliya, N., Khoshgoftaar, T.M.: Software quality estimation with limited fault data: a semi-supervised learning perspective. Softw. Qual. J. 15(3), 327–344 (2007a)

Seliya, N., Khoshgoftaar, T.M.: Software quality analysis of unlabeled program modules with semisupervised clustering. IEEE Trans. Syst. Man. Cyber. 37(2), 201–211 (2007b)

Shahshahani, B.M., Landgrebe, D.: The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Trans. Geosci. Remote Sens. 32(5), 1087–1095 (1994)

Shepperd, M., Song, Q., Sun, Z., Mair, C.: Data quality: some comments on the NASA software defect datasets. IEEE Trans. Softw. Eng. 39(9), 1208–1215 (2013)

Sun, Z.B., Song, Q.B., Zhu, X.Y.: Using coding based ensemble learning to improve software defect prediction. IEEE Trans. Syst. Man Cyber. C 42(6), 1806–1817 (2012)

Turhan, B., Menzies, T., Bener, A.: On the relative value of cross-company and within-company data for defect prediction. Empirical Softw. Eng. 14(5), 540–578 (2009)

Wang, F., Zhang, C.: Label propagation through linear neighborhoods. IEEE Trans. Knowl. Data Eng. 20(1), 55–67 (2008)

Wang, S., Yao, X.: Using class imbalance learning for software defect prediction. IEEE Trans. Reliab. 62(2), 434–443 (2013)

Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)

Xu, J., Man, H.: Dictionary learning based on laplacian score in sparse coding. In: Machine Learning and Data Mining in Pattern Recognition, pp.253–264 (2011)

Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. Adv. Neural Inf. Process. Syst. 16(16), 321–328 (2004)

Zhou, Z.-H., Li, M.: Tri-training: Exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 17(11), 1529–1541 (2005)

Zhou, Z.-H., Li, M.: Semi-supervised regression with co-training style algorithms. IEEE Trans. Knowl. Data Eng. 19(11), 1479–1493 (2007)

Zhu, X.: Semi-supervised learning with graphs. PhD thesis, Carnegie Mellon University (2005)

Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002)

Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning, pp. 912–919 (2003)