Combined classifier for cross-project defect prediction: an extended empirical study

Yun Zhang1, David Lo2, Xin Xia1, Jun Sun1
1College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2School of Information Systems, Singapore Management University, Singapore, 641674, Singapore

Tóm tắt

Từ khóa


Tài liệu tham khảo

Zimmermann T, Nagappan N. Predicting defects using network analysis on dependency graphs. In: Proceedings of the 30th International Conference on Software Engineering. 2008, 531–540

Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A. Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engineering, 2010, 17(4): 375–407

Jiang T, Tan L, Kim S. Personalized defect prediction. In: Proceedings of the 28th IEEE International Conference on Automated Software Engineering. 2013, 279–289

Lessmann S, Baesens B, Mues C, Pietsch S. Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Transactions on Software Engineering, 2008, 34(4): 485–496

Liu Y, Khoshgoftaar T M, Seliya N. Evolutionary optimization of software quality modeling with multiple repositories. IEEE Transactions on Software Engineering, 2010, 36(6): 852–864

D’Ambros M, Lanza M, Robbes R. Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empirical Software Engineering, 2012, 17(4–5): 531–577

Nagappan N, Ball T, Zeller A. Mining metrics to predict component failures. In: Proceedings of the 28th International Conference on Software Engineering. 2006, 452–461

Kitchenham B A, Mendes E, Travassos G H. Cross versus withincompany cost estimation studies: a systematic review. IEEE Transactions on Software Engineering, 2007, 33(5): 316–329

Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B. Crossproject defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the European Software Engineering Conference and the ACM Sigsoft International Symposium on Foundations of Software Engineering. 2009, 91–100

Turhan B, Menzies T, Bener A B, Di Stefano J. On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering, 2009, 14(5): 540–578

Panichella A, Oliveto R, De Lucia A. Cross-project defect prediction models: L’union fait la force. In: Proceedings of Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering. 2014, 164–173

Arisholm E, Briand L C, Fuglerud M. Data mining techniques for building fault-proneness models in telecom java software. In: Proceedings of the 18th IEEE International Symposium on Software Reliability. 2007, 215–224

Rahman F, Devanbu P. How, and why, process metrics are better. In: Proceedings of the International Conference on Software Engineering. 2013, 432–441

Rahman F, Posnett D, Devanbu P. Recalling the imprecision of crossproject defect prediction. In: Proceedings of the 20th ACM SIGSOFT International Symposium on the Foundations of Software Engineering. 2012, 61

Rahman F, Posnett D, Herraiz I, Devanbu P. Sample size vs. bias in defect prediction. In: Proceedings of the 9th Joint Meeting on Foundations of Software Engineering. 2013, 147–157

Peters F, Menzies T, Marcus A. Better cross company defect prediction. In: Proceedings of the 10th IEEEWorking Conference on Mining Software Repositories. 2013, 409–418

Nam J, Pan S J, Kim S. Transfer defect learning. In: Proceedings of the International Conference on Software Engineering. 2013, 382–391

Canfora G, De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S. Multi-objective cross-project defect prediction. In: Proceedings of the 6th IEEE International Conference on Software Testing, Verification and Validation. 2013, 252–261

Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval. Vol 463. New York: ACM Press, 1999

Schütze H. Introduction to information retrieval. In: Proceedings of the International Communication of Association for Computing Machinery Conference. 2008

Zhang Y, Lo D, Xia X, Sun J. An empirical study of classifier combination for cross-project defect prediction. In: Proceedings of the 39th IEEE Annual Computer Software and Applications Conference. 2015, 264–269

Chidamber S R, Kemerer C F. A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 1994, 20(6): 476–493

Henderson-Sellers B. Object-Oriented Metrics, Measures of Complexity. Upper Saddle River, NJ: Prentice Hall, 1996

Bansiya J, Davis C G. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering, 2002, 28(1): 4–17

Tang M, Kao M, Chen M. An empirical study on object-oriented metrics. In: Proceedings of the 6th International Symposium on Software Metrics. 1999, 242–249

Martin R. OO design quality metrics — an analysis of dependencies. IEEE Transactions on Software Engineering, 1994, 20(6): 476–493

McCabe T. A complexity measure. IEEE Transactions on Software Engineering, 1976, 2(4): 308–320

Jureczko M, Madeyski L. Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th International Conference on Predictive Models in Software Engineering. 2010

Basili V R, Briand L C, Melo W L. A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering, 1996, 22(10): 751–761

Gyimothy T, Ferenc R, Siket I. Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 2005, 31(10): 897–910

Bezerra M E, Oliveira A L, Meira S R. A constructive rbf neural network for estimating the probability of defects in software modules. In: Proceedings of the International Joint Conference on Neural Networks. 2007, 2869–2874

Ceylan E, Kutlubay F O, Bener A B. Software defect identification using machine learning techniques. In: Proceedings of the 32nd EUROMICRO Conference on Software Engineering and Advanced Applications. 2006, 240–247

Okutan A, Yıldız O T. Software defect prediction using bayesian networks. Empirical Software Engineering, 2014, 19(1): 154–181

Nelson A, Menzies T, Gay G. Sharing experiments using open-source software. Software: Practice and Experience, 2011, 41(3): 283–305

Han J, Kamber M. Data Mining: Concepts and Techniques. 2nd ed. San Francisco: Morgan Kaufmann, 2006

Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques. Cambridge: MIT Press, 2009

Buhmann M D. Radial basis functions. Acta Numerica, 2000, 9: 1–38

Quinlan J R. Bagging, boosting, and c4. 5. In: Proceedings of the 13th National Conference on Artificial Intelligence and Eighth Innovative Applications of Artificial Intelligence Conference. 1996, 725–730

Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32

Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I H. The weka data mining software: an update. ACM SIGKDD Explorations Newsletter, 2009, 11(1): 10–18

Wu R, Zhang H, Kim S, Cheung S C. Relink: recovering links between bugs and changes. In: Proceedings of the ACM Sigsoft Symposium and the European Conference on Foundations of Software Engineering. 2011, 15–25

Tian Y, Lawall J, Lo D. Identifying linux bug fixing patches. In: Proceedings of the 34th International Conference on Software Engineering. 2012, 386–396

Rao S, Kak A. Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In: Proceedings of the 8th Working Conference on Mining Software Repositories. 2011, 43–52

Kim S, Whitehead E J, Zhang Y. Classifying software changes: Clean or buggy? IEEE Transactions on Software Engineering, 2008, 34(2): 181–196

Chidamber S R, Kemerer C F. A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 1994, 20(6): 476–493

Moser R, Pedrycz W, Succi G. A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th ACM/IEEE International Conference on Software Engineering. 2008, 181–190

D’Ambros M, Lanza M, Robbes R. An extensive comparison of bug prediction approaches. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories. 2010, 31–41

Ma Y, Luo G, Zeng X, Chen A. Transfer learning for cross-company software defect prediction. Information and Software Technology, 2012, 54(3): 248–256

Xia X, Lo D, Wang X, Yang X. Collective personalized change classification with multiobjective search. IEEE Transactions on Reliability, 2016, 65: 1–20

Yang X, Lo D, Xia X, Zhang Y. Deep learning for just-in-time defect prediction. In: Proceedings of the IEEE International Conference on Software Quality, Reliability and Security. 2015, 17–26

Xuan X, Lo D, Xia X, Tian Y. Evaluating defect prediction approaches using a massive set of metrics: an empirical study. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing. 2015, 1644–1647

Nelson A, Menzies T, Gay G. Sharing experiments using open-source software. Software: Practice and Experience, 2011, 41(3): 283–305

Xia X, Lo D, Pan S J, Nagappan N, Wang X. Hydra: massively compositional model for cross-project defect prediction. IEEE Transactions on Software Engineering, 2016, 42: 977–998