A hybrid approach to software fault prediction using genetic programming and ensemble learning methods

Springer Science and Business Media LLC - Volume 13 - Pages 1746-1760 - 2022
Satya Prakash Sahu (1), B. Ramachandra Reddy (1, 2), Dev Mukherjee (1), D. M. Shyamla (1), Bhim Singh Verma (1)
(1) Department of Information Technology, National Institute of Technology, Raipur, Raipur, India
(2) Department of Computer Science and Engineering, SRM University AP, Amaravati, India

Abstract

Software fault prediction techniques use software metrics and fault data from previous releases to predict which modules are likely to be fault-prone in the next release. In this article, we review the literature that applies machine learning techniques to detect defects, faults, ambiguous code, inappropriate branching, and anticipated runtime errors in order to establish a level of software quality. We also propose a hybrid technique for software fault prediction based on genetic programming and ensemble learning. Many machine learning techniques are available for predicting the occurrence of faults. Our experiments compare the performance of simple ensemble methods, simple genetic programming based classification, and the hybrid approach. We find that machine learning techniques differ in their learning abilities, and that software professionals and researchers can exploit these differences for fault prediction. We also find that the proposed approach outperforms the simple statistical and ensemble techniques used in automated fault prediction systems. However, more studies are needed on less commonly used machine learning techniques.
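
As an illustration of the kind of simple ensemble method compared in such experiments, the following is a minimal sketch of bagging with majority voting over one-split decision stumps. It is not the hybrid technique proposed in the paper, which the abstract does not detail. The metric names, the toy data in `MODULES`, and all function names are hypothetical and chosen only for this example.

```python
import random

# Hypothetical toy data: (lines_of_code, cyclomatic_complexity) -> 1 if faulty, 0 if clean.
MODULES = [
    ((120, 4), 0), ((300, 9), 0), ((80, 2), 0), ((150, 5), 0),
    ((900, 25), 1), ((700, 30), 1), ((1100, 18), 1), ((650, 22), 1),
]


def fit_stump(sample):
    """Find the rule 'metric > threshold means faulty' with the fewest errors on sample."""
    best = None
    for feature in range(2):
        for x, _ in sample:
            threshold = x[feature]
            errors = sum((xi[feature] > threshold) != bool(yi) for xi, yi in sample)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict_stump(stump, x):
    """Apply a single stump: 1 (faulty) if the chosen metric exceeds the threshold."""
    feature, threshold = stump
    return int(x[feature] > threshold)


def fit_bagging(data, n_estimators=15, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        sample = [rng.choice(data) for _ in data]
        stumps.append(fit_stump(sample))
    return stumps


def predict_bagging(stumps, x):
    """Combine the stumps by strict majority vote."""
    votes = sum(predict_stump(stump, x) for stump in stumps)
    return int(2 * votes > len(stumps))


stumps = fit_bagging(MODULES)
large_module = predict_bagging(stumps, (2000, 40))  # far above every faulty module
small_module = predict_bagging(stumps, (50, 1))     # far below every clean module
print(large_module, small_module)
```

The two query modules lie well outside the range of the toy data, so every stump agrees on them whatever bootstrap samples are drawn. Real fault data is far noisier, and that is where averaging over many resampled learners is expected to help.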
