A hybrid approach to software fault prediction using genetic programming and ensemble learning methods
Abstract
Software fault prediction techniques use software metrics and fault data from previous releases to predict which modules will be fault-prone in the next release. In this article we review the literature that applies machine-learning techniques to detect defects, faults, ambiguous code, inappropriate branching, and likely runtime errors in order to establish a level of software quality. Many machine-learning techniques are available for predicting the occurrence of faults. This paper also proposes a hybrid technique for software fault prediction based on genetic programming and ensemble learning. Our experiments compare the performance of simple ensemble methods, simple genetic programming based classification, and the hybrid approach. We find that machine-learning techniques differ in their learning abilities, and that software professionals and researchers can exploit these differences for software fault prediction. We also find that the proposed approach outperforms the simple statistical and ensemble techniques used in automated fault prediction systems. However, further studies are needed on less frequently used machine-learning techniques.
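To make the ensemble side of the comparison concrete, the following is a minimal sketch of bagging with majority voting, written in plain Python. It is not the paper's hybrid method: the base learner here is a one-rule decision stump, whereas a hybrid in the spirit of the abstract would presumably use a classifier evolved by genetic programming in its place. The synthetic metrics (`loc`, `complexity`), the labelling rule, and all function names are assumptions made for illustration only.

```python
import random
from collections import Counter

def make_modules(n, rng):
    """Synthetic (loc, complexity) metrics; assumption: a module is faulty when complexity > 12."""
    rows = []
    for _ in range(n):
        loc = rng.randint(10, 500)
        complexity = rng.randint(1, 30)
        rows.append(((loc, complexity), 1 if complexity > 12 else 0))
    return rows

def fit_stump(sample):
    """Pick the (feature, threshold) whose rule 'value > threshold => faulty' makes the fewest errors."""
    best = None  # (errors, feature_index, threshold)
    n_features = len(sample[0][0])
    for f in range(n_features):
        for t in sorted({x[f] for x, _ in sample}):
            errors = sum((1 if x[f] > t else 0) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def predict_stump(stump, x):
    f, t = stump
    return 1 if x[f] > t else 0

def fit_bagging(data, n_learners, rng):
    """Train each base learner on a bootstrap resample (sampling with replacement)."""
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_learners)]

def predict_bagging(stumps, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
modules = make_modules(200, rng)
ensemble = fit_bagging(modules, n_learners=11, rng=rng)  # odd count avoids tied votes
predictions = [predict_bagging(ensemble, x) for x, _ in modules]
accuracy = sum(p == y for p, (_, y) in zip(predictions, modules)) / len(modules)
print(f"training accuracy: {accuracy:.2f}")
```

The accuracy printed here is measured on the training data and only checks that the sketch runs; a real comparison of the kind the abstract describes would need held-out releases or cross-validation on actual fault datasets.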