3PcGE: 3-parent child-based genetic evolution for software defect prediction

Innovations in Systems and Software Engineering - Volume 19 - Pages 197-216 - 2022
Somya Goyal¹
¹Manipal University Jaipur, Jaipur, India

Abstract

Software defect prediction (SDP) is an active research area in the software industry that aims to improve the quality of software products. SDP classifiers predict fault-prone modules in the early development phases, before testing begins, so that testing effort can be focused on the modules predicted to be fault-prone. Early detection of fault-prone modules thus increases the chance of releasing error-free products to clients with reduced testing effort and cost. Because SDP uses voluminous, high-dimensional data, feature selection (FS) has become an essential data preprocessing technique. Over the past three decades, search-based feature selection has been prominently deployed to improve the efficiency of predictors. This paper proposes a new FS approach, 3PcGE, based on the three-parent child (3Pc) and genetic evolution (GE). 3PcGE is inspired by evolutionary computation and by the three-parent biological evolution process, which produces an offspring with the best survival capability. The 3Pc procedure separates the spindle from the mother’s cell body, which has defective mitochondria, and places it in the emptied donor cell body, which has healthy mitochondria. In this way, a 3-parent child is healthier than a 2-parent child and free from fatal disease. 3PcGE searches the feature space for an optimal feature subset, using classification performance and the number of selected features as the fitness function. FS is modeled as a multi-objective optimization problem, and a Pareto-optimal solution is sought using the evolutionary algorithm (3PcGE). The performance is compared with state-of-the-art FS techniques. The experimental results show that the proposed 3PcGE outperforms the competing filter-based FS techniques by 18.98% and wrapper-based FS techniques by 17.5% in the AUC measure. A statistical comparison with the baseline technique (NSGA-II) shows that the proposed 3PcGE is effective at selecting optimal features and yields more accurate SDP models.
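
The bi-objective view of feature selection described above (maximize classification performance, minimize the number of selected features) can be illustrated with a minimal Pareto-dominance sketch. This is not the paper's 3PcGE implementation: the names `dominates` and `pareto_front` and the candidate scores are hypothetical, chosen only to show which feature subsets would survive as non-dominated.

```python
from typing import List, Tuple

# A candidate feature subset is summarized as (auc, n_features).
# Assumption for illustration: AUC is maximized, feature count is minimized.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    auc_a, k_a = a
    auc_b, k_b = b
    no_worse = auc_a >= auc_b and k_a <= k_b
    strictly_better = auc_a > auc_b or k_a < k_b
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Hypothetical scores, not results from the paper.
candidates = [(0.80, 10), (0.78, 4), (0.80, 6), (0.70, 4), (0.75, 12)]
front = pareto_front(candidates)
print(front)
```

In this toy set, the subset with AUC 0.80 and 10 features is dropped because another subset reaches the same AUC with only 6 features. An evolutionary search such as NSGA-II, or the proposed 3PcGE, would apply this kind of comparison to each generation of candidate subsets.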
