Revisiting process versus product metrics: a large scale analysis

Empirical Software Engineering, Volume 27, Pages 1–42, 2022
Suvodeep Majumder1, Pranav Mody1, Tim Menzies1
1Department of Computer Science, North Carolina State University, Raleigh, USA

Abstract

Numerous methods can build predictive models from software data. However, which methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction, and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors of defects than product metrics (the best process-based and product-based learners respectively achieve median recalls of 98% vs. 44% and median AUCs of 95% vs. 54%). That said, we warn that it is unwise to trust metric-importance results from analytics in-the-small studies, since those results change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models, since single-model predictions can become confused and exhibit high variance.
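The abstract's final point, that predictions from multiple models are preferable to a single model when reasoning across many projects, can be illustrated with a small sketch. The snippet below is not the paper's actual method; it is a minimal, hypothetical example of one common way to combine several defect predictors (majority voting over per-commit labels) together with the recall measure the abstract reports. All data and model outputs here are toy values for illustration only.

```python
# Hedged sketch (not the paper's implementation): combining the labels
# from several hypothetical defect predictors by majority vote, which
# damps the variance of any single model, then scoring with recall.

def recall(actual, predicted):
    """Fraction of truly defective commits (label 1) that were flagged."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def majority_vote(model_predictions):
    """Combine per-commit labels from several models: predict
    'defective' when more than half of the models say so."""
    n_models = len(model_predictions)
    return [int(sum(votes) * 2 > n_models)
            for votes in zip(*model_predictions)]

if __name__ == "__main__":
    # Toy data: 1 = defective commit, 0 = clean (illustrative only).
    actual  = [1, 0, 1, 1, 0, 1]
    preds_a = [1, 0, 0, 1, 0, 1]   # one noisy model
    preds_b = [1, 1, 1, 1, 0, 0]   # another noisy model
    preds_c = [1, 0, 1, 0, 0, 1]   # a third

    combined = majority_vote([preds_a, preds_b, preds_c])
    print(combined)                        # -> [1, 0, 1, 1, 0, 1]
    print(round(recall(actual, combined), 2))
```

In this toy run the ensemble recovers the true labels even though each individual model mislabels some commits, which is the intuition behind the abstract's advice to prefer multi-model predictions in-the-large.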
