Revisiting process versus product metrics: a large-scale analysis
Abstract
Numerous methods can build predictive models from software data. However, which methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction, and the granularity of those metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors of defects than product metrics: the best process-based learners achieve median recalls of 98% and median AUCs of 95%, versus 44% and 54% for the best product-based learners. That said, we warn that it is unwise to trust metric-importance results from analytics in-the-small studies, since those results change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models, since predictions from any single model can exhibit high variance.
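To make the comparison concrete, the sketch below shows one way a process-versus-product evaluation by recall and AUC could be set up. This is a minimal illustration, not the paper's actual pipeline: the CSV path, the metric column names, and the choice of a random forest learner are all hypothetical assumptions.

```python
# Minimal sketch of a process-vs-product metric comparison, evaluated by
# recall and AUC. NOT the paper's pipeline: the file path, column names,
# and learner choice are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical metric names: process metrics describe the change history,
# product metrics describe the code itself.
PROCESS = ["num_commits", "num_devs", "lines_added", "lines_deleted"]
PRODUCT = ["loc", "cyclomatic_complexity", "num_methods", "fan_out"]

# Assumed input: one row per module/commit, with a 0/1 "buggy" label.
data = pd.read_csv("project_metrics.csv")

for name, cols in [("process", PROCESS), ("product", PRODUCT)]:
    X_train, X_test, y_train, y_test = train_test_split(
        data[cols], data["buggy"], test_size=0.3, random_state=42)
    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    pred = model.predict(X_test)                 # hard labels, for recall
    prob = model.predict_proba(X_test)[:, 1]     # scores, for AUC
    print(f"{name}: recall={recall_score(y_test, pred):.2f}, "
          f"AUC={roc_auc_score(y_test, prob):.2f}")
```

Repeating such a loop over hundreds of projects, and over several learners rather than one, is what distinguishes the analytics in-the-large setting (and what motivates preferring multi-model predictions over any single model's output).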