On the evaluation of code smells and detection tools

Sociedade Brasileira de Computação - SB - Volume 5 - Pages 1-28 - 2017
Thanis Paiva¹, Amanda Damasceno¹, Eduardo Figueiredo¹, Cláudio Sant’Anna²
¹Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil
²Department of Computer Science, Federal University of Bahia, Salvador, Brazil

Abstract

Code smells are symptoms in the source code of a program that may indicate a deeper problem, hindering software maintenance and evolution. Detecting code smells is challenging for developers, and their informal definitions have led to the implementation of multiple detection techniques and tools. This paper evaluates and compares four code smell detection tools: inFusion, JDeodorant, PMD, and JSpIRIT. We applied these tools to different versions of two software systems, MobileMedia and Health Watcher, to calculate their accuracy and agreement. We calculated the accuracy of each tool in detecting three code smells: God Class, God Method, and Feature Envy. Agreement was calculated among all tools and between pairs of tools. One of our main findings is that the evaluated tools present different levels of accuracy in different contexts. For MobileMedia, for instance, the average recall varies from 0 to 58% and the average precision from 0 to 100%, while for Health Watcher the ranges are 0 to 100% and 0 to 85%, respectively. Regarding agreement, overall agreement varies from 83 to 98% among all tools and from 67 to 100% between pairs of tools. We also conducted a secondary study of the evolution of code smells in both target systems and found that, in general, code smells are present from the moment a class or method is created: this holds in 74.4% of the cases in MobileMedia and 87.5% in Health Watcher.
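The accuracy and agreement measures named in the abstract can be sketched as follows. This is a minimal Python sketch, not the authors' tooling: it assumes each tool's output and a reference list of actual smells are sets of entity names (classes or methods), and the function names and sample data are hypothetical. Precision is TP/(TP+FP) over the entities a tool reports, recall is TP/(TP+FN) over the reference list, and pairwise agreement is the fraction of entities on which two tools give the same verdict, including both leaving an entity unflagged. The multi-tool measure used here, the mean of all pairwise agreements, is an assumption; the paper's exact formula may differ.

```python
from itertools import combinations
from typing import Optional


def precision_recall(detected: set[str], reference: set[str]) -> tuple[Optional[float], Optional[float]]:
    """Precision and recall of one tool against a reference list of smelly entities.

    A metric whose denominator is zero (e.g. precision when the tool reports
    nothing) is returned as None instead of a number.
    """
    true_pos = len(detected & reference)
    precision = true_pos / len(detected) if detected else None
    recall = true_pos / len(reference) if reference else None
    return precision, recall


def pairwise_agreement(a: set[str], b: set[str], entities: set[str]) -> float:
    """Fraction of entities on which two tools agree (both flag or both skip)."""
    agree = sum((e in a) == (e in b) for e in entities)
    return agree / len(entities)


def overall_agreement(results: dict[str, set[str]], entities: set[str]) -> float:
    """Assumed multi-tool measure: mean agreement over every pair of tools."""
    pairs = list(combinations(results.values(), 2))
    return sum(pairwise_agreement(a, b, entities) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical data: five classes, two of which really are God Classes.
    entities = {"A", "B", "C", "D", "E"}
    reference = {"A", "B"}
    tools = {"x": {"A", "C"}, "y": {"A", "B"}, "z": {"A"}}
    for name, detected in tools.items():
        print(name, precision_recall(detected, reference))
    print("overall agreement:", overall_agreement(tools, entities))
```

Because agreement also counts entities that no tool flags, it can be high even when the tools rarely flag the same smelly entities, so it is best read alongside precision and recall.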
