On the influence of program constructs on bug localization effectiveness

Sociedade Brasileira de Computacao - SB - Volume 5 - Pages 1-29 - 2017
Marcelo Garnier¹, Isabella Ferreira¹, Alessandro Garcia¹
¹OPUS Research Group, Informatics Department, Rio de Janeiro, Brazil

Abstract

Software projects often reach hundreds or thousands of files. Therefore, manually searching for code elements that should be changed to fix a failure is a difficult task. Static bug localization techniques provide a cost-effective means of finding files related to the failure described in a bug report. Structured information retrieval (IR) has been successfully applied by techniques such as BLUiR, BLUiR+, and AmaLgam. However, there are significant shortcomings in how these techniques were evaluated. First, virtually all evaluations have been limited to very few projects written in only one object-oriented programming language, particularly Java. Second, particular constructs of other programming languages, such as C#, may play a role in the effectiveness of bug localization techniques, yet little is known about this phenomenon. Third, the experimental setup of most bug localization studies makes simplistic assumptions that do not hold in real-world scenarios, thereby raising doubts about the reported effectiveness of existing techniques. In this article, we evaluate BLUiR, BLUiR+, and AmaLgam on 20 C# projects, addressing the aforementioned shortcomings of previous studies. Then, we extend AmaLgam’s algorithm to understand whether structured information retrieval can benefit from a wider range of program constructs, including C# constructs that do not exist in Java. We also analyze the influence of program constructs on bug localization effectiveness using Principal Component Analysis (PCA). Our analysis points to Methods and Classes as the constructs that contribute the most to the effectiveness of bug localization. It also reveals a significant contribution from Properties and String literals, constructs not considered in previous studies. Finally, we evaluate the effects of changing the emphasis on particular constructs by making another extension to AmaLgam’s algorithm, enabling the specification of different weights for each construct. Our results show that fine-tuning these weights may increase the effectiveness of bug localization in projects written in a specific programming language, such as C#.
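The idea of weighting each program construct separately can be illustrated with a minimal Python sketch. It is not the authors' implementation and not AmaLgam's actual scoring function: it assumes a plain cosine similarity over raw term frequencies, and the function names, file names, construct texts, bug report, and weight values below are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split camelCase identifiers into words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_file(report, constructs, weights):
    """Weighted sum of the similarity between the report and each construct."""
    query = Counter(tokenize(report))
    return sum(
        weights.get(name, 1.0) * cosine(query, Counter(tokenize(text)))
        for name, text in constructs.items()
    )


def rank_files(report, files, weights):
    """Return file paths ordered from most to least suspicious."""
    scored = {path: score_file(report, c, weights) for path, c in files.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical data: the text extracted from each construct of two C# files.
files = {
    "Parser.cs": {
        "classes": "Parser",
        "methods": "ParseToken ReadInput",
        "properties": "CurrentToken",
        "string_literals": "unexpected token",
    },
    "Logger.cs": {
        "classes": "Logger",
        "methods": "WriteLine Flush",
        "properties": "LogLevel",
        "string_literals": "log file not found",
    },
}
report = "Crash with unexpected token when parser reads input"
# Assumed weights; in practice they would be tuned per project or language.
weights = {"classes": 1.0, "methods": 1.0, "properties": 0.5, "string_literals": 0.5}

ranking = rank_files(report, files, weights)
```

Here `ranking` lists `Parser.cs` first, because only its constructs share terms with the report. Changing the values in `weights` shifts the emphasis between constructs, which is the kind of tuning the last part of the abstract refers to.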
