Automated classification of software issue reports using machine learning techniques: an empirical study

Innovations in Systems and Software Engineering - Volume 13 - Pages 279-297 - 2017
Nitish Pandey1, Debarshi Kumar Sanyal1,2, Abir Hudait1, Amitava Sen3
1School of Computer Engineering, KIIT University, Bhubaneswar, India
2Indian Institute of Technology Kharagpur, Kharagpur, India
3Department of Computer Science and Engineering, JIS University, Kolkata, India

Abstract

Software developers, testers and customers routinely submit issue reports to software issue trackers to record the problems they encounter while using the software. These issues are then routed to the appropriate experts for analysis and fixing. However, submitters often misclassify an improvement request as a bug and vice versa, which costs developers valuable time. Automated classification of the submitted reports would therefore be of great practical value. In this paper, we analyze how machine learning techniques can be used to perform this task. We apply several classification algorithms, namely naive Bayes, linear discriminant analysis, k-nearest neighbors, support vector machine (SVM) with various kernels, decision tree and random forest, to classify the reports from three open-source projects. We evaluate their performance in terms of F-measure, average accuracy and weighted average F-measure. Our experiments show that random forest performs best, while SVM with certain kernels also achieves high performance.
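The three evaluation measures named above can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions (F-measure as the harmonic mean of precision and recall, and the weighted average F-measure as per-class F-measures weighted by class frequency); the abstract does not spell out its formulas, and "average accuracy" is assumed here to be plain accuracy that would then be averaged over evaluation runs. The labels `bug` and `nonbug` and the example data are hypothetical, not taken from the study.

```python
from collections import Counter


def f_measure(y_true, y_pred, positive):
    """F-measure of one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f_measure(y_true, y_pred):
    """Per-class F-measures weighted by each class's share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f_measure(y_true, y_pred, c) * n / total
               for c, n in support.items())


# Hypothetical true and predicted labels for six issue reports.
y_true = ["bug", "bug", "bug", "bug", "nonbug", "nonbug"]
y_pred = ["bug", "bug", "bug", "nonbug", "nonbug", "bug"]

print("F-measure (bug):    ", f_measure(y_true, y_pred, "bug"))
print("F-measure (nonbug): ", f_measure(y_true, y_pred, "nonbug"))
print("Accuracy:           ", accuracy(y_true, y_pred))
print("Weighted F-measure: ", weighted_f_measure(y_true, y_pred))
```

Weighting by class frequency matters when one class dominates the data set, since a classifier that favors the majority class can still score well on plain accuracy.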

Keywords

#automated classification #software issue reports #machine learning #open source #classification algorithms
