Automated classification of software issue reports using machine learning techniques: an empirical study
Abstract
Software developers, testers, and customers routinely submit issue reports to software issue trackers to record the problems they encounter while using software. These issues are then routed to the appropriate experts for analysis and fixing. However, submitters often misclassify an improvement request as a bug and vice versa, which costs developers valuable time. Automated classification of submitted reports would therefore be of great practical value. In this paper, we analyze how machine learning techniques can be used to perform this task. We apply several classification algorithms, namely naive Bayes, linear discriminant analysis, k-nearest neighbors, support vector machines (SVMs) with various kernels, decision trees, and random forests, to classify the reports from three open-source projects. We evaluate their performance in terms of F-measure, average accuracy, and weighted average F-measure. Our experiments show that random forests perform best, while SVMs with certain kernels also achieve high performance.
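The abstract does not describe the text-processing pipeline. The following is therefore only a minimal, standard-library Python sketch of the general idea, not the authors' implementation. It trains a multinomial naive Bayes classifier, one of the algorithms listed above, on bag-of-words token counts with add-one (Laplace) smoothing. It then scores the predictions with per-class F-measure and the support-weighted average F-measure.

The following are all hypothetical, chosen only for illustration:
- the tokenizer and the smoothing choice;
- the class names `bug` and `enhancement`;
- the helper names `tokenize`, `NaiveBayesReportClassifier`, and `f_measures`;
- the six toy reports.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesReportClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-words counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of training reports per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus log likelihood of each known token; unseen words are ignored.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def f_measures(y_true, y_pred):
    """Return per-class F-measure and the support-weighted average F-measure."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    support = Counter(y_true)  # weight each class by its number of true reports
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return per_class, weighted


# Hypothetical toy reports; real experiments would use labelled tracker data.
train_texts = [
    "crash when opening file null pointer exception",
    "application crashes on startup with error",
    "wrong value returned exception thrown",
    "add option to export report as pdf",
    "please support dark theme in settings",
    "improve performance of search feature request",
]
train_labels = ["bug", "bug", "bug", "enhancement", "enhancement", "enhancement"]
test_texts = ["null pointer exception on startup", "add support for export to csv"]
test_labels = ["bug", "enhancement"]

model = NaiveBayesReportClassifier().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
per_class, weighted = f_measures(test_labels, predictions)
print(predictions)  # → ['bug', 'enhancement']
print(weighted)  # → 1.0
```

The weighted average F-measure weights each class's F-measure by its number of true instances. This is the same quantity that scikit-learn reports as `f1_score(y_true, y_pred, average="weighted")`. A realistic evaluation would estimate these scores with cross-validation over real tracker data rather than on a two-report test set.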
Keywords
#automated classification #software issue reports #machine learning #open source #classification algorithms