Improving bug report triage performance using an artificial intelligence based document generation model
Abstract
Artificial intelligence is one of the key technologies driving the fourth industrial revolution. It also has a significant impact on software professionals, who continually strive to deliver high-quality software by fixing many kinds of software bugs. During software development and maintenance, bugs are a major factor affecting the cost and delivery time of software. To fix bugs efficiently, open bug repositories are used to identify bug reports and to classify and prioritize them, so that each report is assigned to the most suitable developer based on interest and expertise. Because resources such as time and manpower are limited, this bug report triage process is extremely important in software development. To improve triage performance, many studies have focused on latent Dirichlet allocation (LDA) combined with k-nearest neighbors or a support vector machine. Although existing approaches have improved triage accuracy, they often cause conflicts between the combined techniques and produce incorrect triage results. In this study, we propose a method that improves bug report triage performance by using multiple LDA-based topic sets obtained by improving LDA. The proposed method improves the existing LDA topic sets by building two adjunct topic sets. In our experiments, we collected bug reports from a popular bug tracking system, Bugzilla, as well as Android bug reports, to evaluate the proposed method and to demonstrate that it achieves two goals: increasing the accuracy of bug report triage and ensuring compatibility with other state-of-the-art approaches.
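To make the baseline mentioned above more concrete, the following is a minimal sketch of how topic distributions can be combined with k-nearest neighbors to suggest a developer for a new bug report. It is not the method proposed in this study. The topic vectors, the developer names, and the function names `cosine` and `recommend_developer` are hypothetical; in practice the vectors would come from an LDA model fitted on report text.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_developer(new_topics, history, k=3):
    """Return the most frequent fixer among the k most similar past reports."""
    ranked = sorted(
        history,
        key=lambda item: cosine(new_topics, item[0]),
        reverse=True,
    )
    votes = Counter(developer for _, developer in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical topic distributions over three topics, standing in for
# what an LDA model might output for previously fixed bug reports.
history = [
    ([0.80, 0.10, 0.10], "alice"),
    ([0.70, 0.20, 0.10], "alice"),
    ([0.10, 0.80, 0.10], "bob"),
    ([0.20, 0.70, 0.10], "bob"),
    ([0.10, 0.10, 0.80], "carol"),
]

# A new report whose topic mix leans toward the first topic.
print(recommend_developer([0.75, 0.15, 0.10], history, k=3))
```

In this toy setting the new report is closest to the two reports fixed by "alice", so the majority vote among its three nearest neighbors selects that developer. The choice of cosine similarity and k = 3 is an assumption made for illustration.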