On the use of textual feature extraction techniques to support the automated detection of refactoring documentation

Innovations in Systems and Software Engineering - Volume 18 - Pages 233-249 - 2021

Licelot Marmolejos¹, Eman Abdullah AlOmar¹, Mohamed Wiem Mkaouer¹, Christian Newman¹, Ali Ouni²

¹Rochester Institute of Technology, Rochester, USA

²ETS Montreal, University of Quebec, Quebec City, Canada

Abstract

Refactoring is the art of improving the internal structure of a program without changing its external behavior, and it is an important task in software maintenance. While existing studies have focused on detecting refactoring operations by mining software repositories, little has been done to understand how developers document their refactoring activities. Consequently, a recent line of work tries to detect developers' refactoring documentation by manually analyzing their internal and external software documentation. These techniques, however, are limited by their manual process, which hinders their scalability. In this study, we therefore treat the detection of refactoring documentation as a binary classification problem. We focus on automatically detecting refactoring activities in commit messages, relying on text mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitations of the manual process proposed in existing studies by exploring how commit messages can be transformed into features that are used to train various models. For evaluation, we use and compare five binary classification algorithms, and we test the effectiveness of these models on an existing dataset of manually curated messages known to document refactoring activities in source code. The experiments are carried out with different data sizes and numbers of bits. According to our results, the combinations of Chi-Squared with the Bayes point machine and of Fisher score with the Bayes point machine may be the most effective at automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96 and an F-score of 0.96.
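To make the feature-selection step concrete, the following is a minimal sketch of Chi-Squared scoring over commit-message tokens. It is not the authors' tool: the toy messages, the labels, and the helper names `tokenize`, `chi_squared_scores`, and `top_tokens` are all hypothetical, and no classifier is trained here. It only illustrates how a term's presence can be scored against a binary refactoring label before a model is fitted.

```python
import re
from collections import Counter

# Hypothetical toy data: 1 = message documents a refactoring, 0 = it does not.
MESSAGES = [
    "Refactor UserService to extract helper method",
    "Extract method and rename variable for readability",
    "Refactor parser, no functional change",
    "Fix null pointer crash in login",
    "Add unit test for parser",
    "Fix README typo",
]
LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(message):
    """Lowercase a commit message and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", message.lower())


def chi_squared_scores(messages, labels):
    """Score each token with the chi-squared statistic of its 2x2 table
    (token present/absent versus refactoring/other)."""
    n = len(messages)
    n_pos = sum(labels)
    n_neg = n - n_pos
    in_pos, in_neg = Counter(), Counter()
    for message, label in zip(messages, labels):
        # Count document frequency, so each token counts once per message.
        for token in set(tokenize(message)):
            (in_pos if label else in_neg)[token] += 1

    scores = {}
    for token in set(in_pos) | set(in_neg):
        a = in_pos[token]   # refactoring messages containing the token
        b = in_neg[token]   # other messages containing the token
        c = n_pos - a       # refactoring messages without the token
        d = n_neg - b       # other messages without the token
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[token] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_tokens(scores, k):
    """Return the k highest-scoring tokens, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    scores = chi_squared_scores(MESSAGES, LABELS)
    for token in top_tokens(scores, 4):
        print(token, round(scores[token], 2))
```

Note that the statistic is symmetric: a token such as `fix`, which appears only in non-refactoring messages, scores as highly as `refactor`, because both separate the two classes equally well. A token that appears equally often in both classes, such as `parser` in this toy set, scores zero and would be dropped by a top-k selection.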
