HybridRec: A recommender system for tagging GitHub repositories

Springer Science and Business Media LLC - Volume 53 - Pages 9708-9730 - 2022

Juri Di Rocco¹, Davide Di Ruscio¹, Claudio Di Sipio¹, Phuong T. Nguyen¹, Riccardo Rubei¹

¹Department of Information Engineering, Computer Science and Mathematics, Università degli studi dell’Aquila, L’Aquila, Italy

Abstract

Software repositories are increasingly essential for managing the typical artifacts that make up a project, including source code, documentation, and bug reports. GitHub is at the forefront of these platforms, providing developers with a reservoir of more than 28 million different repositories. To help developers find suitable artifacts, GitHub uses topics, which are short texts assigned to the stored artifacts. However, assigning inappropriate topics to a repository can hamper its popularity and reachability. In our previous work, we implemented MNBN and TopFilter to recommend GitHub topics. MNBN exploits a stochastic network to predict topics, while TopFilter relies on a syntactic-based function to recommend them. In this paper, we extend that work by building HybridRec, a recommender system that combines stochastic and collaborative-filtering techniques to generate more relevant topics. To deal with unbalanced datasets, we employ a Complement Naïve Bayesian Network (CNBN). Moreover, we apply a preprocessing phase to clean and refine the input data before feeding it to the recommendation engine. An empirical evaluation shows that HybridRec outperforms three state-of-the-art baselines, achieving better performance across various metrics. We conclude that the conceived framework can be used to help developers increase the visibility of their projects.
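To make the role of a complement Naïve Bayes classifier on unbalanced data more concrete, the following is a minimal sketch of the general technique, not HybridRec's actual implementation. It estimates term weights for each topic from all documents that do not carry that topic, so a rare topic still gets well-supported estimates. The function names, the toy README tokens, and the single-topic-per-repository simplification are all assumptions made for illustration; refinements such as TF-IDF transforms and weight normalization are left out.

```python
import math
from collections import Counter


def train_cnb(docs, alpha=1.0):
    """Train a simplified complement Naive Bayes model.

    docs  : list of (tokens, topic) pairs (hypothetical toy format)
    alpha : additive smoothing constant
    Returns {topic: {term: log-probability of term in the topic's complement}}.
    """
    vocab = {term for tokens, _ in docs for term in tokens}
    topics = {topic for _, topic in docs}
    weights = {}
    for topic in topics:
        # Count terms over every document that does NOT carry this topic.
        complement = Counter()
        for tokens, label in docs:
            if label != topic:
                complement.update(tokens)
        total = sum(complement.values()) + alpha * len(vocab)
        weights[topic] = {
            term: math.log((complement[term] + alpha) / total) for term in vocab
        }
    return weights


def rank_topics(weights, tokens):
    """Rank topics for a tokenized README, best match first.

    A topic fits well when the tokens are unlikely under its complement,
    so the lowest complement score wins. Unknown tokens are ignored.
    """
    scores = {
        topic: sum(w[term] for term in tokens if term in w)
        for topic, w in weights.items()
    }
    return sorted(scores, key=scores.get)


# Toy, deliberately unbalanced training data (assumed, not from the paper).
docs = [
    (["neural", "network", "training", "model"], "machine-learning"),
    (["model", "training", "dataset", "gpu"], "machine-learning"),
    (["classifier", "model", "dataset"], "machine-learning"),
    (["container", "image", "compose"], "docker"),
]
weights = train_cnb(docs)
print(rank_topics(weights, ["container", "image"])[0])  # docker
```

In this toy run the minority topic `docker` is still recovered from a single training example, because its weights are estimated from the three majority-class documents rather than from its own sparse counts.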
