Paraphrase type identification for plagiarism detection using contexts and word embeddings

International Journal of Educational Technology in Higher Education - Volume 18 - Pages 1-25 - 2021

Faisal Alvi¹, Mark Stevenson², Paul Clough³

¹Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia

²Department of Computer Science, University of Sheffield, Sheffield, United Kingdom

³Information School, University of Sheffield, Sheffield, United Kingdom

Abstract

Researchers have proposed paraphrase types as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering, and word insertion/deletion have been identified as some of the common paraphrasing strategies used by plagiarists. However, the similarity reports generated by most plagiarism detection systems only provide a similarity score and list the matching passages of text along with their possible sources. In this research, we propose methods to identify two important paraphrase types, synonymous substitution and word reordering, in paraphrased, plagiarised sentence pairs. We propose a three-stage approach that uses context matching and pretrained word embeddings to identify synonymous substitution and word reordering. Our results indicate that using the Smith Waterman Algorithm for Plagiarism Detection together with ConceptNet Numberbatch pretrained word embeddings gives the best performance in terms of $F_1$ scores. This research can be used to complement the similarity reports generated by existing plagiarism detection systems by incorporating methods that identify paraphrase types for plagiarism detection.
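The abstract does not spell out the three stages, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a textbook word-level Smith-Waterman local alignment (not the plagiarism-detection variant named above), a tiny hand-made vector table standing in for ConceptNet Numberbatch, and an arbitrary similarity threshold of 0.7. The function names `smith_waterman`, `cosine`, and `substitution_candidates` are hypothetical.

```python
from math import sqrt


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Textbook local alignment of two token lists.

    Returns the aligned (index_in_a, index_in_b) pairs of the best-scoring
    local alignment. Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # skip a token of a
                score[i][j - 1] + gap,      # skip a token of b
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def substitution_candidates(a, b, vectors, threshold=0.7):
    """Aligned but non-identical word pairs whose embeddings are similar.

    The shared context comes from the alignment; the embedding check decides
    whether the differing pair looks like a synonymous substitution.
    """
    found = []
    for i, j in smith_waterman(a, b):
        if a[i] != b[j] and a[i] in vectors and b[j] in vectors:
            if cosine(vectors[a[i]], vectors[b[j]]) >= threshold:
                found.append((a[i], b[j]))
    return found


# Toy 2-dimensional vectors, invented for illustration only.
toy_vectors = {"quick": [1.0, 0.1], "fast": [0.9, 0.2], "dog": [0.0, 1.0]}
source = "the quick fox jumps over the dog".split()
suspect = "the fast fox jumps over the dog".split()
print(substitution_candidates(source, suspect, toy_vectors))
# prints [('quick', 'fast')]
```

In practice the toy table would be replaced by real pretrained embeddings, and the threshold would need tuning on annotated sentence pairs; word reordering is not handled by this sketch.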

Keywords

#paraphrasing #plagiarism #synonymous substitution #word reordering #plagiarism detection #word embeddings
