Multi-level text document similarity estimation and its application for plagiarism detection