A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model
Journal of King Saud University - Computer and Information Sciences - Volume 34 - Pages 6048-6056 - 2022
References
Abdelali, A., Darwish, K., Durrani, N., Mubarak, H., 2016. Farasa: A Fast and Furious Segmenter for Arabic, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Association for Computational Linguistics, San Diego, California, pp. 11–16. https://doi.org/10.18653/v1/n16-3003
Alami, 2020, LISAC FSDM-USMBA Team at SemEval-2020 Task 12: Overcoming AraBERT’s pretrain-finetune discrepancy for Arabic offensive language identification, Proc. Fourteenth Workshop Seman. Eval., 2080, 10.18653/v1/2020.semeval-1.275
Amini, 2010, Combining coregularization and consensus-based self-training for multilingual text categorization, 475
Antoun, W., Baly, F., Hajj, H., 2020. AraBERT: Transformer-based model for Arabic language understanding. arXiv preprint arXiv:2003.00104.
Bel, N., Koster, C. H., & Villegas, M., 2003. Cross-lingual text categorization. In International Conference on Theory and Practice of Digital Libraries, Berlin, Heidelberg, pp. 126-139.
Bentaallah, 2014, The use of wordnets for multilingual text categorization: A Comparative Study, ICWIT, 121
Che, W., Liu, Y., Wang, Y., Zheng, B., Liu, T., 2018. Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation. CoNLL 2018 - SIGNLL Conf. Comput. Nat. Lang. Learn. Proc. CoNLL 2018 Shar. Task Multiling. Parsing from Raw Text to Univers. Depend. 55–64. https://doi.org/10.18653/v1/K18-2005
Conneau, 2019, Unsupervised Cross-lingual Representation Learning at Scale, 31
Dahou, 2016, Word embeddings and convolutional neural network for Arabic sentiment classification, 2418
Davidson, 2017, Automated hate speech detection and the problem of offensive language, 512
Devlin, 2019, BERT: Pre-training of deep bidirectional transformers for language understanding, 4171
El-Alami, F.-Z., El Alaoui, S.O., En-Nahnahi, N., 2020. Deep Neural Models and Retrofitting for Arabic Text Categorization. International Journal of Intelligent Information Technologies (IJIIT). 16, 74–86. https://doi.org/10.4018/ijiit.2020040104
ElJundi, O., Antoun, W., El Droubi, N., Hajj, H., El-Hajj, W., Shaban, K., 2019. hULMonA: The Universal Language Model in Arabic 68–77. https://doi.org/10.18653/v1/w19-4608
Elnagar, 2020, Arabic text classification using deep learning models, Inform. Process. Manage., 57, 102
Gonçalves, T., Quaresma, P., 2010. Multilingual text classification through combination of monolingual classifiers. In Proceedings of the 4th Workshop on Legal Ontologies and Artificial Intelligence Techniques. 605, 29–38.
Howard, J., Ruder, S., 2018. Universal language model fine-tuning for text classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 328–339. https://doi.org/10.18653/v1/p18-1031
R. Kapila, Satvika, 2016. Text Categorization on Multiple Languages Based On Classification Technique. International Journal of Computer Science and Information Technologies. 7 (3), 1578–1581.
Kumar, R., Ojha, A. K., Malmasi, S., Zampieri, M., 2018. Benchmarking aggression identification in social media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), Santa Fe, New Mexico, USA, pp. 1-11.
Lai, 2015
Lample, G. and Conneau, A., 2019. Cross-lingual language model pretraining. Advances in Neural Information Processing Systems (NeurIPS 2019). 32.
Lee, C. H., Yang, H. C., Ma, S. M., 2006. A novel multilingual text categorization system using latent semantic indexing. In First International Conference on Innovative Computing, Information and Control-Volume I (ICICIC'06), Beijing, China, pp. 503-506. https://doi.org/10.1109/icicic.2006.214
Liu, 2019, Bidirectional LSTM with attention mechanism and convolutional layer for text classification, Neurocomputing, 337, 325, 10.1016/j.neucom.2019.01.078
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V., 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Mandl, 2019, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages
Mittal, 2015, Multilingual text classification, Int. J. Eng. Res. Technol. (IJERT), 4
Mubarak, H., Rashed, A., Darwish, K., Samih, Y., Abdelali, A., 2020. Arabic offensive language on Twitter: Analysis and experiments. arXiv preprint arXiv:2004.02192.
Nowak, 2017, LSTM recurrent neural networks for short text and sentiment classification, 553
Peters, 2018
Prajapati, B.P., Garg, S., Panchal, M.H., 2009. Automated Text Categorization with Machine Learning and its Application in Multilingual Text Categorization. National Conference on Advance Computing - NCAC09, Vallabh Vidyanagar, Anand, Gujarat, India, pp. 204–209.
Rosenthal, S., Atanasova, P., Karadzhov, G., Zampieri, M., Nakov, P., 2020. A large-scale semi-supervised dataset for offensive language identification. arXiv preprint arXiv:2004.14454.
Sanh, V., Debut, L., Chaumond, J., Wolf, T., 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
Vaswani, 2017, 5999
Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., Kumar, R., 2019. SemEval-2019 Task 6: Identifying and categorizing offensive language in social media (OffensEval). arXiv preprint arXiv:1903.08983.
Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak, H., Derczynski, L., Pitenis, Z., Çöltekin, Ç., 2020. SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020) 1425–1447. arXiv preprint arXiv:2006.07235.
Zhou, C., Sun, C., Liu, Z., Lau, F., 2015. A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630.