
A New Quranic Corpus Rich in Morphosyntactical Information

International Journal of Speech Technology - Volume 19 - Pages 339-346 - 2016

Imad Zeroual¹, Abdelhak Lakhouaja¹

¹Computer Sciences Laboratory, Faculty of Sciences, Mohammed First University, Oujda, Morocco

Abstract

At present, the amount of annotated material for the Arabic language is very limited. This led us to contribute to enriching Arabic corpus resources. In this context, we decided to start with correct and carefully selected texts, which makes the Arabic text of the Quran the best choice for this endeavor. Moreover, annotated linguistic resources such as a Quranic Corpus are important for researchers working in all fields of Arabic natural language processing. To the best of our knowledge, the only available Quranic Arabic corpora come from the University of Leeds, the University of Jordan, and the University of Haifa. Unfortunately, these corpora have several problems and do not contain enough grammatical and syntactic information. To build a new Quranic Corpus, we used a semi-automatic technique that consists of running the morphosyntactic analyzer for standard Arabic words, "AlKhalil Morpho Sys", followed by a manual treatment. As a result of this work, we have built a new Quranic Corpus rich in morphosyntactical information.
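The semi-automatic technique described above can be pictured as a two-stage loop: an analyzer proposes candidate analyses for each word, words with exactly one candidate are accepted automatically, and everything else is queued for manual treatment. The sketch below is only an illustration under assumptions. The `toy_analyze` function, the `Analysis` fields, and the transliterated toy lexicon are hypothetical stand-ins; they do not reflect the actual AlKhalil Morpho Sys interface or the authors' exact workflow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Analysis:
    """One candidate morphosyntactic analysis (hypothetical fields)."""
    lemma: str
    pos: str


# Hypothetical toy lexicon standing in for a real morphological analyzer.
# Keys are rough Latin transliterations used purely for illustration.
TOY_LEXICON: Dict[str, List[Analysis]] = {
    "kitab": [Analysis("kitab", "noun")],
    "kataba": [Analysis("kataba", "verb")],
    # An unvocalized form can be ambiguous between several readings.
    "ktb": [Analysis("kitab", "noun"), Analysis("kataba", "verb")],
}


def toy_analyze(word: str) -> List[Analysis]:
    """Return every candidate analysis for a word (empty list if unknown)."""
    return TOY_LEXICON.get(word, [])


Accepted = Tuple[int, str, Analysis]
Pending = Tuple[int, str, List[Analysis]]


def annotate(
    words: List[str],
    analyze: Callable[[str], List[Analysis]] = toy_analyze,
) -> Tuple[List[Accepted], List[Pending]]:
    """Split words into auto-accepted annotations and a manual-review queue."""
    accepted: List[Accepted] = []
    review_queue: List[Pending] = []
    for position, word in enumerate(words):
        candidates = analyze(word)
        if len(candidates) == 1:
            # Unambiguous: keep the single analysis without human input.
            accepted.append((position, word, candidates[0]))
        else:
            # Ambiguous or unknown: a human annotator has to decide.
            review_queue.append((position, word, candidates))
    return accepted, review_queue


if __name__ == "__main__":
    done, pending = annotate(["kitab", "ktb", "xyz"])
    print("accepted:", done)
    print("needs manual review:", pending)
```

Keeping the word position in both lists lets manually resolved items be merged back into the corpus in their original order. A real pipeline would also need a context-aware step, since a word with a single candidate in isolation may still be analyzed incorrectly.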

Keywords

#Corpus #Arabic #Quran #natural language processing #morphosyntactical information

