Linguistic integration information in the AABATAS Arabic text analysis system

Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition, pages 389-394

S. Kanoun¹, A. Ennaji¹, Y. Lecourtier¹, A.M. Alimi²

¹Perception System Information Laboratory (PSI), University of Rouen, Mont Saint Aignan, France

²Research Group on Intelligent Machines (REGIM), University of Sfax, Sfax, Tunisia

Abstract

An Arabic text analysis system called AABATAS (affixal approach-based Arabic text analysis system) is proposed. AABATAS recognizes and categorizes words while identifying their morphological and grammatical characteristics. It is based on a new approach to Arabic word recognition, called the affixal approach, which is guided by the structural properties of the language. The system uses a dynamic decomposition-recognition mechanism that generates a set of reliable solutions for each word. This mechanism attempts to identify the basic morphemes of the word (the prefix, the infix, the suffix and the root), unlike existing approaches, which are usually based on recognizing the whole word, the pseudo-word or the letter. In this paper, we briefly present the general characteristics of Arabic texts and give a short survey of existing approaches to their recognition. We then describe the structural properties of the Arabic language and two systems based on these properties: the first performs word recognition and the second is devoted to text analysis. Finally, we report two experimental results, one on a data set of 545 words and the other on a sample text.
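To make the idea of decomposing a word into prefix, infix, suffix and root more concrete, here is a minimal Python sketch. It is not the authors' implementation and ignores the image recognition side entirely. It assumes the word is already available as a string in a Latin transliteration. The affix lists, stem patterns and root lexicon (`PREFIXES`, `SUFFIXES`, `PATTERNS`, `ROOTS`) and the functions `match_pattern` and `decompose` are hypothetical toy examples. The sketch enumerates every prefix/suffix split and matches the remaining stem against patterns in which `C` marks a root consonant, with the other pattern letters acting as the infix. It keeps every analysis whose root is in the lexicon, so one word can yield several candidate solutions.

```python
from itertools import product
from typing import NamedTuple, Optional

# Hypothetical toy inventories in a Latin transliteration (not taken from the paper).
PREFIXES = ["", "wa", "al", "wal"]
SUFFIXES = ["", "a", "u", "un", "at", "ha"]
# Stem patterns: "C" marks a root-consonant slot; the other letters form the infix.
PATTERNS = ["CaCaC", "CaCiC", "maCCuC"]
ROOTS = {"ktb", "drs", "qtl"}


class Analysis(NamedTuple):
    prefix: str
    infix: str  # the stem pattern; its non-"C" letters are the infix
    suffix: str
    root: str


def match_pattern(stem: str, pattern: str) -> Optional[str]:
    """Return the root letters if `stem` fits `pattern`, otherwise None."""
    if len(stem) != len(pattern):
        return None
    root = []
    for letter, slot in zip(stem, pattern):
        if slot == "C":
            root.append(letter)  # root-consonant slot: collect the letter
        elif letter != slot:
            return None  # infix letter does not match the pattern
    return "".join(root)


def decompose(word: str) -> list[Analysis]:
    """Enumerate every prefix + stem + suffix split accepted by the toy lexicon."""
    solutions = []
    for prefix, suffix in product(PREFIXES, SUFFIXES):
        if len(prefix) + len(suffix) >= len(word):
            continue  # the affixes would leave no stem
        if not (word.startswith(prefix) and word.endswith(suffix)):
            continue
        stem = word[len(prefix):len(word) - len(suffix)]
        for pattern in PATTERNS:
            root = match_pattern(stem, pattern)
            if root in ROOTS:
                solutions.append(Analysis(prefix, pattern, suffix, root))
    return solutions


if __name__ == "__main__":
    # "wa-kataba" ("and he wrote"): prefix wa, pattern CaCaC, suffix a, root k-t-b
    print(decompose("wakataba"))
    # [Analysis(prefix='wa', infix='CaCaC', suffix='a', root='ktb')]
```

A recognition system would more likely rank the candidates using recognition confidences rather than accept every lexicon match. This sketch shows only the symbolic enumeration of decompositions.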

Keywords

Text analysis, Text recognition, Character recognition, Writing, Vocabulary, Laboratories, Machine intelligence, Optical character recognition software, Optical sensors, Databases
