Linguistic integration information in the AABATAS Arabic text analysis system

S. Kanoun1, A. Ennaji1, Y. Lecourtier1, A.M. Alimi2
1Perception System Information Laboratory (PSI), University of Rouen, Mont Saint Aignan, France
2Research Group on Intelligent Machines (REGIM), University of Sfax, Sfax, Tunisia

Abstract

An Arabic text analysis system called AABATAS (Affixal Approach-Based Arabic Text Analysis System) is proposed. AABATAS recognizes and categorizes words while identifying their morphological and grammatical characteristics. It is based on a new approach to Arabic word recognition, called the affixal approach, which is guided by the structural properties of the language. The system uses a dynamic decomposition-recognition mechanism that generates a set of reliable solutions for each word. This mechanism attempts to identify the basic morphemes of the word (the prefix, the infix, the suffix, and the root), in contrast to existing approaches, which are usually based on recognizing the whole word, the pseudo-word, or the letter. In this paper, we briefly present the general characteristics of Arabic texts and give a succinct survey of existing approaches to their recognition. We then describe the structural properties of the Arabic language and the two systems built on these properties: the first performs word recognition and the second is devoted to text analysis. Finally, we report two experimental results: one on a data set of 545 words and another on a sample text.
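To make the idea of decomposing a word into morphemes concrete, the following is a minimal sketch of enumerating candidate prefix/stem/suffix splits. It is not the AABATAS mechanism itself: the affix lists `PREFIXES` and `SUFFIXES`, the `Decomposition` type, the `decompose` function, and the `min_stem_len` threshold are all hypothetical names and values chosen for illustration. The sketch works on already-recognized character strings, and it does not attempt infix or root extraction.

```python
from typing import List, NamedTuple

# Hypothetical, deliberately tiny affix inventories for illustration only;
# these are NOT the lists used by AABATAS.
PREFIXES = ["", "ال", "و", "وال"]
SUFFIXES = ["", "ون", "ات", "ة"]


class Decomposition(NamedTuple):
    prefix: str
    stem: str    # what remains after stripping; infix/root are not separated here
    suffix: str


def decompose(word: str, min_stem_len: int = 3) -> List[Decomposition]:
    """Enumerate every (prefix, stem, suffix) split consistent with the affix lists.

    min_stem_len is an assumed threshold, motivated by the fact that most
    Arabic roots have three consonants.
    """
    candidates = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if len(stem) >= min_stem_len:
                candidates.append(Decomposition(prefix, stem, suffix))
    return candidates


if __name__ == "__main__":
    # "المعلمون" ("the teachers") admits several candidate splits.
    for candidate in decompose("المعلمون"):
        print(candidate)
```

Returning all consistent splits, rather than a single best one, mirrors the abstract's description of producing a set of solutions per word; ranking or filtering those candidates would need linguistic knowledge that this sketch does not model.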

Keywords

#Text analysis #Text recognition #Character recognition #Writing #Vocabulary #Laboratories #Machine intelligence #Optical character recognition software #Optical sensors #Databases
