Linguistic integration information in the AABATAS Arabic text analysis system
Abstract
An Arabic text analysis system called AABATAS (affixal approach-based Arabic text analysis system) is proposed. AABATAS recognizes and categorizes words while identifying their morphological and grammatical characteristics. It is based on a new approach to Arabic word recognition, called the affixal approach, which is guided by the structural properties of the language. The system uses a dynamic decomposition-recognition mechanism that generates a set of reliable solutions for each word. This mechanism attempts to identify the basic morphemes of the word (the prefix, the infix, the suffix and the root), in contrast to existing approaches, which are usually based on recognition of the whole word, the pseudo-word or the letter. In this paper, we briefly present the general characteristics of Arabic texts and give a succinct survey of the existing approaches used for their recognition. We then describe the structural properties of the Arabic language and the two systems based on these properties: the first performs word recognition and the second is devoted to text analysis. Finally, we report two experimental results: one on a data set of 545 words and another on a sample text.
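As a rough illustration of the decomposition idea described in the abstract, the sketch below enumerates candidate (prefix, infix, suffix, root) analyses of a word against small lookup tables. It is not the AABATAS mechanism itself: the affix tables, the root set, the simplified Latin transliteration and the rule that an infix sits directly after the first root letter are all assumptions made for the example, and the function name `decompose` is hypothetical.

```python
from typing import List, NamedTuple

# Toy, hypothetical lexicon in a simplified Latin transliteration of
# unvowelled Arabic. The real system's affix and root tables are not
# given in the abstract, so these values are illustrative only.
PREFIXES = ("", "al", "w", "b")
SUFFIXES = ("", "wn", "At", "hA")
INFIXES = ("", "A")  # assumed to be inserted right after the first root letter
ROOTS = {"ktb", "drs", "Elm"}


class Decomposition(NamedTuple):
    prefix: str
    infix: str
    suffix: str
    root: str


def decompose(word: str) -> List[Decomposition]:
    """Return every affix/root split of `word` that the toy lexicon accepts."""
    solutions = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            for infix in INFIXES:
                if infix:
                    # The infix must occupy the slot after the first letter.
                    if len(stem) < 2 or stem[1:1 + len(infix)] != infix:
                        continue
                    root = stem[0] + stem[1 + len(infix):]
                else:
                    root = stem
                # A split counts as a solution only if the root is known.
                if root in ROOTS:
                    solutions.append(Decomposition(prefix, infix, suffix, root))
    return solutions


if __name__ == "__main__":
    for w in ("alkAtb", "wdrswn", "xyz"):
        print(w, decompose(w))
```

The function returns a list rather than a single answer because one surface form can admit several valid splits, which is in the spirit of the "set of solutions for each word" mentioned in the abstract; with this tiny lexicon each example word has at most one.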
Keywords
#Text analysis #Text recognition #Character recognition #Writing #Vocabulary #Laboratories #Machine intelligence #Optical character recognition software #Optical sensors #Databases