MOrpho-LEXical analysis for correcting OCR-generated Arabic words (MOLEX)
Abstract
In this paper, we present a context-based method for correcting Arabic words produced by OCR systems. The technique operates as a post-processor and is intended to be universal. It corrects substitution and rejection errors. The properties of the Arabic language are well suited to morpho-lexical analysis and are therefore heavily exploited in the method. Substitution errors, the errors OCR systems commit most frequently, are rewritten as production rules for use by a rule-based system that corrects Arabic words. The first version of the method operates only at the morpho-lexical level; extension to the other levels of language analysis is planned as future work.
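As a rough illustration of how substitution errors written as production rules could drive a post-processing corrector, here is a minimal sketch. It is not the MOLEX implementation: the rule table, the toy lexicon, the `?` rejection marker, the function name `correct`, and the use of Buckwalter transliteration (b, t, v, n, y for the dot-distinguished letters ب ت ث ن ي) are all assumptions made for the example. Validating candidates by plain lexicon lookup is also an assumption, and the morphological analysis described in the abstract is not modeled.

```python
from itertools import product

# Hypothetical production rules: OCR output letter -> letters it may stand for.
# Letters are in Buckwalter transliteration; the pairs are illustrative only.
RULES = {
    "b": ["t", "v", "n", "y"],
    "t": ["b", "v", "n"],
    "n": ["b", "t", "v", "y"],
    "j": ["H", "x"],
    "H": ["j", "x"],
}

REJECT = "?"  # assumed marker for a character the OCR system rejected

# Toy lexicon (assumption): ktb, ktAb, byt, jbl.
LEXICON = {"ktb", "ktAb", "byt", "jbl"}
ALPHABET = sorted({ch for word in LEXICON for ch in word})


def correct(word, lexicon=LEXICON, rules=RULES, alphabet=ALPHABET, max_changes=2):
    """Return lexicon words reachable from `word`, fewest substitutions first."""
    if word in lexicon:
        return [word]
    # Build the alternatives allowed at each position.
    options = []
    for ch in word:
        if ch == REJECT:
            options.append(list(alphabet))        # rejection: any letter may fit
        else:
            options.append([ch] + rules.get(ch, []))  # keep or substitute
    found = set()
    for combo in product(*options):
        # Count substitutions only; filling a rejected slot is not a substitution.
        changes = sum(1 for a, b in zip(word, combo) if a != REJECT and a != b)
        candidate = "".join(combo)
        if changes <= max_changes and candidate in lexicon:
            found.add((changes, candidate))
    return [candidate for _, candidate in sorted(found)]


if __name__ == "__main__":
    print(correct("knb"))  # substitution error on the middle letter
    print(correct("j?l"))  # rejection error on the middle letter
```

The exhaustive enumeration is only practical for short words and small rule sets; a real system would presumably prune candidates using morphological constraints rather than test every combination.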
Keywords
#Optical character recognition software #Dictionaries #Error correction #Hidden Markov models #Speech recognition #Production systems #Knowledge based systems #Natural language processing #Acoustics #Heart