MOrpho-LEXical analysis for correcting OCR-generated Arabic words (MOLEX)

Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition - pp. 461-466

T. Sari¹, M. Sellami¹

¹Laboratoire LRI, Département d'Informatique, Université Badji Mokhtar Annaba, Algeria

Abstract

In this paper we present a context-based method for correcting Arabic words generated by OCR systems. The technique operates as a post-processor and is intended to be universal. It corrects substitution and rejection errors. The properties of the Arabic language are very useful in morpho-lexical analysis and are therefore heavily exploited in the method. Substitution errors, the errors most frequently committed by OCR systems, are encoded as production rules for use by a rule-based system that corrects Arabic words. The first version of the method operates only at the morpho-lexical level; its extension to the other levels of language analysis is left as future work.
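
As a rough illustration of how substitution errors might be written as production rules and applied by a post-processor, here is a minimal Python sketch. It is not the MOLEX implementation: the rule table, the `?` rejection marker, the tiny lexicon, and the function names `candidates` and `correct` are all assumptions made for this example, and the morphological step (affix analysis) is left out entirely.

```python
from itertools import product

# Hypothetical production rules: OCR output character -> characters it may
# really stand for. These pairs (letters differing mainly by dots) are
# illustrative assumptions, not rules taken from the paper.
SUBSTITUTION_RULES = {
    "ب": ["ت", "ث", "ن"],
    "ت": ["ب", "ث"],
    "ح": ["ج", "خ"],
    "ر": ["ز"],
    "د": ["ذ"],
}

REJECT = "?"  # assumed marker for a character the OCR system rejected
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"

# Toy lexicon standing in for a real Arabic dictionary.
LEXICON = {"كتب", "كتاب", "بيت", "درس"}


def candidates(word):
    """Yield every string reachable by applying the rules at each position."""
    options = []
    for ch in word:
        if ch == REJECT:
            # A rejected character may be any letter of the alphabet.
            options.append(list(ALPHABET))
        else:
            # Keep the character itself, or rewrite it with a rule.
            options.append([ch] + SUBSTITUTION_RULES.get(ch, []))
    for combo in product(*options):
        yield "".join(combo)


def correct(word, lexicon=LEXICON):
    """Return the lexicon words that explain the OCR output, sorted."""
    if word in lexicon:
        return [word]
    return sorted({c for c in candidates(word) if c in lexicon})
```

With this toy data, `correct("كبب")` proposes `كتب` (one substitution), and `correct("در?")` proposes `درس` (one rejection filled in). Enumerating every candidate grows exponentially with word length, so a real system would need to prune the search, for instance by using the morphological constraints the abstract mentions.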

Keywords

#Optical character recognition software #Dictionaries #Error correction #Hidden Markov models #Speech recognition #Production systems #Knowledge based systems #Natural language processing #Acoustics #Heart

