Script and nature differentiation for Arabic and Latin text images

S. Kanoun1, A. Ennaji1, Y. Lecourtier1, A.M. Alimi2
1Perception System Information Laboratory (PSI), University of Rouen, Mont Saint Aignan, France
2Research Group on Intelligent Machines (REGIM), ENIS, DGE, Sfax, Tunisia

Abstract

A method for differentiating Arabic and Latin text blocks, in both printed and handwritten form, is proposed. The method is based on a morphological analysis of each script at the text block level and a geometrical analysis at the line and connected component levels. In this paper, we present a brief survey of existing methods for script differentiation, as well as the general characteristics of the Arabic and Latin scripts. We then describe our method for differentiating these two scripts. Finally, we report experimental results on two different data sets, the first comprising 400 text blocks and the second 335 text blocks.

Keywords

#Text analysis #Handwriting recognition #Laboratories #Machine intelligence #Optical character recognition software #Natural languages #Optical devices #Optical sensors #Conferences #Feature extraction
