Efficient binarization technique for severely degraded document images

Springer Science and Business Media LLC, Volume 2, pp. 153-161, 2014
Brij Mohan Singh¹, Mridula²
¹Department of CSE, College of Engineering Roorkee, Roorkee, India
²Department of Earthquake Engineering, IIT Roorkee, Roorkee, India

Abstract

Degradations in document images arise from shadows, non-uniform illumination, ink bleed-through, and blur caused by humidity. Thresholding such images results either in broken characters or in the detection of false text. Numerous algorithms separate text from background efficiently in the textual regions of a document, but in areas that contain little or no text, portions of the background are mistaken for text. This paper overcomes these problems with a robust binarization technique that recovers the text from severely degraded document images and thereby increases the accuracy of optical character recognition (OCR) systems. The proposed document recovery algorithm efficiently removes degradations from document images. It is based on the fusion of two well-known binarization methods, those of Gatos et al. and Niblack, combined using dilation and logical AND operations. The proposed approach yields better results than five well-known existing approaches, proposed by Otsu, Gatos et al., Niblack, Sauvola et al., and Bernsen, on four evaluation measures: execution time, F-measure, PSNR (peak signal-to-noise ratio), and NRM (negative rate metric).
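The abstract names the ingredients of the fusion (Gatos et al., Niblack, dilation, logical AND) but not how they are combined. The Python sketch below is one plausible reading, not the authors' method, and it rests on several assumptions: the Gatos et al. binarization is replaced by plain Sauvola thresholding as a simpler stand-in; the fusion rule keeps a Niblack text pixel only if it lies inside the dilated stand-in mask; and the window size `w = 15`, dilation size `d = 5`, and constants `k` and `R` are illustrative choices. All function names (`local_mean_std`, `niblack`, `sauvola`, `dilate`, `fuse`, and the metric helpers) are hypothetical. The metrics follow the F-measure, PSNR, and NRM definitions commonly used in the DIBCO evaluations, with text pixels encoded as `True`.

```python
import numpy as np


def local_mean_std(img, w):
    """Mean and standard deviation over a w x w window (w odd), via integral images."""
    r = w // 2
    h, wd = img.shape
    p = np.pad(img.astype(np.float64), r, mode="edge")  # replicate borders
    # Integral images with a leading zero row/column: s[i, j] = sum of p[:i, :j].
    s1 = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s2 = np.pad(p * p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

    def box(s):
        # Sum of p[y:y+w, x:x+w] for every output pixel (y, x).
        return s[w:w + h, w:w + wd] - s[:h, w:w + wd] - s[w:w + h, :wd] + s[:h, :wd]

    n = float(w * w)
    mean = box(s1) / n
    var = np.maximum(box(s2) / n - mean ** 2, 0.0)  # clip tiny negative round-off
    return mean, np.sqrt(var)


def niblack(img, w=15, k=-0.2):
    """Niblack: T = m + k*s. Pixels darker than T are text (True)."""
    m, s = local_mean_std(img, w)
    return img < m + k * s


def sauvola(img, w=15, k=0.5, R=128.0):
    """Sauvola: T = m * (1 + k*(s/R - 1)). Assumed stand-in for Gatos et al."""
    m, s = local_mean_std(img, w)
    return img < m * (1.0 + k * (s / R - 1.0))


def dilate(mask, d=5):
    """Binary dilation of a boolean mask with a d x d square (d odd)."""
    r = d // 2
    h, w = mask.shape
    p = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(d):
        for dx in range(d):
            out |= p[dy:dy + h, dx:dx + w]  # OR over every shift in the square
    return out


def fuse(img, w=15, d=5):
    """Hypothetical fusion rule: Niblack text AND dilated stand-in text."""
    return niblack(img, w) & dilate(sauvola(img, w), d)


def _counts(pred, gt):
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    return tp, fp, fn, tn


def f_measure(pred, gt):
    """F = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = _counts(pred, gt)
    return 2 * tp / (2 * tp + fp + fn)


def psnr(pred, gt, c=1.0):
    """PSNR = 10 log10(C^2 / MSE); C = 1 for 0/1 images."""
    mse = np.mean(pred != gt)
    return float("inf") if mse == 0 else 10.0 * np.log10(c * c / mse)


def nrm(pred, gt):
    """NRM = (FN/(FN+TP) + FP/(FP+TN)) / 2; lower is better."""
    tp, fp, fn, tn = _counts(pred, gt)
    return 0.5 * (fn / (fn + tp) + fp / (fp + tn))


if __name__ == "__main__":
    page = np.full((40, 40), 200.0)
    page[18:22, 5:35] = 50.0   # dark "text" bar
    page[2:8, 28:36] = 190.0   # faint smudge far from the text
    truth = np.zeros(page.shape, dtype=bool)
    truth[18:22, 5:35] = True
    for name, mask in [("niblack", niblack(page)), ("fused", fuse(page))]:
        print(name, f_measure(mask, truth), psnr(mask, truth), nrm(mask, truth))
```

Under these assumptions, on the toy 40 × 40 page in the demo (one dark text bar and one faint smudge far from it), Niblack alone marks the smudge pixels as text, while the fused mask keeps only the bar. This illustrates the intended effect of the AND step in text-sparse regions. The integral-image box filter keeps the cost of the local statistics constant per pixel regardless of window size.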

References

Lasmar AG, Kricha A, Essoukri N, Amara B (2006) A segmentation text/background method for degraded ancient Arabic manuscript. IEEE, pp 1327–1331
He J, Do QDM, Downton AC, Kim JH (2005) A comparison of binarization methods for historical archive documents. In: Eighth international conference on document analysis and recognition (ICDAR’05), pp 538–542
Jindal MK, Sharma RK, Lehal GS (2007) A study of different kinds of degradation in printed Gurmukhi script. In: Proceedings of the international conference on computing: theory and applications (ICCTA’07), IEEE, pp 538–544
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybernet 9(1):62–66
Trier OD, Jain AK (1995) Goal-directed evaluation of binarization methods. IEEE Trans Pattern Anal Mach Intell 17(12):1191–1201
Gatos B, Pratikakis I, Perantonis SJ (2006) Adaptive degraded document image binarization. Pattern Recognit 39:317–327
Niblack W (1986) An introduction to digital image processing. Prentice-Hall, Englewood Cliffs, pp 115–116
Sauvola J, Pietikainen M (2000) Adaptive document image thresholding. Pattern Recognit 33:225–236
Bernsen J (1986) Dynamic thresholding of grey-level images. In: Proceedings of the eighth ICPR, pp 1251–1255
Kittler J, Illingworth J (1985) On threshold selection using clustering criteria. IEEE Trans Syst Man Cybernet 15:652–655
Brink AD (1992) Thresholding of digital images using two-dimensional entropies. Pattern Recognit 25(8):803–808
Yan H (1996) Unified formulation of a class of image thresholding techniques. Pattern Recognit 29(12):2025–2032
Sahoo PK, Soltani S, Wong AKC (1988) A survey of thresholding techniques. Comput Vis Graph Image Process 41(2):233–260
Kim IK, Jung DW, Park RH (2002) Document image thresholding based on topographic analysis using a water flow model. Pattern Recognit 35:265–277
Yang J, Chen Y, Hsu W (1994) Adaptive thresholding algorithm and its hardware implementation. Pattern Recognit Lett 15(2):141–150
Parker JR, Jennings C, Salkauskas AG (1993) Thresholding using an illumination model. In: International conference on document analysis and recognition (ICDAR’93), IEEE, pp 270–273
Gonzalez RC, Woods RE, Eddins SL (2010) Digital image processing using MATLAB, 2nd edn. McGraw Hill Education, New Delhi, India
The Library of Congress (http://memory.loc.gov/). Accessed Nov 2010
Holy Monastery of St. Catherine at Mount Sinai (http://www.sinaimonastery.com/). Accessed Nov 2010
Bodleian Library, University of Oxford (http://www.bodley.ox.ac.uk/). Accessed Nov 2010
users.iit.demokritos.gr/~bgat/DIBCO2009/benchmark/. Accessed Nov 2010
users.iit.demokritos.gr/~bgat/H-DIBCO2010/benchmark/. Accessed Nov 2010
utopia.duth.gr/~ipratika/DIBCO2011/benchmark/. Accessed Nov 2010