DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images

Shilpa Mahajan1, Rajneesh Rani1, Karan Trehan1
1Dr BR Ambedkar National Institute of Technology, Jalandhar, India

Abstract

Detecting and recognizing multi-oriented text in natural scene images remains a challenge for the computer vision community. Segmentation at the word or character level is a vital step that affects the end-to-end performance of a scene text recognition system. Many researchers have worked on segmenting words or characters from complex document images and handwritten images for various non-Indian scripts. In this paper, we present a deep learning-based architecture named DELIGHT-Net, derived from the general UNet architecture, that segments text at the word level from natural scene images. The method is proposed mainly to segment Devanagari, Gurumukhi, and English words from complete scene images collected from day-to-day life. To achieve this, we introduce a new dataset, National Institute of Technology Jalandhar-Word Segmentation (NITJ-WS), which has around 2200 text blocks extracted from 1500 natural images containing unilingual, bilingual, and trilingual text. We benchmark the dataset with the proposed model and two state-of-the-art models, UNet and ResUNet. Statistical and visual results, evaluated using different evaluation parameters, demonstrate the efficiency of the proposed model. Some possible future directions are also recommended in the manuscript. We hope that our work is a stepping stone for academicians in the field of natural scene text recognition.
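The abstract does not name the evaluation parameters it uses. As an illustration only, the sketch below computes two overlap measures that are common for pixel-level segmentation, intersection over union (IoU) and the Dice coefficient, on a pair of binary word masks. The choice of these two metrics, the function name `segmentation_scores`, and the toy masks are all assumptions made for this example and are not taken from the paper.

```python
def segmentation_scores(pred, truth):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; the paper's actual evaluation
    parameters are not stated in the abstract.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # text pixel predicted as text
            elif p and not t:
                fp += 1      # background pixel predicted as text
            elif t and not p:
                fn += 1      # text pixel missed by the prediction
    union = tp + fp + fn
    if union == 0:
        # Both masks are empty: treat this as perfect agreement.
        return 1.0, 1.0
    iou = tp / union
    dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice


# Toy 3x4 masks (assumed data): the prediction misses one text pixel.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ground_truth = [
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

iou, dice = segmentation_scores(predicted, ground_truth)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

Both scores lie between 0 and 1, and Dice is never lower than IoU for the same pair of masks, so the two should be compared only with scores of the same kind when ranking models.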
