Contour feature learning for locating text in natural scene images
Abstract
Text is a rich and precise source of information for understanding natural scene images and video, and detecting and localizing text improves the ability to understand it. Text detection and recognition face numerous challenges, such as noise, blur, distortion, and variation in appearance. Although substantial research has been done in recent years, the area still attracts the interest of the research community and remains open to improvement. Text detection and localization typically demand large training datasets, considerable computational power, and a lengthy training process. In this work, we address this issue by using efficient algorithms to extract text contours and by transforming the image into a list of contour features, which is then fed to a convolutional neural network. We show that this reduces training effort, eases text component classification, and improves the results. After training on 300 images from MSRA-TD500, we attain a precision of 0.84, a recall of 0.67, and an F-measure of 0.75. Contours are also more effective than conventional rectangular bounding boxes at precisely localizing text components.