Detection of artificial and scene text in images and video frames
Abstract
Textual information in images and video frames is a valuable source of high-level semantics for multimedia indexing and retrieval systems. Text detection is the most crucial step in a multimedia text extraction system, and although it has been studied extensively over the past decade, there is still no generic architecture that works for both artificial and scene text in multimedia content. In this paper we propose a system that detects both artificial and scene text in images and video frames. The system is based on a machine learning stage that uses a Random Forest classifier and a highly discriminative feature set produced by a new texture operator called Multilevel Adaptive Color edge Local Binary Pattern (MACeLBP). MACeLBP describes the spatial distribution of color edges at multiple adaptive levels of contrast. A gradient-based algorithm is then applied to separate text lines from one another and to refine their localization. The whole algorithm is placed in a multiresolution framework to achieve scale invariance in the detection of text lines. Finally, an optional connected-component step segments text lines into words based on the distances between the resulting components. The experimental results, obtained with a concise evaluation methodology, demonstrate the superior performance of the proposed text detection system for artificial and scene text in images and video frames.
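The final, optional step (splitting a detected text line into words by the distances between its components) can be illustrated with a minimal sketch. This is not the paper's actual procedure: the function name `split_line_into_words`, the interval representation of components, and the rule "start a new word when a gap exceeds `gap_factor` times the median gap" are all assumptions made for illustration only.

```python
from statistics import median


def split_line_into_words(components, gap_factor=2.0):
    """Group the components of one text line into words.

    components: list of (x_start, x_end) horizontal extents, one per
                connected component (hypothetical representation).
    gap_factor: hypothetical tuning parameter; a gap larger than
                gap_factor * median gap starts a new word.
    Returns a list of words, each a list of components in left-to-right order.
    """
    if not components:
        return []

    # Process components from left to right.
    comps = sorted(components)

    # Horizontal distance between each pair of neighbouring components;
    # overlapping components count as a gap of zero.
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(comps, comps[1:])]
    if not gaps:
        return [comps]  # a single component forms a single word

    # Assumed decision rule: compare each gap with a multiple of the median gap.
    threshold = gap_factor * median(gaps)

    words = [[comps[0]]]
    for gap, comp in zip(gaps, comps[1:]):
        if gap > threshold:
            words.append([comp])    # wide gap: start a new word
        else:
            words[-1].append(comp)  # narrow gap: same word
    return words


line = [(0, 10), (12, 22), (24, 34), (50, 60), (62, 72)]
words = split_line_into_words(line)
print(len(words))  # number of words found in the line
```

A median-based threshold adapts to the character spacing of each line, which fits the scale-invariance goal stated above. It cannot split a line that contains only two components, because the single gap can never exceed a multiple of itself, so a practical system would likely need an additional cue such as component height.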