Robust Text Detection in Natural Scenes Using Text Geometry and Visual Appearance
Abstract
This paper proposes a new two-phase approach to robust text detection that integrates visual appearance with geometric reasoning rules. In the first phase, geometric rules are used to achieve a higher recall rate. Specifically, a robust stroke width transform (RSWT) feature is proposed that recovers the stroke width more reliably by additionally considering the crossing of two strokes and the continuity of the letter border. In the second phase, a classification scheme based on visual appearance features is used to reject false alarms while maintaining the recall rate. To learn a better classifier from multiple visual appearance features, a novel classification method called double soft multiple kernel learning (DS-MKL) is proposed. DS-MKL is motivated by a novel kernel margin perspective on multiple kernel learning and can effectively suppress the influence of noisy base kernels. Comprehensive experiments on the benchmark ICDAR 2005 competition dataset demonstrate the effectiveness of the proposed two-phase text detection approach, which outperforms state-of-the-art approaches by up to 4.4% in terms of F-measure.
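As a rough illustration of the multiple kernel learning setting mentioned above, the sketch below builds one combined kernel from two base kernels using non-negative weights that sum to one, which is the usual starting point for MKL methods. It is not the DS-MKL algorithm: the weights are fixed by hand rather than learned, the feature vectors are toy values, and the function names (`linear_kernel`, `rbf_kernel`, `combine_kernels`) are hypothetical names chosen for this example.

```python
import math


def linear_kernel(X):
    """Gram matrix of the linear kernel for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]


def rbf_kernel(X, gamma):
    """Gram matrix of the Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in X]
        for x in X
    ]


def combine_kernels(kernels, weights):
    """Convex combination K = sum_m d_m * K_m with d_m >= 0 and sum_m d_m = 1.

    In an MKL method the weights d_m would be learned jointly with the
    classifier; here they are supplied by the caller (an assumption made
    only to keep the sketch short).
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy feature vectors standing in for one appearance descriptor per candidate region.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
base_kernels = [linear_kernel(X), rbf_kernel(X, gamma=0.5)]

# A small weight on a base kernel reduces its influence on the combined kernel.
K = combine_kernels(base_kernels, weights=[0.3, 0.7])
```

The combined matrix `K` could then be passed to any kernel classifier that accepts a precomputed Gram matrix. Driving the weight of a noisy base kernel toward zero is the general mechanism by which MKL methods limit that kernel's effect.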