A spatial-temporal approach for video caption detection and recognition

IEEE Transactions on Neural Networks - Vol. 13, No. 4 - pp. 961-971 - 2002
Xiaoou Tang (1), Xinbo Gao (2), Jianzhuang Liu (1), Hongjiang Zhang (3)
(1) Department of Information Engineering, Chinese University of Hong Kong, Hong Kong, China
(2) Chinese University of Hong Kong, Hong Kong, China
(3) Microsoft Research Asia, Beijing, China

Abstract

We present a video caption detection and recognition system based on a fuzzy-clustering neural network (FCNN) classifier. Using a novel caption-transition detection scheme, we locate both the spatial and temporal positions of video captions with high precision and efficiency. Then, employing several new character segmentation and binarization techniques, we improve the Chinese video-caption recognition accuracy from 13% to 86% on a set of news video captions. As the first attempt at Chinese video-caption recognition, our experimental results are very encouraging.

Keywords

#Indexing #Neural networks #Optical character recognition software #Character recognition #Shape measurement #Layout #Data mining #Video compression #Gunshot detection systems #Fuzzy neural networks
