A spatial-temporal approach for video caption detection and recognition
Abstract
We present a video caption detection and recognition system based on a fuzzy-clustering neural network (FCNN) classifier. Using a novel caption-transition detection scheme, we locate both the spatial and the temporal positions of video captions with high precision and efficiency. Then, employing several new character segmentation and binarization techniques, we improve Chinese video-caption recognition accuracy from 13% to 86% on a set of news video captions. For a first attempt at Chinese video-caption recognition, the experimental results are very encouraging.
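To make the idea of locating captions in time more concrete, the following is a minimal sketch of one generic way to flag caption transitions: comparing a fixed band of rows between consecutive frames and reporting the frames where that band changes sharply. It is not the caption-transition detection scheme of the paper, which the abstract does not describe. The function names, the fixed caption band (`top`, `bottom`), the `threshold` value, and the synthetic frames are all assumptions made for illustration.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities in 0..255 (assumed representation).
Frame = Sequence[Sequence[int]]


def region_difference(a: Frame, b: Frame, top: int, bottom: int) -> float:
    """Mean absolute pixel difference between two frames over rows [top, bottom)."""
    total = 0
    count = 0
    for row_a, row_b in zip(a[top:bottom], b[top:bottom]):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def caption_transitions(frames: Sequence[Frame], top: int, bottom: int,
                        threshold: float = 20.0) -> List[int]:
    """Indices i where the band changes sharply between frame i-1 and frame i."""
    return [
        i for i in range(1, len(frames))
        if region_difference(frames[i - 1], frames[i], top, bottom) > threshold
    ]


def make_frame(band_value: int, height: int = 6, width: int = 8) -> List[List[int]]:
    """Synthetic frame: dark background, bottom two rows act as the caption band."""
    return [[band_value if r >= height - 2 else 10] * width for r in range(height)]


# A caption appears at frame 2 and disappears at frame 4.
frames = [make_frame(10), make_frame(10), make_frame(200),
          make_frame(200), make_frame(10)]
transitions = caption_transitions(frames, top=4, bottom=6)
print(transitions)
```

In this toy sequence the band changes only when the synthetic caption appears and disappears, so frames between two transitions can be treated as showing the same caption and processed once rather than frame by frame.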
Keywords
#Indexing #Neural networks #Optical character recognition software #Character recognition #Shape measurement #Layout #Data mining #Video compression #Gunshot detection systems #Fuzzy neural networks