A spatial-temporal approach for video caption detection and recognition

IEEE Transactions on Neural Networks - Vol. 13, No. 4 - pp. 961-971 - 2002

Xiaoou Tang¹, Xinbo Gao², Jianzhuang Liu¹, Hongjiang Zhang³

¹Department of Information Engineering, Chinese University of Hong Kong, Hong Kong, China

²Chinese University of Hong Kong, Hong Kong, China

³Microsoft Research Asia, Beijing, China

Abstract

We present a video caption detection and recognition system based on a fuzzy-clustering neural network (FCNN) classifier. Using a novel caption-transition detection scheme, we locate both the spatial and the temporal positions of video captions with high precision and efficiency. Then, by employing several new character segmentation and binarization techniques, we improve Chinese video-caption recognition accuracy from 13% to 86% on a set of news video captions. For this first attempt at Chinese video-caption recognition, our experimental results are very encouraging.
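The abstract does not describe how the caption-transition detection scheme works. As a rough illustration only, and not the authors' method, one generic temporal cue is to compare consecutive frames inside a fixed horizontal band where captions are assumed to appear, and to flag a transition whenever the mean absolute pixel difference in that band exceeds a threshold. The function names `region_mean_abs_diff` and `detect_caption_transitions`, the fixed band rows `top`/`bottom`, and the `threshold` parameter are all hypothetical choices for this sketch.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale frame as rows of 0-255 intensities.
Frame = List[List[int]]


def region_mean_abs_diff(a: Frame, b: Frame, top: int, bottom: int) -> float:
    """Mean absolute pixel difference between two frames over rows [top, bottom)."""
    total = 0
    count = 0
    for r in range(top, bottom):
        for pa, pb in zip(a[r], b[r]):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def detect_caption_transitions(
    frames: Sequence[Frame], top: int, bottom: int, threshold: float
) -> List[int]:
    """Return indices i where the assumed caption band changes sharply
    between frame i-1 and frame i (a generic frame-differencing heuristic)."""
    return [
        i
        for i in range(1, len(frames))
        if region_mean_abs_diff(frames[i - 1], frames[i], top, bottom) > threshold
    ]


# Usage: a 4x4 "video" whose bottom two rows gain and then lose a bright caption.
blank = [[0] * 4 for _ in range(4)]
captioned = [[0] * 4, [0] * 4, [200] * 4, [200] * 4]
print(detect_caption_transitions([blank, blank, captioned, captioned, blank], 2, 4, 50.0))
```

Restricting the difference to the caption band keeps motion elsewhere in the picture from triggering transitions, but background motion inside the band would still do so. A practical detector would need additional cues, such as edge or text-likeness measures, to reject such false alarms.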

Keywords

#Indexing #Neural networks #Optical character recognition software #Character recognition #Shape measurement #Layout #Data mining #Video compression #Gunshot detection systems #Fuzzy neural networks
