Comic MTL: optimized multi-task learning for comic book image analysis
Abstract
Comic book image analysis methods often propose multiple algorithms or models for multiple tasks, such as panel and character (body and face) detection, balloon segmentation, and text recognition. In this work, we aim to reduce the processing time of comic book image analysis by proposing a single model that can learn multiple tasks, called Comic MTL, instead of using one model per task. In addition to the detection and segmentation tasks, we integrate the task of analyzing relations between balloons and characters into the Comic MTL model. The experiments are carried out on the public DCM772 and eBDtheque datasets, which contain annotations for panels, balloons, and characters, as well as the associations between balloons and characters. We show that the Comic MTL model can detect the associations between balloons and their speakers (comic characters) and handle the other tasks, namely panel detection, character detection, and balloon segmentation, with promising results.
Keywords
#comic book image analysis #multi-task learning #character detection #balloon segmentation #relation between characters and balloons