Boosting convolutional image captioning with semantic content and visual relationship

Displays - Volume 70 - Page 102069 - 2021
Cong Bai1, Anqi Zheng1, Yuan Huang1, Xiang Pan1, Nan Chen2
1Zhejiang University of Technology, Hangzhou 310000, China
2Qilu Normal University, Jinan 250013, China

References

LeCun, 1998, Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278, 10.1109/5.726791
I. Sutskever, J. Martens, G.E. Hinton, Generating text with recurrent neural networks, in: ICML, 2011.
Hochreiter, 1997, Long short-term memory, Neural Comput., 9, 1735, 10.1162/neco.1997.9.8.1735
Hossain, 2019, A comprehensive survey of deep learning for image captioning, ACM Comput. Surv., 51, 1, 10.1145/3295748
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, Y. Bengio, Show, attend and tell: Neural image caption generation with visual attention, in: International Conference on Machine Learning, 2015, pp. 2048–2057.
I. Schwartz, A. Schwing, T. Hazan, High-order attention models for visual question answering, in: Advances in Neural Information Processing Systems, 2017, pp. 3664–3674.
Anderson, 2018, Bottom-up and top-down attention for image captioning and visual question answering, 6077
J. Gehring, M. Auli, D. Grangier, D. Yarats, Y.N. Dauphin, Convolutional sequence to sequence learning, arXiv preprint arXiv:1705.03122 (2017).
Gu, 2017, An empirical study of language CNN for image captioning, 1222
Aneja, 2018, Convolutional image captioning, 5561
Q. Wang, A.B. Chan, CNN+CNN: Convolutional decoders for image captioning, arXiv preprint arXiv:1805.09019 (2018).
Dauphin, 2017, Language modeling with gated convolutional networks, 933
J.H. Kim, G.S. Hong, B.G. Kim, D.P. Dogra, deepGesture: Deep learning-based gesture recognition scheme using motion sensors, Displays 55 (2018) 38–45. Advances in Smart Content-Oriented Display Technology.
A.K. Dash, S.K. Behera, D.P. Dogra, P.P. Roy, Designing of marker-based augmented reality learning environment for kids using convolutional neural network architecture, Displays 55 (2018) 46–54. Advances in Smart Content-Oriented Display Technology.
J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, A. Yuille, Deep captioning with multimodal recurrent neural networks (m-RNN), in: 3rd International Conference on Learning Representations, ICLR 2015, 2015.
Karpathy, 2015, Deep visual-semantic alignments for generating image descriptions, 3128
Jia, 2015, Guiding the long-short term memory model for image caption generation
Y. Kim, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882 (2014).
A. Conneau, H. Schwenk, L. Barrault, Y. Lecun, Very deep convolutional networks for text classification, arXiv preprint arXiv:1606.01781 (2016).
Divvala, 2014, Learning everything about anything: Webly-supervised visual concept learning
D. Teney, L. Liu, A. Van Den Hengel, Graph-structured representations for visual question answering, in: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017.
Shi, 2019, Explainable and explicit visual reasoning over scene graphs, 8376
Liu, 2019, Learning to assemble neural module tree networks for visual grounding, 4673
Wu, 2016, What value do explicit high level concepts have in vision to language problems?
Yao, 2017, Boosting image captioning with attributes, 4894
Kipf, 2017, Semi-supervised classification with graph convolutional networks
Yao, 2018, Exploring visual relationship for image captioning, 684
Yang, 2019, Auto-encoding scene graphs for image captioning, 10685
T.Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C.L. Zitnick, Microsoft COCO: Common objects in context, in: European Conference on Computer Vision, Springer, 2014, pp. 740–755.
Krishna, 2017, Visual Genome: Connecting language and vision using crowdsourced dense image annotations, Int. J. Comput. Vision, 123, 32, 10.1007/s11263-016-0981-7
Dai, 2017, Towards diverse and natural image descriptions via a conditional GAN, 2970
Papineni, 2002, BLEU: A method for automatic evaluation of machine translation, 311
Denkowski, 2014, Meteor universal: Language specific translation evaluation for any target language, 376
C.Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text Summarization Branches Out, 2004, pp. 74–81.
Vedantam, 2015, CIDEr: Consensus-based image description evaluation, 4566
P. Anderson, B. Fernando, M. Johnson, S. Gould, SPICE: Semantic propositional image caption evaluation, in: European Conference on Computer Vision, Springer, 2016, pp. 382–398.
X. Chen, H. Fang, T.Y. Lin, R. Vedantam, S. Gupta, P. Dollár, C.L. Zitnick, Microsoft COCO captions: Data collection and evaluation server, arXiv preprint arXiv:1504.00325 (2015).