An encoder-decoder based framework for Hindi image caption generation
Abstract
Keywords
References
Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L (2018) Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6077–6086
Banerjee S, Lavie A (2005) Meteor: an automatic metric for mt evaluation with improved correlation with human judgments. In: Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pp 65–72
Dhir R, Mishra SK, Saha S, Bhattacharyya P (2019) A deep attention based framework for image caption generation in hindi language. Computación y Sistemas 23(3)
Farhadi A, Hejrati M, Sadeghi MA, Young P, Rashtchian C, Hockenmaier J, Forsyth D (2010) Every picture tells a story: generating sentences from images. In: European conference on computer vision, pp 15–29. Springer
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9):1627–1645. https://doi.org/10.1109/TPAMI.2009.167
Gong Y, Wang L, Hodosh M, Hockenmaier J, Lazebnik S (2014) Improving image-sentence embeddings using large weakly annotated photo collections. In: European conference on computer vision, pp 529–545. Springer
He X, Deng L (2017) Deep learning for image-to-text generation: a technical overview. IEEE Signal Proc Mag 34(6):109–116
Hironobu YM, Takahashi H, Oka R (1999) Image-to-word transformation based on dividing and vector quantizing images with words. In: Boltzmann machines, neural networks, pp 405–409
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hodosh M, Young P, Hockenmaier J (2013) Framing image description as a ranking task: data, models and evaluation metrics. J Artif Intell Res 47:853–899
Hossain M, Sohel F, Shiratuddin MF, Laga H (2019) A comprehensive survey of deep learning for image captioning. ACM Computing Surveys (CSUR) 51(6):118
Isozaki H, Hirao T, Duh K, Sudoh K, Tsukada H (2010) Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp 944–952. Association for Computational Linguistics, Cambridge. https://www.aclweb.org/anthology/D10-1092
Jaffe A (2017) Generating image descriptions using multilingual data. In: Proceedings of the second conference on machine translation, pp 458–464
Jia X, Gavves E, Fernando B, Tuytelaars T (2015) Guiding the long-short term memory model for image caption generation. In: Proceedings of the IEEE international conference on computer vision, pp 2407–2415
Kiros R, Salakhutdinov R, Zemel R (2014) Multimodal neural language models. In: Proceedings of the 31st international conference on international conference on machine learning - vol 32, ICML’14, pp II–595–II–603. JMLR.org. http://dl.acm.org/citation.cfm?id=3044805.3044959
Laskar SR, Singh RP, Pakray P, Bandyopadhyay S (2019) English to Hindi multi-modal neural machine translation and Hindi image captioning. In: Proceedings of the 6th workshop on asian translation. Association for Computational Linguistics, Hong Kong, pp 62–67. https://doi.org/10.18653/v1/D19-5205. https://www.aclweb.org/anthology/D19-5205
Liu M, Hu H, Li L, Yu Y, Guan W (2020) Chinese image caption generation via visual attention and topic modeling. IEEE Transactions on Cybernetics
Ma S, Han Y (2016) Describing images by feeding lstm with structural words. In: 2016 IEEE international conference on multimedia and expo (ICME), pp 1–6. https://doi.org/10.1109/ICME.2016.7552883
Ordonez V, Kulkarni G, Berg TL (2011) Im2text: describing images using 1 million captioned photographs. In: Advances in neural information processing systems, pp 1143–1151
Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, ACL ’02, pp 311–318. Association for Computational Linguistics, Stroudsburg. https://doi.org/10.3115/1073083.1073135
Parida S, Bojar O, Dash SR (2019) Hindi visual genome: a dataset for multimodal English-to-Hindi machine translation. Computación y Sistemas. In print. Presented at CICLing 2019, La Rochelle, France
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf
Ren Z, Wang X, Zhang N, Lv X, Li LJ (2017) Deep reinforcement learning-based image captioning with embedding reward. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 290–298
Sanayai Meetei L, Singh TD, Bandyopadhyay S (2019) WAT2019: English-Hindi translation on Hindi visual genome dataset. In: Proceedings of the 6th workshop on asian translation. Association for Computational Linguistics, Hong Kong, pp 181–188. https://doi.org/10.18653/v1/D19-5224. https://www.aclweb.org/anthology/D19-5224
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Singh A, Meetei LS, Singh TD, Bandyopadhyay S (2021) Generation and evaluation of hindi image captions of visual genome. In: Maji AK, Saha G, Das S, Basu S, Tavares JMRS (eds) Proceedings of the international conference on computing and communication systems. Springer Singapore, Singapore, pp 65–73. https://doi.org/10.1007/978-981-33-4084-8_7
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems, pp 3104–3112
Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
Wang C, Yang H, Meinel C (2018) Image captioning with deep bidirectional lstms and multi-task learning. ACM Trans Multimedia Comput Commun Appl 14(2s):40:1–40:20. https://doi.org/10.1145/3115432
Wang M, Li S, Yang X, Luo C (2016) A parallel-fusion rnn-lstm architecture for image caption generation. In: 2016 IEEE international conference on image processing (ICIP), pp 4448–4452
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057
Yang Y, Teo CL, Daumé H III, Aloimonos Y (2011) Corpus-guided sentence generation of natural images. In: Proceedings of the conference on empirical methods in natural language processing, pp 444–454. Association for Computational Linguistics