Estimating the information gap between textual and visual representations

International Journal of Multimedia Information Retrieval - Volume 7, Issue 1, pp. 43-56 - 2018
Christian Henning1,2, Ralph Ewerth3,1
1Institute of Distributed Systems, and L3S Research Center, Leibniz Universität Hannover, Hannover, Germany
2Institute of Neuroinformatics, ETH Zurich, Zurich, Switzerland
3Department of Research and Development, Research Group Visual Analytics, Leibniz Information Centre for Science and Technology (TIB), Hannover, Germany

Abstract

Keywords


References

Agosti M, Fuhr N, Toms E, Vakkari P (2014) Evaluation methodologies in information retrieval (Dagstuhl Seminar 13441). Dagstuhl Rep 3(10):92–126

Barnard K, Yanai K (2006) Mutual information of words and pictures. Inf Theory Appl 2:1–5

Barnard K, Duygulu P, Forsyth D, de Freitas N, Blei D, Jordan M (2003) Matching words and pictures. J Mach Learn Res 3(2):1107–1135

Bateman J (2014) Text and image: a critical introduction to the visual/verbal divide. Routledge, London

Chen X, Fang H, Lin T, Vedantam R, Gupta S, Dollár P, Zitnick L (2015) Microsoft COCO captions: data collection and evaluation server. arXiv:1504.00325

Crammer K, Singer Y (2002) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2(12):265–292

Eickhoff C, Teevan J, White R, Dumais S (2014) Lessons from the journey: a query log analysis of within-session learning. In: Proceedings of the 7th ACM international conference on web search and data mining, pp 223–232

Feng Y, Lapata M (2008) Automatic image annotation using auxiliary text information. In: Proceedings of Association for Computational Linguistics, vol 8, pp 272–280

Feng Y, Lapata M (2013) Automatic caption generation for news images. IEEE Trans Pattern Anal Mach Intell 35(4):797–812

Frome A, Corrado G, Shlens J, Bengio S, Dean J, Ranzato MA, Mikolov T (2013) DeViSE: a deep visual-semantic embedding model. In: Proceedings of neural information processing systems, vol 26, pp 2121–2129

Gong Y, Wang L, Hodosh M, Hockenmaier J, Lazebnik S (2014) Improving image-sentence embeddings using large weakly annotated photo collections. In: Proceedings of European conference on computer vision, vol 13, pp 529–545

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Proceedings of neural information processing systems, vol 26, pp 2672–2680

Izadinia H, Sadeghi F, Divvala S, Hajishirzi H, Choi Y, Farhadi A (2015) Segment-phrase table for semantic segmentation, visual entailment and paraphrasing. In: Proceedings of the IEEE international conference on computer vision, pp 10–18

Karpathy A, Li F (2014) Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306

Karpathy A, Joulin A, Li F (2014) Deep fragment embeddings for bidirectional image sentence mapping. arXiv:1406.5679

Liu W, Tang X (2005) Learning an image-word embedding for image auto-annotation on the nonlinear latent space. In: Proceedings of ACM international conference on multimedia, vol 13, pp 451–454

Mao J, Xu W, Yang Y, Wang J, Yuille A (2014) Explain images with multimodal recurrent neural networks. arXiv:1410.1090

Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of neural information processing systems, vol 26, pp 3111–3119

Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng A (2011) Multimodal deep learning. In: Proceedings of international conference on machine learning, vol 28, pp 689–696

Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434

Ramisa A, Yan F, Moreno-Noguer F, Mikolajczyk K (2016) BreakingNews: article annotation by image and text processing. arXiv:1603.07141

Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. arXiv:1512.00567

Vakkari P (2016) Searching as learning: a systematization based on literature. J Inf Sci 42(1):7–18

Vinyals O, Toshev A, Bengio S, Erhan D (2014) Show and tell: a neural image caption generator. arXiv:1411.4555

Vinyals O, Toshev A, Bengio S, Erhan D (2016) Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans Pattern Anal Mach Intell 39(4):652–663

Wu Q, Shen C, Liu L, Dick A, van den Hengel A (2016) What value do explicit high level concepts have in vision to language problems? In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 203–212

Xue J, Du Y, Shui H (2015) Semantic correlation mining between images and texts with global semantics and local mapping. In: Proceedings of international conference on multimedia modeling, vol 8936, pp 427–435

Yan F, Mikolajczyk K (2015) Deep correlation for matching images and text. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3441–3450

Yanai K, Barnard K (2005) Image region entropy: a measure of visualness of web images associated with one concept. In: Proceedings of the annual ACM international conference on multimedia, vol 13, pp 419–422

Zhang Y, Schneider J, Dubrawski A (2008) Learning the semantic correlation: an alternative way to gain from unlabeled text. In: Proceedings of the international conference on neural information processing systems, vol 21, pp 1945–1952

Zhuang YT, Yang Y, Wu F (2008) Mining semantic correlation of heterogeneous multimedia data for cross-media retrieval. IEEE Trans Multimed 10(2):221–229