Building a multimedia corpus for generating images from a perspective

Springer Science and Business Media LLC - Volume 1 - Pages 1-16 - 2019
Samir Elloumi1, Jihad Mohamad AlJa’am2, Jezia Zakraoui2
1Faculty of Sciences of Tunis, LR11ES14, University of Tunis El Manar, Tunis, Tunisia
2Computer Science and Engineering Department, Qatar University, Doha, Qatar

Abstract

Multimedia corpora are valuable for educational activities because they supply illustrative images that make learning and understanding a text easier. In this paper, we propose building a multimedia corpus from collected images by means of object extraction techniques. We then assign Arabic annotations to every extracted object. These extracted objects are used to create new scenes that can effectively illustrate the most important events in an Arabic story. In our approach, we therefore extend the notion of image generation to the task of constructing new images from a toolkit and a set of extracted objects that together form a multimedia corpus of common animal behaviors. Our preliminary results show that scenes built from single objects gave a reasonable understanding of the main events in the stories, as well as a coherent visual layout of all the single objects. In addition, the diversity and accuracy of the single-object images in the animal domain had a strong impact on the generation of new scenes, whether done manually or dynamically.
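The scene-construction step described above (placing previously extracted, annotated objects onto a new image) can be illustrated with a minimal compositing sketch. This is not the authors' implementation: it assumes each extracted object is stored as a small RGB cutout plus a binary mask, of the kind an instance segmentation model produces, and the `ExtractedObject` type, its Arabic `label` field, and the `compose_scene` function are hypothetical names introduced only for illustration. Images are plain nested lists so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels


@dataclass
class ExtractedObject:
    """Hypothetical corpus entry: an object cutout with its Arabic annotation."""
    label: str               # Arabic annotation of the object
    pixels: Image            # h x w RGB cutout
    mask: List[List[bool]]   # h x w, True where the object itself is


def compose_scene(background: Image,
                  placements: List[Tuple[ExtractedObject, Tuple[int, int]]]) -> Image:
    """Paste each (object, (top, left)) onto a copy of the background.

    Objects are drawn in list order, so later objects cover earlier ones.
    Masked-out pixels keep the background, and any part of an object that
    falls outside the canvas is clipped.
    """
    scene = [row[:] for row in background]  # copy so the background is untouched
    height, width = len(scene), len(scene[0])
    for obj, (top, left) in placements:
        for dy, mask_row in enumerate(obj.mask):
            for dx, inside in enumerate(mask_row):
                y, x = top + dy, left + dx
                if inside and 0 <= y < height and 0 <= x < width:
                    scene[y][x] = obj.pixels[dy][dx]
    return scene


# Usage: place the same hypothetical "lion" cutout twice on a 4 x 6 canvas.
black = (0, 0, 0)
gray = (200, 200, 200)
background = [[black] * 6 for _ in range(4)]
lion = ExtractedObject(
    label="أسد",  # Arabic for "lion"
    pixels=[[gray, gray], [gray, gray]],
    mask=[[True, False], [True, True]],
)
# The second copy sits partly off the canvas, so only one of its pixels is drawn.
scene = compose_scene(background, [(lion, (1, 1)), (lion, (3, 5))])
```

Drawing objects in list order is a simple painter's-algorithm layering. Choosing positions, scales, and poses that match a given story event is left to the caller in this sketch.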

Keywords

#multimedia corpus #object extraction #Arabic annotation #image generation #animal behavior
