FRSFN: Mạng lưới tổng hợp ngữ nghĩa cho việc truy xuất thời trang thực tiễn

Multimedia Tools and Applications - Tập 80 - Trang 17169-17181 - 2020
An-An Liu1, Ting Zhang1, Dan Song1, Wenhui Li1, Ming Zhou2
1Multimedia Institute, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
2Chengdu Spaceon Measurement & Control Technology Co., Ltd, Chengdu, China

Tóm tắt

Trong những năm gần đây, nghiên cứu liên quan đến thời trang đã có những bước tiến đáng kể, và việc sử dụng nội dung hình ảnh cho việc truy xuất thời trang đã trở thành một trong những phương pháp hiệu quả cũng như điểm nóng nghiên cứu. Tuy nhiên, đây vẫn là một nhiệm vụ khó khăn do sự đa dạng của nội dung trong các hình ảnh thời trang. Công trình này trình bày một phương pháp truy xuất thời trang thực tiễn nhấn mạnh vào các sản phẩm cụ thể. Mạng lưới Tổng hợp Ngữ nghĩa của phương pháp này đầu tiên trích xuất hai loại đặc trưng, bao gồm các đặc trưng toàn cục từ hình ảnh truy vấn ban đầu và các đặc trưng của sản phẩm. Các đặc trưng của sản phẩm được lấy từ cùng một hình ảnh truy vấn đã được phân tích ngữ nghĩa trước đó. Sau đó, mạng lưới kết hợp hai loại đặc trưng với sự kết hợp của thông tin màu sắc. Cuối cùng, các điểm số tương đồng được tính toán giữa các đặc trưng để thực hiện việc truy xuất. Các thí nghiệm cho thấy rằng trong khi giữ được kết quả truy xuất thống kê cao hơn, phương pháp của chúng tôi nắm bắt được các đặc điểm chi tiết và các sản phẩm của trang phục và giữ được sự tương đồng tổng thể thỏa mãn về hình dáng và màu sắc.

Từ khóa

#truy xuất thời trang #mạng lưới tổng hợp ngữ nghĩa #đặc trưng sản phẩm #thông tin màu sắc #hình ảnh thời trang

Tài liệu tham khảo

Andriluka M, Pishchulin L, Gehler P (2014) Schiele, B.: 2d human pose estimation: New benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on computer Vision and Pattern Recognition, pp 3686–3693 Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495 Chen H, Gallagher A, Girod B (2012) Describing clothing by semantic attributes. In: European Conference on Computer Vision, Springer, pp 609–623 Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv:1412.7062 Chen LC, Yang Y, Wang J, Xu W, Yuille AL (2016) Attention to scale: Scale-aware semantic image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3640–3649 Cheng Z, Chang X, Zhu L, Kanjirathinkal RC, Kankanhalli M (2019) Mmalfm: Explainable recommendation by leveraging reviews and images. ACM Transactions on Information Systems (TOIS) 37(2):16 Corbiere C, Ben-Younes H, Ramé A., Ollion C (2017) Leveraging weakly annotated data for fashion image retrieval and label prediction. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2268–2274 Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision & Pattern Recognition Di W, Wah C, Bhardwaj A, Piramuthu R, Sundaresan N (2013) Style finder: Fine-grained clothing style detection and retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 8–13 Fang HS, Lu G, Fang X, Xie J, Tai YW, Lu C (2018) Weakly and semi supervised human body part parsing via pose-guided knowledge transfer. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp 70–78 Gajic B, Baldrich R (2018) Cross-domain fashion image retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1869–1871 Gan C, Lin M, Yang Y, De Melo G, Hauptmann AG (2016) Concepts not alone: Exploring pairwise relationships for zero-shot video activity recognition. In: Thirtieth AAAI Conference on Artificial Intelligence Gong K, Liang X, Li Y, Chen Y, Yang M, Lin L (2018) Instance-level human parsing via part grouping network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 770–785 Hadi Kiapour M, Han X, Lazebnik S, Berg AC, Berg TL (2015) Where to buy it: Matching street clothing photos in online shops. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3343–3351 Han X, Song X, Yao Y, Xu XS, Nie L (2019) Neural compatibility modeling with probabilistic knowledge distillation. IEEE Trans Image Process 29:871–882 Han Y, Zhu L, Cheng Z, Li J, Liu X Discrete optimal graph clustering. IEEE Transactions on Cybernetics, pp 1–14 He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778 Huang J, Feris RS, Chen Q, Yan S (2015) Cross-domain image retrieval with a dual attribute-aware ranking network. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1062–1070 Kalayeh MM, Basaran E, Gökmen M, Kamasak ME, Shah M (2018) Human semantic parsing for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1062–1071 Liang X, Gong K, Shen X, Lin L (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. IEEE Trans Pattern Anal Mach Intell 41(4):871–885 Liang X, Lin L, Yang W, Luo P, Huang J, Yan S (2016) Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval. IEEE Trans Multimedia 18(6):1175–1186 Liang X, Liu S, Shen X, Yang J, Liu L, Dong J, Lin L, Yan S (2015) Deep human parsing with active template regression. IEEE Trans Pattern Anal Mach Intell 37(12):2402–2414 Liang X, Xu C, Shen X, Yang J, Liu S, Tang J, Lin L, Yan S (2015) Human parsing with contextualized convolutional neural network. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1386–1394 Lin K, Yang HF, Liu KH, Hsiao JH, Chen CS (2015) Rapid clothing retrieval via deep learning of binary codes and hierarchical search. In: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, ACM, pp 499–502 Liu AA, Nie WZ, Gao Y, Su YT (2016) Multi-modal clique-graph matching for view-based 3d model retrieval. IEEE Trans Image Process 25(5):2103–2116 Liu S, Liang X, Liu L, Shen X, Yang J, Xu C, Lin L, Cao X, Yan S (2015) Matching-cnn meets knn: Quasi-parametric human parsing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1419–1427 Liu Z, Luo P, Qiu S, Wang X, Tang X (2016) Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1096–1104 Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110 Luo Z, Yuan J, Yang J, Wen W (2019) Spatial constraint multiple granularity attention network for clothesretrieval. In: 2019 IEEE International Conference on Image Processing (ICIP), IEEE, pp 859–863 Nie W, Wang K, Wang H, Su Y (2019) The assessment of 3d model representation for retrieval with cnn-rnn networks. Multimedia Tools Appl 78 (12):16979–16994 Nie W, Wang W, Huang X (2019) Srnet: Structured relevance feature learning network from skeleton data for human action recognition. IEEE Access 7:132161–132172 Nie X, Feng J, Yan S (2018) Mutual learning to adapt for joint human parsing and pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 502–517 Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Computer Science Song X, Feng F, Liu J, Li Z, Nie L, Ma J (2017) Neurostylist: Neural compatibility modeling for clothing matching. In: Proceedings of the 25th ACM International Conference on Multimedia, pp 753–761 Sun X, Liu Z, Hu Y, Zhang L, Zimmermann R (2018) Perceptual multi-channel visual feature fusion for scene categorization. Inf Sci 429:37–48 Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–9 Xia F, Zhu J, Wang P, Yuille AL (2016) Pose-guided human parsing by an and/or graph using pose-context features. In: Thirtieth AAAI Conference on Artificial Intelligence Xie H, Fang S, Zha ZJ, Yang Y, Li Y, Zhang Y (2019) Convolutional attention networks for scene text recognition. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15(1s):3 Yamaguchi K, Hadi Kiapour M, Berg TL (2013) Paper doll parsing: Retrieving similar styles to parse clothing items. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3519–3526 Yamaguchi K, Kiapour MH, Ortiz LE, Berg TL (2012) Parsing clothing in fashion photographs. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 3570–3577 Zhang H, Ji Y, Huang W, Liu L (2018) Sitcom-star-based clothing retrieval for video advertising: a deep learning framework. Neural computing and applications, pp 1–20 Zhang H, Li S, Cai S, Jiang H, Kuo CCJ (2018) Representative fashion feature extraction by leveraging weakly annotated online resources. In: 2018 25Th IEEE International Conference on Image Processing (ICIP), IEEE, pp 2640–2644 Zhao B, Feng J, Wu X, Yan S (2017) Memory-augmented attribute manipulation networks for interactive fashion search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1520–1528 Ziaeefard M, Camacaro J, Bessega C (2018) Hierarchical feature map characterization in fashion interpretation. In: 2018 15Th Conference on Computer and Robot Vision (CRV), IEEE, pp 88–94