Efficient lightweight video person re-identification with online difference discrimination module

Multimedia Tools and Applications - Volume 81 - Pages 19169-19181 - 2021
Cunyuan Gao1, Rui Yao1,2, Yong Zhou1, Jiaqi Zhao1, Liang Fang1, Fuyuan Hu2
1School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
2The Suzhou Smart City Research Institute, Suzhou University of Science and Technology, Suzhou, China

Abstract

Video person re-identification (video Re-ID) is a key technology for video surveillance and security. Image-based person re-identification retrieves the correct match for a target image (the query) from a set of gallery images, while video Re-ID extends this to querying against gallery videos. The main factors affecting a video Re-ID model are: (i) a high-quality frame-level feature extractor, and (ii) temporal modeling that combines frame-level features into a single feature for retrieval. In this work, we use a lightweight ShuffleNet V2-based algorithm for video Re-ID, which meets the demands of practical applications by reducing the consumption of computing resources while maintaining high performance. In addition, the lightweight spatial attention mechanism Spatial Group-wise Enhance (SGE) is used to view the person in more detail, which makes the feature representation more compact and effectively improves retrieval accuracy. Finally, we design an Online Difference Discrimination (ODD) module to measure the feature gap between video frames, and use this module to apply different temporal modeling to video sequences of different quality. Experiments on three datasets (iLIDS-VID, PRID2011 and MARS) show that our method is competitive with state-of-the-art methods.
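
The abstract does not specify how the ODD module measures the feature gap or which temporal models it switches between, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the gap is the mean Euclidean distance of each frame-level feature to the sequence mean, that low-gap sequences are average-pooled, and that high-gap sequences are pooled with weights that down-weight outlier frames. The function names, the `threshold` value, and both pooling rules are hypothetical and are not taken from the paper.

```python
import math


def mean_feature(frames):
    """Element-wise mean of a list of equal-length frame feature vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_gap(frames):
    """Assumed gap measure: mean distance of each frame to the sequence mean."""
    center = mean_feature(frames)
    return sum(euclidean(f, center) for f in frames) / len(frames)


def aggregate(frames, threshold=0.5):
    """Pick a temporal pooling rule based on the gap (hypothetical rule).

    Returns the sequence-level feature and the name of the branch taken.
    """
    center = mean_feature(frames)
    dists = [euclidean(f, center) for f in frames]
    gap = sum(dists) / len(dists)

    if gap <= threshold:
        # Frames agree with each other: plain temporal average pooling.
        return center, "average"

    # Frames disagree: softmax over negative distances, so frames far
    # from the sequence mean contribute less to the pooled feature.
    weights = [math.exp(-d) for d in dists]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(frames[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]
    return pooled, "weighted"


# Toy 2-D "frame features": one consistent sequence, one with an outlier frame.
clean = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
noisy = [[1.0, 0.0], [1.0, 0.0], [-3.0, 4.0]]

clean_feature, clean_mode = aggregate(clean)
noisy_feature, noisy_mode = aggregate(noisy)
```

In this toy example the consistent sequence takes the averaging branch, while the sequence containing an outlier frame takes the weighted branch, which pulls the pooled feature back toward the two agreeing frames. In a real system the frame features would come from the frame-level extractor rather than hand-written lists.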
