Efficient lightweight video person re-identification with online difference discrimination module
Abstract
Video person re-identification (video Re-ID) is a key technology for video surveillance and security. Typical person re-identification retrieves the correct match for a target image (the query) from a set of gallery images, while video Re-ID extends this to retrieval from gallery videos. Two main factors affect a video Re-ID model: (i) a high-quality frame-level feature extractor, and (ii) temporal modeling that combines the frame-level features into a single feature for retrieval. In this work, we use a lightweight algorithm based on ShuffleNet V2 for video Re-ID, which meets the demands of practical application and reduces the consumption of computing resources while maintaining high performance. In addition, the lightweight spatial attention mechanism Spatial Group-wise Enhance (SGE) is used to attend to the person in more detail, which makes the feature representation more compact and effectively improves retrieval accuracy. Finally, we design an Online Difference Discrimination (ODD) module that measures the feature gap between video frames, and we use this module to apply different temporal modeling to video sequences of different quality. Experiments on three datasets (iLIDS-VID, PRID2011 and MARS) show that our method is competitive with state-of-the-art methods.
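The abstract does not specify how the ODD module computes the feature gap or which temporal models it selects between, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes frame-level features are plain lists of floats, takes the gap to be the mean Euclidean distance of each frame from the sequence mean, and switches from average pooling to a distance-weighted pooling when the gap exceeds a threshold. The names `frame_gap`, `aggregate` and `gap_threshold`, as well as the two pooling rules, are hypothetical.

```python
import math


def frame_gap(features):
    """Return (gap, mean, dists) for a sequence of frame-level features.

    gap is the mean Euclidean distance of each frame from the sequence mean.
    This gap definition is an assumption made for illustration only.
    """
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    dists = [math.dist(f, mean) for f in features]
    return sum(dists) / n, mean, dists


def aggregate(features, gap_threshold=0.5):
    """Pick a temporal pooling rule based on the measured frame gap.

    Small gap: frames agree, so plain average pooling is used.
    Large gap: frames far from the mean are down-weighted.
    Both the threshold and the weighting rule are hypothetical.
    """
    gap, mean, dists = frame_gap(features)
    if gap <= gap_threshold:
        return mean, "average"
    # Weight each frame by exp(-distance) so outlier frames contribute less.
    weights = [math.exp(-d) for d in dists]
    total = sum(weights)
    dim = len(mean)
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]
    return pooled, "weighted"


# Usage: a consistent sequence and one containing an outlier frame.
stable = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
noisy = [[1.0, 0.0], [1.0, 0.0], [5.0, 4.0]]
stable_feat, stable_mode = aggregate(stable)
noisy_feat, noisy_mode = aggregate(noisy)
```

In this sketch the consistent sequence falls back to simple averaging, while the sequence with an outlier frame is pooled with weights that pull the result toward the two agreeing frames. In a real system the features would come from the frame-level extractor, and the selection rule would be whatever the ODD module actually defines.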