Multi-camera person re-identification using spatiotemporal context modeling

Neural Computing and Applications - Volume 35 - Pages 20117-20142 - 2023
Fatima Zulfiqar¹, Usama Ijaz Bajwa¹, Rana Hammad Raza²
¹Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Pakistan
²Pakistan Navy Engineering College, National University of Sciences and Technology (NUST), Karachi, Pakistan

Abstract

Person re-identification (ReID) aims to identify a person of interest (POI) across multiple non-overlapping cameras. The POI can appear either in an image or in a video sequence. Occlusion, variable viewpoints, misalignment, unconstrained poses, and background clutter are the major challenges in developing robust person ReID models. To address these issues, this paper presents an attention mechanism built on part/region-aggregated local feature representation learning that incorporates long-range local and global context modeling. The part-aware local attention blocks are integrated into a modified version of the widely used pre-trained ResNet50 CNN backbone, which employs two attention blocks: the Spatio-Temporal Attention Module (STAM) and the Channel Attention Module (CAM). The spatial attention block of STAM learns contextual dependencies between different human body parts/regions, such as the head, upper body, lower body, and shoes, within a single frame. The temporal attention block, in turn, learns temporal contextual dependencies of the same person's body parts across all video frames. In addition, the channel attention block, CAM, models semantic connections between the channels of the feature maps. The STAM and CAM blocks are combined sequentially into a unified attention network, the Spatio-Temporal Channel Attention Network (STCANet), which can learn both short-range and long-range global feature maps. Extensive experiments evaluate the effectiveness of STCANet on three image-based and two video-based benchmark datasets: Market-1501, DukeMTMC-ReID, MSMT17, DukeMTMC-VideoReID, and MARS. K-reciprocal re-ranking of the gallery set is also applied, and with it the proposed network shows a significant improvement over the state of the art on these datasets.
Lastly, to study how well STCANet generalizes to unseen test instances, cross-validation on external cohorts is also performed; the results show that the proposed model is robust and can be readily deployed in real-world applications.
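The abstract names the attention blocks but does not give their internals. As a rough illustration only, the NumPy sketch below assumes one plausible reading, and it is not the authors' STCANet implementation. Backbone maps for a clip are pooled into four horizontal stripes standing in for head, upper body, lower body, and shoes. Plain scaled dot-product self-attention is then applied across parts within each frame (spatial) and across frames for each part (temporal), followed by a simplified DANet-style channel affinity (CAM). The helper names (`part_pool`, `stca_descriptor`), the stripe pooling, and the fixed `gamma` scale are hypothetical. The sketch also omits learned projections, the ResNet50 backbone, and training.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def part_pool(feats, num_parts=4):
    # Hypothetical helper: average-pool (T, C, H, W) maps into `num_parts`
    # horizontal stripes, giving (T, P, C). Requires H >= num_parts.
    stripes = np.array_split(np.arange(feats.shape[2]), num_parts)
    parts = [feats[:, :, rows, :].mean(axis=(2, 3)) for rows in stripes]
    return np.stack(parts, axis=1)


def self_attention(x):
    # Scaled dot-product self-attention over the N items of x (..., N, C),
    # plus a residual so the output shape matches the input.
    # Simplification: no learned query/key/value projections.
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return x + softmax(scores, axis=-1) @ x


def spatial_attention(parts):
    # (T, P, C): within each frame, every part attends to all parts.
    return self_attention(parts)


def temporal_attention(parts):
    # (T, P, C): for each part, every frame attends to all frames.
    per_part = np.swapaxes(parts, 0, 1)          # (P, T, C)
    return np.swapaxes(self_attention(per_part), 0, 1)


def channel_attention(feats, gamma=1.0):
    # Simplified DANet-style channel attention on a (C, N) map: a C x C
    # affinity re-weights the channels. `gamma` is a hypothetical fixed
    # scale (a learned parameter in DANet).
    attn = softmax(feats @ feats.T, axis=-1)     # (C, C)
    return gamma * (attn @ feats) + feats


def stca_descriptor(feats, num_parts=4):
    # Spatial -> temporal -> channel attention applied sequentially,
    # then averaged into one clip-level descriptor of length C.
    parts = part_pool(feats, num_parts)          # (T, P, C)
    parts = temporal_attention(spatial_attention(parts))
    t, p, c = parts.shape
    flat = channel_attention(parts.reshape(t * p, c).T)  # (C, T*P)
    return flat.mean(axis=1)


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 12, 4))       # T=8, C=16, H=12, W=4
print(stca_descriptor(clip).shape)               # (16,)
```

The residual connections keep each stage's output the same shape as its input. That is what allows the spatial, temporal, and channel stages to be chained one after another, mirroring the sequential combination the abstract describes.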
