Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification

Springer Science and Business Media LLC - Volume 129 - Pages 1875-1892 - 2021

Shanshan Zhang¹, Di Chen¹, Jian Yang¹, Bernt Schiele²

¹Nanjing University of Science and Technology, Nanjing, China

²Max Planck Institute for Informatics, Saarbrücken, Germany

Abstract

Pedestrian detection and re-identification have progressed significantly in the last few years. However, occluded people remain hard to detect and re-identify, because their appearance varies substantially across a wide range of occlusion patterns. In this paper, we aim to propose a simple and compact method based on convolutional neural networks (CNNs) to handle the occlusion problem. We start by interpreting the CNN channel features of a pedestrian detector, and we find that different channels activate responses for different body parts. These findings motivate us to employ an attention mechanism across channels to represent various occlusion patterns in one single model, since each occlusion pattern can be formulated as a specific combination of body parts. Therefore, an attention network with self or external guidance is proposed as an add-on to the baseline CNN method. In addition, we propose an attention-guided self-paced learning method to balance the optimization across different occlusion levels. Our proposed method shows significant improvements over the baseline methods for both pedestrian detection and re-identification. For pedestrian detection, we achieve an improvement of 8 percentage points over the baseline FasterRCNN detector on CityPersons under heavy occlusion, and on Caltech we outperform the state-of-the-art method by 5 percentage points. For pedestrian re-identification, our method surpasses the baseline and achieves state-of-the-art performance on multiple re-identification benchmarks.
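The idea of re-weighting channels so that body-part-specific channels are emphasized or suppressed can be illustrated with a minimal sketch. The abstract does not specify the attention network's architecture, so the code below assumes a simple self-guided variant: each channel is globally average-pooled, passed through one fully connected layer and a sigmoid, and the resulting weight scales that channel. All function names, the toy feature maps, and the hand-picked (not learned) parameters are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Channel = List[List[float]]  # one H x W feature map


def global_average_pool(features: List[Channel]) -> List[float]:
    """Squeeze each H x W channel into its mean activation."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in features
    ]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(pooled: List[float],
                      fc_weight: List[List[float]],
                      fc_bias: List[float]) -> List[float]:
    """Map pooled channel statistics to one weight in (0, 1) per channel."""
    return [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(fc_weight, fc_bias)
    ]


def reweight(features: List[Channel], attention: List[float]) -> List[Channel]:
    """Scale every value of a channel by that channel's attention weight."""
    return [
        [[a * v for v in row] for row in ch]
        for ch, a in zip(features, attention)
    ]


# Toy input (assumption): channel 0 stands for an "upper body" channel,
# channel 1 for a "lower body" channel that barely fires because the
# legs are occluded.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.1, 0.0], [0.0, 0.1]],
]
# Hand-picked parameters for illustration; in practice they would be learned.
fc_weight = [[4.0, 0.0], [0.0, 4.0]]
fc_bias = [-2.0, -2.0]

pooled = global_average_pool(features)
attention = channel_attention(pooled, fc_weight, fc_bias)
reweighted = reweight(features, attention)
print(attention)  # the visible-part channel receives the larger weight
```

In this toy setting the channel associated with the visible part receives a larger weight than the one associated with the occluded part. With external guidance, the weights would instead be predicted from an extra cue rather than from the features themselves; that variant is not sketched here because the abstract gives no details about it.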

Keywords

#pedestrian detection #pedestrian re-identification #occlusion #convolutional neural network #attention mechanism

