SSMTL++: Revisiting self-supervised multi-task learning for video anomaly detection

Computer Vision and Image Understanding - Tập 229 - Trang 103656 - 2023
Antonio Barbalau1, Radu Tudor Ionescu1,2,3, Mariana-Iuliana Georgescu1,2, Jacob Dueholm4,5, Bharathkumar Ramachandra6, Kamal Nasrollahi4,5, Fahad Shahbaz Khan3,7, Thomas B. Moeslund4, Mubarak Shah8
1Department of Computer Science, University of Bucharest, 14 Academiei Street, Bucharest 010014, Romania
2SecurifAI, 21D Mircea Voda, Bucharest 030662, Romania
3MBZ University of Artificial Intelligence, Masdar City, Abu Dhabi, United Arab Emirates
4Department of Architecture, Design, and Media Technology, Aalborg University, Rendsburggade 14, Aalborg 9000, Denmark
5Milestone Systems, Banemarksvej 50C, Brøndby 2605, Denmark
6Geopipe Inc, 460W 51st, New York City 10019, NY, USA
7Linköping University, 581 83 Linköping. Sweden
8Center for Research in Computer Vision (CRCV), University of Central Florida, Orlando 32816, FL, USA

Tài liệu tham khảo

Acsintoae, A., Florescu, A., Georgescu, M., Mare, T., Sumedrea, P., Ionescu, R.T., Khan, F.S., Shah, M., 2022. UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection. In: Proceedings of CVPR. pp. 20143–20153. Antic, B., Ommer, B., 2011. Video parsing for abnormality detection. In: Proceedings of ICCV. pp. 2415–2422. Artacho, B., Savakis, A., 2020. UniPose: Unified Human Pose Estimation in Single Images and Videos. In: Proceedings of CVPR. pp. 7035–7044. Astrid, M., Zaheer, M.Z., Lee, S.-I., 2021a. Synthetic Temporal Anomaly Guided End-to-End Video Anomaly Detection. In: Proceedings of ICCVW. pp. 207–214. Astrid, M., Zaheer, M.Z., Lee, J., Lee, S., 2021b. Learning Not to Reconstruct Anomalies. In: Proceedings of BMVC. Bertasius, G., Wang, H., Torresani, L., 2021. Is Space-Time Attention All You Need for Video Understanding?. In: Proceedings of ICML. pp. 813–824. Chang, 2022, Video anomaly detection with spatio-temporal dissociation, Pattern Recognit., 122, 10.1016/j.patcog.2021.108213 Cheng, K.-W., Chen, Y.-T., Fang, W.-H., 2015. Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression. In: Proceedings of CVPR. pp. 2909–2917. Cong, Y., Yuan, J., Liu, J., 2011. Sparse reconstruction cost for abnormal event detection. In: Proceedings of CVPR. pp. 3449–3456. Del Giorno, A., Bagnell, J., Hebert, M., 2016. A Discriminative Framework for Anomaly Detection in Large Videos. In: Proceedings of ECCV. pp. 334–349. Dong, 2020, Dual discriminator generative adversarial network for video anomaly detection, IEEE Access, 8, 88170, 10.1109/ACCESS.2020.2993373 Doshi, K., Yilmaz, Y., 2020a. Any-Shot Sequential Anomaly Detection in Surveillance Videos. In: Proceedings of CVPRW. pp. 934–935. Doshi, K., Yilmaz, Y., 2020b. Continual Learning for Anomaly Detection in Surveillance Videos. In: Proceedings of CVPRW. pp. 254–255. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al., 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: Proceedings of ICLR. Dutta, J.K., Banerjee, B., 2015. Online Detection of Abnormal Events Using Incremental Coding Length. In: Proceedings of AAAI. pp. 3755–3761. Feng, J.-C., Hong, F.-T., Zheng, W.-S., 2021. MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection. In: Proceedings of CVPR. pp. 14009–14018. Feng, 2017, Learning deep event models for crowd anomaly detection, Neurocomputing, 219, 548, 10.1016/j.neucom.2016.09.063 Georgescu, M.-I., Barbalau, A., Ionescu, R.T., Khan, F.S., Popescu, M., Shah, M., 2021. Anomaly Detection in Video via Self-Supervised and Multi-Task Learning. In: Proceedings of CVPR. pp. 12742–12752. Georgescu, 2022, A background-agnostic framework with adversarial training for abnormal event detection in video, IEEE Trans. Pattern Anal. Mach. Intell., 44, 4505 Gong, D., Liu, L., Le, V., Saha, B., Mansour, M.R., Venkatesh, S., Van Den Hengel, A., 2019. Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection. In: Proceedings of ICCV. pp. 1705–1714. Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., Davis, L.S., 2016. Learning Temporal Regularity in Video Sequences. In: Proceedings of CVPR. pp. 733–742. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R., 2022. Masked autoencoders are scalable vision learners. In: Proceedings of CVPR. pp. 16000–16009. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R., 2020. Momentum contrast for unsupervised visual representation learning. In: Proceedings of CVPR. pp. 9729–9738. He, K., Gkioxari, G., Dollar, P., Girshick, R., 2017. Mask R-CNN. In: Proceedings of ICCV. pp. 2980–2988. He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep Residual Learning for Image Recognition. In: Proceedings of CVPR. pp. 770–778. Hinami, R., Mei, T., Satoh, S., 2017. Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge. In: Proceedings of ICCV. pp. 3639–3647. Huang, 2022, Self-supervised attentive generative adversarial networks for video anomaly detection, IEEE Trans. Neural Netw. Learn. Syst., 1 Huang, 2022, Abnormal event detection using deep contrastive learning for intelligent video surveillance system, IEEE Trans. Ind. Inform., 18, 5171, 10.1109/TII.2021.3122801 Huang, 2022, Self-supervision-augmented deep autoencoder for unsupervised visual anomaly detection, IEEE Trans. Cybern., 52, 13834, 10.1109/TCYB.2021.3127716 Ionescu, R.T., Khan, F.S., Georgescu, M.-I., Shao, L., 2019a. Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video. In: Proceedings of CVPR. pp. 7842–7851. Ionescu, R.T., Smeureanu, S., Alexe, B., Popescu, M., 2017. Unmasking the abnormal events in video. In: Proceedings of ICCV. pp. 2895–2903. Ionescu, R.T., Smeureanu, S., Popescu, M., Alexe, B., 2019b. Detecting abnormal events in video using Narrowed Normality Clusters. In: Proceedings of WACV. pp. 1951–1960. Ji, X., Li, B., Zhu, Y., 2020. TAM-Net: Temporal Enhanced Appearance-to-Motion Generative Network for Video Anomaly Detection. In: Proceedings of IJCNN. pp. 1–8. Jocher, 2022 Kim, J., Grauman, K., 2009. Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates. In: Proceedings of CVPR. pp. 2921–2928. Kingma, D.P., Ba, J., 2015. Adam: A method for stochastic optimization. In: Proceedings of ICLR. Lazebnik, 2005, A sparse texture representation using Local Affine Regions, IEEE Trans. Pattern Anal. Mach. Intell., 27, 1265, 10.1109/TPAMI.2005.151 Lee, 2019, BMAN: Bidirectional multi-scale aggregation networks for abnormal event detection, IEEE Trans. Image Process., 29, 2395, 10.1109/TIP.2019.2948286 Li, 2021, Decoupled appearance and motion learning for efficient anomaly detection in surveillance video, Comput. Vis. Image Underst., 210, 10.1016/j.cviu.2021.103249 Li, 2014, Anomaly detection and localization in crowded scenes, IEEE Trans. Pattern Anal. Mach. Intell., 36, 18, 10.1109/TPAMI.2013.111 Lin, X., Chen, Y., Li, G., Yu, Y., 2022. A Causal Inference Look at Unsupervised Video Anomaly Detection. In: Proceedings of AAAI. pp. 1620–1629. Liu, Y., Li, C.-L., Póczos, B., 2018a. Classifier Two-Sample Test for Video Anomaly Detections. In: Proceedings of BMVC. Liu, W., Luo, W., Lian, D., Gao, S., 2018b. Future Frame Prediction for Anomaly Detection – A New Baseline. In: Proceedings of CVPR. pp. 6536–6545. Liu, P., Lyu, M.R., King, I., Xu, J., 2019. SelFlow: Self-Supervised Learning of Optical Flow. In: Proceedings of CVPR. pp. 4571–4580. Liu, Z., Nie, Y., Long, C., Zhang, Q., Li, G., 2021. A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction. In: Proceedings of ICCV. pp. 13588–13597. Lu, C., Shi, J., Jia, J., 2013. Abnormal Event Detection at 150 FPS in MATLAB. In: Proceedings of ICCV. pp. 2720–2727. Lu, Y., Yu, F., Kumar, M., Reddy, K., Wang, Y., 2020. Few-Shot Scene-Adaptive Anomaly Detection. In: Proceedings of ECCV. pp. 125–141. Luo, W., Liu, W., Gao, S., 2017. A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework. In: Proceedings of ICCV. pp. 341–349. Luo, 2022, Future frame prediction network for video anomaly detection, IEEE Trans. Pattern Anal. Mach. Intell., 44, 7505, 10.1109/TPAMI.2021.3129349 Madan, N., Farkhondeh, A., Nasrollahi, K., Escalera, S., Moeslund, T.B., 2021. Temporal Cues From Socially Unacceptable Trajectories for Anomaly Detection. In: Proceedings of ICCVW. pp. 2150–2158. Mahadevan, V., Li, W.-X., Bhalodia, V., Vasconcelos, N., 2010. Anomaly Detection in Crowded Scenes. In: Proceedings of CVPR. pp. 1975–1981. McHardy, R., Adel, H., Klinger, R., 2019. Adversarial Training for Satire Detection: Controlling for Confounding Variables. In: Proceedings of NAACL. pp. 660–665. Mehran, R., Oyama, A., Shah, M., 2009. Abnormal crowd behavior detection using social force model. In: Proceedings of CVPR. pp. 935–942. Nguyen, T.-N., Meunier, J., 2019. Anomaly Detection in Video Sequence With Appearance-Motion Correspondence. In: Proceedings of ICCV. pp. 1273–1283. Nilsback, M.-E., Zisserman, A., 2006. A Visual Vocabulary for Flower Classification. In: Proceedings of CVPR. pp. 1447–1454. Noroozi, M., Favaro, P., 2016. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. In: Proceedings of ECCV. pp. 69–84. Park, C., Cho, M., Lee, M., Lee, S., 2022. FastAno: Fast Anomaly Detection via Spatio-Temporal Patch Transformation. In: Proceedings of WACV. pp. 2249–2259. Park, H., Noh, J., Ham, B., 2020. Learning Memory-guided Normality for Anomaly Detection. In: Proceedings of CVPR. pp. 14372–14381. Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., Efros, A., 2016. Context Encoders: Feature Learning by Inpainting. In: Proceedings of CVPR. pp. 2536–2544. Purwanto, D., Chen, Y.-T., Fang, W.-H., 2021. Dance With Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos. In: Proceedings of ICCV. pp. 173–183. Ramachandra, B., Jones, M., 2020. Street Scene: A new dataset and evaluation protocol for video anomaly detection. In: Proceedings of WACV. pp. 2569–2578. Ramachandra, B., Jones, M., Vatsavai, R., 2020. Learning a distance function with a Siamese network to localize anomalies in videos. In: Proceedings of WACV. pp. 2598–2607. Ramachandra, 2021, Perceptual metric learning for video anomaly detection, Mach. Vis. Appl., 32, 1432, 10.1007/s00138-021-01187-5 Ramachandra, 2022, A survey of single-scene video anomaly detection, IEEE Trans. Pattern Anal. Mach. Intell., 44, 2293 Ravanbakhsh, M., Nabi, M., Mousavi, H., Sangineto, E., Sebe, N., 2018. Plug-and-Play CNN for Crowd Motion Analysis: An Application in Abnormal Event Detection. In: Proceedings of WACV. pp. 1689–1698. Ravanbakhsh, M., Nabi, M., Sangineto, E., Marcenaro, L., Regazzoni, C., Sebe, N., 2017. Abnormal Event Detection in Videos using Generative Adversarial Nets. In: Proceedings of ICIP. pp. 1577–1581. Redmon, 2018 Ren, H., Liu, W., Olsen, S.I., Escalera, S., Moeslund, T.B., 2015. Unsupervised Behavior-Specific Dictionary Learning for Abnormal Event Detection. In: Proceedings of BMVC. pp. 28.1–28.13. Ristea, N.-C., Madan, N., Ionescu, R.T., Nasrollahi, K., Khan, F.S., Moeslund, T.B., Shah, M., 2022. Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection. In: Proceedings of CVPR. pp. 13576–13586. Russakovsky, 2015, ImageNet large scale visual recognition challenge, Int. J. Comput. Vis., 115, 211, 10.1007/s11263-015-0816-y Sabokrou, 2017, Deep-cascade: Cascading 3D deep neural networks for fast anomaly detection and localization in crowded scenes, IEEE Trans. Image Process., 26, 1992, 10.1109/TIP.2017.2670780 Saligrama, V., Chen, Z., 2012. Video anomaly detection based on local statistical aggregates. In: Proceedings of CVPR. pp. 2112–2119. Smeureanu, S., Ionescu, R.T., Popescu, M., Alexe, B., 2017. Deep Appearance Features for Abnormal Behavior Detection in Video. In: Proceedings of ICIAP. Vol. 10485. pp. 779–789. Sultani, W., Chen, C., Shah, M., 2018. Real-World Anomaly Detection in Surveillance Videos. In: Proceedings of CVPR. pp. 6479–6488. Sun, C., Jia, Y., Hu, Y., Wu, Y., 2020. Scene-Aware Context Reasoning for Unsupervised Abnormal Event Detection in Videos. In: Proceedings of ACMMM. pp. 184–192. Tang, 2020, Integrating prediction and reconstruction for anomaly detection, Pattern Recognit. Lett., 129, 123, 10.1016/j.patrec.2019.11.024 Tian, Y., Pang, G., Chen, Y., Singh, R., Verjans, J.W., Carneiro, G., 2021. Weakly-Supervised Video Anomaly Detection With Robust Temporal Feature Magnitude Learning. In: Proceedings of ICCV. pp. 4975–4986. Tran, H.T., Hogg, D., 2017. Anomaly Detection using a Convolutional Winner-Take-All Autoencoder. In: Proceedings of BMVC. Vu, H., Nguyen, T.D., Le, T., Luo, W., Phung, D., 2019. Robust Anomaly Detection in Videos Using Multilevel Representations. In: Proceedings of AAAI. Vol. 33. pp. 5216–5223. Wang, Z., Zou, Y., Zhang, Z., 2020. Cluster Attention Contrast for Video Anomaly Detection. In: Proceedings of ACMMM. pp. 2463–2471. Wu, 2019, A deep one-class neural network for anomalous event detection in complex scenes, IEEE Trans. Neural Netw. Learn. Syst., 31, 2609 Wu, S., Moore, B.E., Shah, M., 2010. Chaotic Invariants of Lagrangian Particle Trajectories for Anomaly Detection in Crowded Scenes. In: Proceedings of CVPR. pp. 2054–2060. Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., Zhang, L., 2021. CvT: Introducing Convolutions to Vision Transformers. In: Proceedings of ICCV. pp. 22–31. Xu, D., Ricci, E., Yan, Y., Song, J., Sebe, N., 2015. Learning Deep Representations of Appearance and Motion for Anomalous Event Detection. In: Proceedings of BMVC. pp. 8.1–8.12. Xu, 2017, Detecting anomalous events in videos by learning deep representations of appearance and motion, Comput. Vis. Image Underst., 156, 117, 10.1016/j.cviu.2016.10.010 Yang, 2021, Bidirectional retrospective generation adversarial network for anomaly detection in videos, IEEE Access, 9, 107842, 10.1109/ACCESS.2021.3100678 Yu, 2021, Abnormal event detection and localization via adversarial event prediction, IEEE Trans. Neural Netw. Learn. Syst., 1 Yu, G., Wang, S., Cai, Z., Liu, X., Xu, C., Wu, C., 2022. Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement. In: Proceedings of CVPR. pp. 13987–13998. Yu, G., Wang, S., Cai, Z., Zhu, E., Xu, C., Yin, J., Kloft, M., 2020. Cloze Test Helps: Effective Video Anomaly Detection via Learning to Complete Video Events. In: Proceedings of ACMMM. pp. 583–591. Zaheer, M.Z., Mahmood, A., Astrid, M., Lee, S.-I., 2020. CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection. In: Proceedings of ECCV. pp. 358–376. Zaheer, M.Z., Mahmood, A., Khan, M.H., Segu, M., Yu, F., Lee, S.-I., 2022. Generative Cooperative Learning for Unsupervised Video Anomaly Detection. In: Proceedings of CVPR. pp. 14744–14754. Zhang, 2016, Video anomaly detection based on locality sensitive hashing filters, Pattern Recognit., 59, 302, 10.1016/j.patcog.2015.11.018 Zhang, 2020, Video anomaly detection and localization using motion-field shape description and homogeneity testing, Pattern Recognit., 105, 10.1016/j.patcog.2020.107394 Zhao, B., Fei-Fei, L., Xing, E.P., 2011. Online Detection of Unusual Events in Videos via Dynamic Sparse Coding. In: Proceedings of CVPR. pp. 3313–3320. Zhong, J.-X., Li, N., Kong, W., Liu, S., Li, T.H., Li, G., 2019. Graph Convolutional Label Noise Cleaner: Train a Plug-And-Play Action Classifier for Anomaly Detection. In: Proceedings of CVPR. pp. 1237–1246.