Background subtraction based on deep convolutional neural networks features

Multimedia Tools and Applications, Volume 78, pages 14549–14571, 2018
Jianfang Dou1, Qin Qin1, Zimei Tu1
1Department of Automation and Mechanical and Electrical Engineering, School of Intelligent Manufacturing and Control Engineering, Shanghai Polytechnic University, Shanghai, People's Republic of China

Abstract

Background modeling and subtraction, the task of detecting moving objects in a scene, is a fundamental and critical step for many high-level computer vision tasks. However, background subtraction remains an open and challenging problem, particularly in practical scenarios with drastic illumination changes and dynamic backgrounds. In this paper, we propose a novel foreground detection method based on Convolutional Neural Networks (CNNs) to address these challenges. First, given a clean background image without moving objects, we construct an adjustable neighborhood around each pixel of the background image to form windows; CNN features are extracted for each window with a pre-trained CNN model to build a feature-based background model. Second, features are extracted from the current frame of the video in the same way, and the Euclidean distance between the CNN features of the current frame and those of the background image yields a distance map. Third, the distance map is fed into a graph-cut algorithm to obtain the foreground mask. To cope with background changes, the background model is updated at a fixed rate. Experimental results verify that the proposed approach effectively detects foreground objects in complex background environments and outperforms several state-of-the-art methods.
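A minimal NumPy sketch of the per-pixel pipeline described above. The H×W×D feature arrays stand in for the windowed CNN features (in the paper these come from a pre-trained CNN over per-pixel windows), and simple thresholding stands in for the graph-cut segmentation step; the function names and the blending-style update rule are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def feature_distance_map(bg_features, frame_features):
    """Per-pixel Euclidean distance between the background-model and
    current-frame feature vectors (both of shape H x W x D)."""
    return np.linalg.norm(frame_features - bg_features, axis=-1)

def foreground_mask(distance_map, threshold):
    """Binarize the distance map. The paper feeds the distance map into
    a graph-cut solver; a fixed threshold stands in for that step here."""
    return distance_map > threshold

def update_background(bg_features, frame_features, rate=0.05):
    """Blend current-frame features into the background model at a
    fixed rate, so the model absorbs gradual background changes."""
    return (1.0 - rate) * bg_features + rate * frame_features
```

For example, with a 2×2 grid of 3-D features where only one pixel's feature vector differs from the background, `feature_distance_map` is nonzero only at that pixel, and `foreground_mask` marks it as foreground.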
