Unsupervised Deep Representation Learning for Real-Time Tracking

Springer Science and Business Media LLC - Volume 129, pages 400-418, 2020
Ning Wang1, Wengang Zhou1,2, Yibing Song3, Chao Ma4, Wei Liu3, Houqiang Li1,2
1The CAS Key Laboratory of GIPAS, University of Science and Technology of China, Hefei, China
2Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
3Tencent AI Lab, Shenzhen, China
4The MOE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Abstract

Advances in visual tracking have been continuously driven by deep learning models. Typically, these models are trained via supervised learning with expensive labeled data. To reduce the workload of manual annotation and to learn to track arbitrary objects, we propose an unsupervised learning method for visual tracking. The motivation behind our unsupervised learning is that a robust tracker should be effective in bidirectional tracking. Specifically, the tracker should be able to localize a target object forward through successive frames and backtrace it to its initial position in the first frame. Based on this motivation, during training we measure the consistency between the forward and backward trajectories to learn a robust tracker from scratch using only unlabeled videos. We build our framework on a Siamese correlation filter network and propose a multi-frame validation scheme and a cost-sensitive loss to facilitate unsupervised learning. Without bells and whistles, the proposed unsupervised tracker achieves the baseline accuracy of classic fully supervised trackers while running at real-time speed. Furthermore, our unsupervised framework shows potential for leveraging more unlabeled or weakly labeled data to further improve tracking accuracy.
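Below is a minimal sketch of the forward-backward consistency idea described in the abstract, assuming a PyTorch setup in which tracking forward through unlabeled frames and then backward yields a response map over the first frame. The function name, tensor shapes, and the cost-sensitive weighting are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def consistency_loss(backtraced_response, initial_label, sample_weights):
    """Cycle-consistency loss: after tracking forward through unlabeled frames
    and backtracing to the first frame, the backtraced response map should peak
    at the original target position (a Gaussian label).  Shapes and the
    cost-sensitive weighting below are illustrative assumptions.

    backtraced_response: (B, H, W) response maps on the first frame
    initial_label:       (B, H, W) labels centred on the initial position
    sample_weights:      (B,) weights that down-weight noisy or occluded clips
    """
    per_sample = ((backtraced_response - initial_label) ** 2).mean(dim=(1, 2))
    return (sample_weights * per_sample).sum() / sample_weights.sum().clamp(min=1e-8)


# Toy usage: 4 video clips, 17x17 response maps, one clip down-weighted.
if __name__ == "__main__":
    B, H, W = 4, 17, 17
    response = torch.rand(B, H, W, requires_grad=True)
    label = torch.zeros(B, H, W)
    label[:, H // 2, W // 2] = 1.0           # ideal peak at the initial position
    weights = torch.tensor([1.0, 1.0, 1.0, 0.2])
    loss = consistency_loss(response, label, weights)
    loss.backward()                           # gradients flow back to the tracker
    print(float(loss))
```

In this sketch, the per-sample weighting plays the role of a cost-sensitive loss by reducing the influence of training clips whose forward-backward trajectories are unreliable.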
