Deep learning in video multi-object tracking: A survey

Neurocomputing - Volume 381 - Pages 61-88 - 2020
Gioele Ciaparrone1,2, Francisco Luque Sánchez1, Siham Tabik1, Luigi Troiano3, Roberto Tagliaferri2, Francisco Herrera1
1Andalusian Research Institute in Data Science and Computational Intelligence, University of Granada, 18071 Granada, Spain
2Department of Management and Innovation Systems, University of Salerno, 84084 Fisciano (SA), Italy
3Department of Engineering, University of Sannio, 82100, Benevento, Italy

Abstract

Keywords


References

K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, (2014) arXiv:1409.1556.

Szegedy, 2015, Going deeper with convolutions, 1

He, 2016, Deep residual learning for image recognition, 770

Ren, 2015, Faster R-CNN: Towards real-time object detection with region proposal networks, 91

Liu, 2016, SSD: single shot multibox detector, 21

Redmon, 2017, Yolo9000: better, faster, stronger, 7263

Sak, 2014, Long short-term memory recurrent neural network architectures for large scale acoustic modeling

Sundermeyer, 2012, Lstm neural networks for language modeling

Fan, 2014, TTS synthesis with bidirectional LSTM based recurrent neural networks

Marchi, 2014, Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks, 2164

W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, X. Zhao, T.-K. Kim, Multiple object tracking: a literature review, (2014) arXiv:1409.7618.

Camplani, 2016, Multiple human tracking in RGB-depth data: a survey, IET Comput. Vision, 11, 265, 10.1049/iet-cvi.2016.0178

P. Emami, P.M. Pardalos, L. Elefteriadou, S. Ranka, Machine learning methods for solving assignment problems in multi-target tracking, (2018) arXiv:1802.06897.

L. Leal-Taixé, A. Milan, K. Schindler, D. Cremers, I. Reid, S. Roth, Tracking the trackers: an analysis of the state of the art in multiple object tracking, (2017) arXiv:1704.02781.

L. Leal-Taixé, A. Milan, I. Reid, S. Roth, K. Schindler, Motchallenge 2015: towards a benchmark for multi-target tracking, (2015) arXiv:1504.01942.

A. Milan, L. Leal-Taixé, I. Reid, S. Roth, K. Schindler, Mot16: a benchmark for multi-object tracking, (2016) arXiv:1603.00831.

He, 2017, Mask R-CNN, 2961

Dai, 2016, R-FCN: Object detection via region-based fully convolutional networks, 379

Wu, 2006, Tracking of multiple, partially occluded humans based on static body part detection, 1, 951

Bernardin, 2008, Evaluating multiple object tracking performance: the clear MOT metrics, J. Image Video Process., 2008, 1, 10.1155/2008/246309

Ristani, 2016, Performance measures and a data set for multi-target, multi-camera tracking, 17

Stiefelhagen, 2007, Multimodal technologies for perception of humans, 4122

Stiefelhagen, 2008, Multimodal technologies for perception of humans, 4625

Dollár, 2014, Fast feature pyramids for object detection, IEEE Trans. Pattern Anal. Mach. Intell., 36, 1532, 10.1109/TPAMI.2014.2300479

Felzenszwalb, 2009, Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell., 32, 1627, 10.1109/TPAMI.2009.167

R.B. Girshick, P.F. Felzenszwalb, D. McAllester, Discriminatively trained deformable part models, release 5, 2012, (http://people.cs.uchicago.edu/~rbg/latent-release5/).

Yang, 2016, Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers, 2129

P. Dendorfer, H. Rezatofighi, A. Milan, J. Shi, D. Cremers, I. Reid, S. Roth, K. Schindler, L. Leal-Taixe, Cvpr19 tracking and detection challenge: How crowded can it get?, 2019.

Geiger, 2012, Are we ready for autonomous driving? The Kitti vision benchmark suite, 3354

Geiger, 2013, Vision meets robotics: the Kitti dataset, Int. J. Robot. Res., 32, 1231, 10.1177/0278364913491297

Wang, 2013, Regionlets for generic object detection, 17

L. Wen, D. Du, Z. Cai, Z. Lei, M.-C. Chang, H. Qi, J. Lim, M.-H. Yang, S. Lyu, UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking, (2015) arXiv:1511.04136.

Andriluka, 2010, Monocular 3D pose estimation and tracking by detection, 623

Ferryman, 2009, PETS2009: dataset and challenge, 1

Bewley, 2016, Simple online and realtime tracking, 3464

Kalman, 1960, A new approach to linear filtering and prediction problems, J. Basic Eng., 82, 35, 10.1115/1.3662552

Kuhn, 1955, The hungarian method for the assignment problem, Naval Res. Logist. Q., 2, 83, 10.1002/nav.3800020109

Yu, 2016, POI: multiple object tracking with high performance detection and appearance feature, 36

Bell, 2016, Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks, 2874

Gidaris, 2015, Object detection via a multi-region and semantic segmentation-aware CNN model, 1134

Wojke, 2017, Simple online and realtime tracking with a deep association metric, 3645

Mahmoudi, 2019, Multi-target tracking using CNN-based features: CNNMTT, Multimed. Tools Appl., 78, 7077, 10.1007/s11042-018-6467-6

Wan, 2018, An online and flexible multi-object tracking framework using long short-term memory, 1230

Ujiie, 2018, Interpolation-based object detection using motion vectors for embedded real-time tracking systems, 616

Q. He, J. Wu, G. Yu, C. Zhang, SOT for MOT, (2017) arXiv:1712.01059.

Li, 2017, Multi-person tracking by discriminative affinity model and hierarchical association, 1741

W. Li, M.-C. Chang, S. Lyu, Who did what at where and when: simultaneous multi-person tracking and activity recognition, (2018) arXiv:1807.01253.

Jorquera, 2019, Probability hypothesis density filter using determinantal point processes for multi object tracking, Comput. Vis. Image Underst., 183, 33, 10.1016/j.cviu.2019.04.001

Zhong, 2019, Decision controller for object tracking with deep reinforcement learning, IEEE Access, 7, 28069, 10.1109/ACCESS.2019.2900476

Lu, 2019, Multi-target tracking by non-linear motion patterns based on hierarchical network flows, Multimed. Syst., 24, 383, 10.1007/s00530-019-00614-y

Tang, 2017, Multiple people tracking by lifted multicut and person re-identification, 3539

Ran, 2019, A robust multi-athlete tracking algorithm by exploiting discriminant features and long-term dependencies, 411

Hu, 2018, An automatic tracking method for multiple cells based on multi-feature fusion, IEEE Access, 6, 69782, 10.1109/ACCESS.2018.2880563

Zhang, 2019, Automatic individual pig detection and tracking in pig farms, Sensors, 19, 1188, 10.3390/s19051188

Zhou, 2018, Online multi-target tracking with tensor-based high-order graph matching, 1809

Danelljan, 2017, Eco: efficient convolution operators for tracking, 6638

Dalal, 2005, Histograms of oriented gradients for human detection, 1, 886

Van De Weijer, 2009, Learning color names for real-world applications, IEEE Trans. Image Process., 18, 1512, 10.1109/TIP.2009.2019809

Lu, 2017, Online video object detection using association LSTM, 2344

Kieritz, 2018, Joint detection and online multi-object tracking, 1459

Zhao, 2018, Multi-object tracking with correlation filter for autonomous vehicle, Sensors, 18, 2004, 10.3390/s18072004

Pearson, 1901, LIII. On lines and planes of closest fit to systems of points in space, Philos. Mag. J. Sci., 2, 559, 10.1080/14786440109462720

Wang, 2017, Large margin object tracking with circulant feature maps, 4021

Redmon, 2016, You only look once: unified, real-time object detection, 779

J. Redmon, A. Farhadi, Yolov3: an incremental improvement, (2018) arXiv:1804.02767.

Kim, 2018, Online tracker optimization for multi-pedestrian tracking using a moving vehicle camera, IEEE Access, 6, 48675, 10.1109/ACCESS.2018.2867621

Sharma, 2018, Beyond pixels: Leveraging geometry and shape cues for online multi-object tracking, 3508

Ren, 2017, Accurate single stage detector using recurrent rolling convolution, 5420

Xiang, 2017, Subcategory-aware convolutional neural networks for object proposals and detection, 924

Pernici, 2018, Memory based online learning of deep representations from video streams, 2324

Hu, 2017, Finding tiny faces, 951

Min, 2018, A new approach to track multiple vehicles with the combination of robust detection and two classifiers, IEEE Trans. Intell. Transp. Syst., 19, 174, 10.1109/TITS.2017.2756989

Barnich, 2011, ViBe: a universal background subtraction algorithm for video sequences, IEEE Trans. Image Process., 20, 1709, 10.1109/TIP.2010.2101613

Cortes, 1995, Support-vector networks, Mach. Learn., 20, 273, 10.1007/BF00994018

Yu, 2017, A model for fine-grained vehicle classification based on deep learning, Neurocomputing, 257, 97, 10.1016/j.neucom.2016.09.116

Bullinger, 2017, Instance flow based online multiple object tracking, 785

Dai, 2016, Instance-aware semantic segmentation via multi-task network cascades, 3150

Farnebäck, 2003, Two-frame motion estimation based on polynomial expansion, 363

Revaud, 2016, Deepmatching: hierarchical deformable dense matching, Int. J. Comput. Vis., 120, 300, 10.1007/s11263-016-0908-3

Hu, 2016, Efficient coarse-to-fine patchmatch for large displacement optical flow, 5704

Wang, 2014, Learning deep features for multiple object tracking by using a multi-task learning strategy, 838

Cadieu, 2009, Learning transformational invariants from natural movies, 209

Kim, 2015, Multiple hypothesis tracking revisited, 4696

Zheng, 2017, Person re-identification in the wild, 1367

Zheng, 2015, Scalable person re-identification: A benchmark, 1116

Gray, 2008, Viewpoint invariant pedestrian recognition with an ensemble of localized features, 262

Li, 2014, Deepreid: deep filter pairing neural network for person re-identification, 152

Chen, 2017, Enhancing detection model for multiple hypothesis tracking, 18

Yang, 2017, A hybrid data association framework for robust online multi-object tracking, IEEE Trans. Image Process., 26, 5667, 10.1109/TIP.2017.2745103

Girshick, 2015, Region-based convolutional networks for accurate object detection and segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 38, 142, 10.1109/TPAMI.2015.2437384

Wang, 2017, Robust tracking of fish schools using CNN for head identification, Multimed. Tools Appl., 76, 23679, 10.1007/s11042-016-4045-3

S. Zagoruyko, N. Komodakis, Wide residual networks, (2016) arXiv:1605.07146.

Kim, 2018, Multi-object tracking with neural gating using bilinear LSTM, 200

Bae, 2017, Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking, IEEE Trans. Pattern Anal. Mach. Intell., 40, 595, 10.1109/TPAMI.2017.2691769

Ullah, 2018, Deep feature based end-to-end transportation network for multi-target tracking, 3738

Fang, 2018, Recurrent autoregressive networks for online multi-object tracking, 466

Xiao, 2016, Learning deep feature representations with domain guided dropout for person re-identification, 1249

Fu, 2018, GM-PHD filter based online multiple human tracking using deep discriminative correlation matching, 4299

Vo, 2006, The gaussian mixture probability hypothesis density filter, IEEE Trans. Signal Process., 54, 4091, 10.1109/TSP.2006.881190

L. Wen, D. Du, S. Li, X. Bian, S. Lyu, Learning non-uniform hypergraph for multi-object tracking, Proceedings of the 33rd AAAI Conference on Artificial Intelligence (2019).

Russakovsky, 2015, Imagenet large scale visual recognition challenge, Int. J. Comput. Vis., 115, 211, 10.1007/s11263-015-0816-y

Korn, 2000, Influence sets based on reverse nearest neighbor queries, ACM Sigmod Record, 29, 201, 10.1145/335191.335415

Sheng, 2018, Heterogeneous association graph fusion for target association in multiple object tracking, IEEE Trans. Circuits Syst. Video Technol., 29, 3269, 10.1109/TCSVT.2018.2882192

Chen, 2019, Recurrent metric networks and batch multiple hypothesis for multi-object tracking, IEEE Access, 7, 3093, 10.1109/ACCESS.2018.2889187

Pishchulin, 2016, Deepcut: joint subset partition and labeling for multi person pose estimation, 4929

Kim, 2016, Similarity mapping with enhanced siamese network for multi-object tracking

Bromley, 1994, Signature verification using a “siamese” time delay neural network, 737

Wang, 2016, Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association, 1

Zhang, 2016, Tracking persons-of-interest via adaptive discriminative features, 415

Leal-Taixé, 2016, Learning by tracking: siamese CNN for robust target association, 33

Leal-Taixé, 2011, Everybody needs somebody: modeling social and grouping behavior on a linear programming multiple people tracker, 120

Son, 2017, Multi-object tracking with quadruplet convolutional neural networks, 5620

Maksai, 2019, Eliminating exposure bias and loss-evaluation mismatch in multiple object tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person re-identification, (2017) arXiv:1703.07737.

Zhu, 2018, Online multi-object tracking with dual matching attention networks, 366

Ma, 2018, Trajectory factory: tracklet cleaving and re-connection by deep siamese Bi-GRU for multiple object tracking, 1

Zhou, 2018, Deep continuous conditional random fields with asymmetric inter-object constraints for online multi-object tracking, IEEE Trans. Circuits Syst. Video Technol., 29, 1011, 10.1109/TCSVT.2018.2825679

Long, 2018, Real-time multiple people tracking with deeply learned candidate selection and person re-identification

Lee, 2019, Multiple object tracking via feature pyramid siamese networks, IEEE Access, 7, 8181, 10.1109/ACCESS.2018.2889442

F.N. Iandola, S. Han, M.W. Moskewicz, K. Ashraf, W.J. Dally, K. Keutzer, Squeezenet: alexnet-level accuracy with 50x fewer parameters and < 0.5 MB model size, (2016) arXiv:1602.07360.

Ullah, 2017, A hierarchical feature model for multi-target tracking, 2612

Mallat, 1993, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process., 41, 3397, 10.1109/78.258082

Sadeghian, 2017, Tracking the untrackable: learning to track multiple cues with long-term dependencies, 300

Chu, 2017, Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism, 4836

Ozuysal, 2009, Fast keypoint recognition using random ferns, IEEE Trans. Pattern Anal. Mach. Intell., 32, 448, 10.1109/TPAMI.2009.23

Ullah, 2018, A directed sparse graphical model for multi-target tracking, 1816

Rabiner, 1986, An introduction to hidden markov models, IEEE ASSP Mag., 3, 4, 10.1109/MASSP.1986.1165342

Wang, 2017, Online multiple object tracking via flow and convolutional features, 3630

Ma, 2015, Hierarchical convolutional features for visual tracking, 3074

Lucas, 1981, An iterative image registration technique with an application to stereo vision, 121

Rosello, 2018, Multi-agent reinforcement learning for multi-object tracking, 1397

Babaee, 2018, Occlusion handling in tracking multiple people using rnn, 2715

Milan, 2017, Online multi-target tracking using recurrent neural networks

Xiang, 2015, Learning to track: online multi-object tracking by decision making, 4705

Fang, 2017, RMPE: regional multi-person pose estimation, 2334

Liang, 2018, Lstm multiple object tracker combining multiple cues, 2351

Zhou, 2018, Deep self-paced learning for person re-identification, Pattern Recognit., 76, 739, 10.1016/j.patcog.2017.10.005

Yoon, 2019, Data association for multi-object tracking via deep neural networks, Sensors, 19, 559, 10.3390/s19030559

Robicquet, 2016, Learning social etiquette: human trajectory understanding in crowded scenes, 549

Andriluka, 2008, People-tracking-by-detection and people-detection-by-tracking, 1

K. Cho, B. Van Merriënboer, D. Bahdanau, Y. Bengio, On the properties of neural machine translation: encoder-decoder approaches, (2014) arXiv:1409.1259.

Andres, 2015, Lifting of multicuts, 3

Weinzaepfel, 2013, Deepflow: large displacement optical flow with deep matching, 1385

Keuper, 2015, Efficient decomposition of image and mesh graphs by lifted multicuts, 1751

Chen, 2017, Online multi-object tracking with convolutional neural networks, 645

Arulampalam, 2002, A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking, IEEE Trans. Signal Process., 50, 174, 10.1109/78.978374

Sanchez-Matilla, 2016, Online multi-target tracking with strong and weak detections, 84

Ma, 2018, Customized multi-person tracker

Tang, 2016, Multi-person tracking by multicut and deep matching, 100

Ren, 2018, Collaborative deep reinforcement learning for multi-object tracking, 586

Nam, 2016, Learning multi-domain convolutional neural networks for visual tracking, 4293

Jiang, 2018, Precise regression for bounding box correction for improved tracking based on deep reinforcement learning, 1643

Lee, 2016, Multi-class multi-object tracking using changing point detection, 68

Hoak, 2017, Image-based multi-target tracking through multi-bernoulli filtering with interactive likelihoods, Sensors, 17, 501, 10.3390/s17030501

Henschel, 2018, Fusion of head and full-body detectors for multi-object tracking, 1428

Stewart, 2016, End-to-end people detection in crowded scenes, 2325

Gan, 2018, Online CNN-based multiple object tracking with enhanced model updates and identity association, Signal Process.: Image Commun., 66, 95

Xiang, 2019, Online multi-object tracking based on feature representation and Bayesian filtering within a deep learning architecture, IEEE Access

Chu, 2019, Online multi-object tracking with instance-aware tracker and dynamic model refreshment, 161

Watkins, 1989

Takeuchi, 2006, A unifying framework for detecting outliers and change points from time series, IEEE Trans. Knowl. Data Eng., 18, 482, 10.1109/TKDE.2006.1599387

Dollar, 2009, Pedestrian detection: a benchmark, 304

Hoseinnezhad, 2012, Visual tracking of numerous targets via multi-bernoulli filtering of image data, Pattern Recognit., 45, 3625, 10.1016/j.patcog.2012.04.004

Milan, 2014, Improving global multi-target tracking with local updates, 174

Frank, 1956, An algorithm for quadratic programming, Naval Res. Logist. Q., 3, 95, 10.1002/nav.3800030109

Cao, 2017, Realtime multi-person 2D pose estimation using part affinity fields, 7291

Zhao, 2017, Deeply-learned part-aligned representations for person re-identification, 3219

Henriques, 2014, High-speed tracking with Kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., 37, 583, 10.1109/TPAMI.2014.2345390

Wen, 2014, Multiple target tracking based on undirected hierarchical relation hypergraph, 1282

Qian, 2014, Automatically detect and track multiple fish swimming in shallow water with frequent occlusion, PloS One, 9, e106506, 10.1371/journal.pone.0106506

Pirsiavash, 2011, Globally-optimal greedy algorithms for tracking a variable number of objects, 1201

Gold, 1996, Softmax to softassign: neural network algorithms for combinatorial optimization, J. Artif. Neural Netw., 2, 381

Huang, 2008, Robust object tracking by hierarchical association of detection responses, 788

Benenson, 2012, Pedestrian detection at 100 frames per second, 2903