Deep learning in multi-object detection and tracking: state of the art

Springer Science and Business Media LLC - Tập 51 Số 9 - Trang 6400-6429 - 2021
Shantanu Pal1, Anima Pramanik2, J. Maiti2, Pabitra Mitra3
1Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India
2ISE Dept., IIT Kharagpur, IIT Kharagpur, Kharagpur, India
3Department of Computer Science and Engineering, Kharagpur, India

Tóm tắt

Từ khóa


Tài liệu tham khảo

Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, Qu R (2019) A survey of deep learning-based object detection. IEEE Access 7:128837–128868

Pal S K (2018) Data science and technology: challenges, opportunities and national relevance. 14th annual convocation speech, national institute of technology, Calicut

Pal S K, Bhoumik D, Chakraborty D B (2020) Granulated deep learning and z-numbers in motion detection and object recognition. Neural Comput Appl 32(21):16533–16548

Chakraborty DB, Pal S K (2021) Granular Video Computing: with Rough Sets, Deep Learning and in IoT. World Scientific, Singapore

Liu Y, Cheng M-M, Hu X, Wang K, Bai X (2017) Richer convolutional features for edge detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3000–3009

Pal S K, King R A (1983) On edge detection of x-ray images using fuzzy sets. IEEE Trans Pattern Anal Mach Intell 5(1):69–77

Deravi F, Pal S K (1983) Grey level thresholding using second-order statistics. Pattern Recogn Lett 1(5-6):417–422

Pal S K, King R A, Hashim AA (1983) Automatic grey level thresholding through index of fuzziness and entropy. Pattern Recogn Lett 1(3):141–146

Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299

Masi I, Wu Y, Hassner T, Natarajan P (2018) Deep face recognition: A survey. In: 2018 31st SIBGRAPI conference on graphics, patterns and images (SIBGRAPI). IEEE, pp 471–478

Hasan M, Orgun M A, Schwitter R (2018) A survey on real-time event detection from the twitter data stream. J Inf Sci 44(4):443–463

Brunetti A, Buongiorno D, Trotta G F, Bevilacqua V (2018) Computer vision and deep learning techniques for pedestrian detection and tracking: A survey. Neurocomputing 300:17–33

Ren X, Zhou Y, He J, Chen K, Yang X, Sun J (2016) A convolutional neural network-based chinese text detection algorithm via text structure modeling. IEEE Trans Multimed 19(3):506–518

Fan D-P, Wang W, Cheng M-M, Shen J (2019) Shifting more attention to video salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8554–8564

Pal N R, Pal S K (1993) A review on image segmentation techniques. Pattern Recogn 26 (9):1277–1294

Dollar P, Wojek C, Schiele B, Perona P (2011) Pedestrian detection: An evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761

Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 3354–3361

Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252

Everingham M, Van Gool L, Williams Christopher KI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338

Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L (2014) Microsoft coco: Common objects in context. In: European conference on computer vision. Springer, pp 740–755

Kuznetsova A, Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, Kamali S, Popov S, Malloci M, Duerig T et al (2018) The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982

Krizhevsky A, Sutskever I, Hinton G E (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90

Zhang X, Fang Z, Wen Y, Li Z, Qiao Y (2017) Range loss for deep face recognition with long-tailed training data. In: Proceedings of the IEEE International Conference on Computer Vision, pp 5409–5418

Chung D, Tahboub K, Delp E J (2017) A two stream siamese convolutional neural network for person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1983–1991

Zhao S, Liu Y, Han Y, Hong R, Hu Q, Tian Q (2017) Pooling the convolutional layers in deep convnets for video action recognition. IEEE Trans Circ Syst Video Technol 28(8):1839–1849

Geng H-, Zhang H, Xue Y-, Zhou M, Xu G-, Gao Z (2017) Semantic image segmentation with fused cnn features. Optoelectron Lett 13(5):381–385

Chakraborty D B, Pal S K (2016) Neighborhood granules and rough rule-base in tracking. Nat Comput 15(3):359–370

Chakraborty D B, Pal S K (2017) Neighborhood rough filter and intuitionistic entropy in unsupervised tracking. IEEE Trans Fuzzy Syst 26(4):2188–2200

Pal S K, Chakraborty D B (2016) Granular flow graph, adaptive rule generation and tracking. IEEE Trans Cybern 47(12):4096– 4107

Wang N, Yeung D-Y (2013) Learning a deep compact image representation for visual tracking. In: Advances in neural information processing systems, pp 809–817

Henriques J F, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596

Choi J, Jin Chang H, Fischer T, Yun S, Lee K, Jeong J, Demiris Y, Young Choi J (2018) Context-aware deep feature compression for high-speed visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 479–488

Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr Philip HS (2017) End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2805–2813

Li J, Zhou X, Chan S, Chen S (2017) Object tracking using a convolutional network and a structured output svm. Comput Vis Media 3(4):325–335

Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302

Danelljan M, Robinson A, Khan F S, Felsberg M (2016) Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: European conference on computer vision. Springer, pp 472–488

Ma C, Huang J-B, Yang X, Yang M-H (2015) Hierarchical convolutional features for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 3074–3082

Milan A, Rezatofighi S H, Dick A, Reid I, Schindler K (2016) Online multi-target tracking using recurrent neural networks. arXiv:1604.03635

Li P, Wang D, Wang L, Lu H (2018) Deep visual tracking: Review and experimental comparison. Pattern Recogn 76:323–338

Xu Y, Zhou X, Chen S, Li F (2019) Deep learning for multiple object tracking: a survey. IET Comput Vis 13(4):355–368

Leal-Taixé L, Milan A, Schindler K, Cremers D, Reid I, Roth S (2017) Tracking the trackers: an analysis of the state of the art in multiple object tracking. arXiv:1704.02781

Zhao Z-Q, Zheng P, Xu S-, Wu X (2019) Object detection with deep learning: A review. IEEE Trans Neural Networks Learn Syst 30(11):3212–3232

Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271

Weinzaepfel P, Revaud J, Harchaoui Z, Schmid C (2013) Deepflow: Large displacement optical flow with deep matching. In: Proceedings of the IEEE international conference on computer vision, pp 1385–1392

Cheng H Y, Hwang J N (2007) Multiple-target tracking for crossroad traffic utilizing modified probabilistic data association. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, vol 1. IEEE, pp I–921

Lim Y-C, Lee M, Lee C-H, Kwon S, Lee J- (2010) Improvement of stereo vision-based position and velocity estimation and tracking using a stripe-based disparity estimation and inverse perspective map-based extended kalman filter. Opt Lasers Eng 48(9):859–868

Cao X, Lan J, Yan P, Li X (2012) Vehicle detection and tracking in airborne videos by multi-motion layer analysis. Mach Vis Appl 23(5):921–935

Kim C, Li F, Ciptadi A, Rehg J M (2015) Multiple hypothesis tracking revisited. In: Proceedings of the IEEE international conference on computer vision, pp 4696–4704

Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2018) Detnet: A backbone network for object detection. arXiv:1804.06215

He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500

Ghiasi G, Lin T-Y, Le Q V (2019) Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7036–7045

Howard A G, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861

Iandola F N, Han S, Moskewicz M W, Ashraf K, Dally W J, Keutzer K (2016) Squeezenet: Alexnet-level accuracy with 50x fewer parameters and< 0.5 mb model size. arXiv:1602.07360

Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258

Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

Rawat W, Wang Z (2017) Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput 29(9):2352–2449

Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387

He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D (2009) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645

He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916

Bell S, Lawrence Zitnick C, Bala K, Girshick R (2016) Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883

Liu J, Zhang S, Wang S, Metaxas D N (2016) Multispectral deep neural networks for pedestrian detection. arXiv:1611.02644

Zadeh L A (2011) A note on z-numbers. Inf Sci 181(14):2923–2932

Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv:1804.02767

Fu C-Y, Liu W, Ranga A, Tyagi A, Berg A C (2017) Dssd: Deconvolutional single shot detector. arXiv:1701.06659

Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2019) M2det: A single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 33, pp 9259–9266

Zhang S, Wen L, Bian X, Lei Z, Li S Z (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4203–4212

Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 764–773

Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167

Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg A C (2016) Ssd: Single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

Zhu X, Hu H, Lin S, Dai J (2019) Deformable convnets v2: More deformable, better results. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 9308–9316

Yang Z, Nevatia R (2016) A multi-scale cascade fully convolutional network face detector. In: 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, pp 633–638

Tu W-C, He S, Yang Q, Chien S-Y (2016) Real-time salient object detection with a minimum spanning tree. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2334–2342

Yang J, Yang M-H (2016) Top-down visual saliency via joint crf and dictionary learning. IEEE Trans Pattern Anal Mach Intell 39(3):576–588

Tomè D, Monti F, Baroffio L, Bondi L, Tagliasacchi M, Tubaro S (2016) Deep convolutional neural networks for pedestrian detection. Signal Process Image Commun 47:482–489

Zhao Z-Q, Bian H, Hu D, Cheng W, Glotin H (2017) Pedestrian detection based on fast r-cnn and batch normalization. In: International Conference on Intelligent Computing. Springer, pp 735–746

Rother C, Bordeaux L, Hamadi Y, Blake A (2006) Autocollage. ACM Trans Graph (TOG) 25(3):847–852

Chakraborty D, Shankar B U, Pal S K (2013) Granulation, rough entropy and spatiotemporal moving object detection. Appl Soft Comput 13(9):4001–4009

Pal S K, Mitra P (2002) Multispectral image segmentation using the rough-set-initialized em algorithm. IEEE Trans Geosci Remote Sens 40(11):2495–2501

Pal S K, Shankar B U, Mitra P (2005) Granular computing, rough entropy and object extraction. Pattern Recogn Lett 26(16):2509–2517

Rosin P L (2009) A simple method for detecting salient regions. Pattern Recogn 42(11):2363–2371

Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum H-Y (2010) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33(2):353–367

Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

Gao D, Han S, Vasconcelos N (2009) Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Trans Pattern Anal Mach Intell 31(6):989–1005

Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403

Vig E, Dorr M, Cox D (2014) Large-scale optimization of hierarchical features for saliency prediction in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2798–2805

Huang X, Shen C, Boix X, Zhao Q (2015) Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp 262–270

Wang L, Lu H, Ruan X, Yang M-H (2015) Deep networks for saliency detection via local estimation and global search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3183–3192

Cholakkal H, Johnson J, Rajan D (2018) Backtracking spatial pyramid pooling-based image classifier for weakly supervised top–down salient object detection. IEEE Trans Image Process 27(12):6064–6078

He S, Lau RWH, Liu W, Huang Z, Yang Q (2015) Supercnn: A superpixelwise convolutional neural network for salient object detection. Int J Comput Vis 115(3):330–344

Tang Y, Wu X (2016) Saliency detection via combining region-level and pixel-level predictions with cnns. In: European Conference on Computer Vision. Springer, pp 809–825

Wang X, Ma H, Chen X, You S (2017) Edge preserving and multi-scale contextual neural network for salient object detection. IEEE Trans Image Process 27(1):121–134

Gao X, Wang N, Tao D, Li X (2012) Face sketch–photo synthesis and retrieval using sparse representation. IEEE Trans Circ Sys Video Technol 22(8):1213–1226

Niu B, Yang Q, Shiu S C K, Pal S K (2008) Two-dimensional laplacianfaces method for face recognition. Pattern Recogn 41(10):3237–3243

Wang N, Tao D, Gao X, Li X, Li J (2014) A comprehensive survey to face hallucination. Int J Comput Vis 106(1):9–30

Majumder A, Behera L, Subramanian V K (2016) Automatic facial expression recognition system using deep network-based data fusion. IEEE Trans Cybern 48(1):103–114

Jiang H, Learned-Miller E (2017) Face detection with the faster r-cnn. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, pp 650–657

Sun X, Wu P, Hoi Steven CH (2018) Face detection using deep learning: An improved faster rcnn approach. Neurocomputing 299:42–50

Wang H, Li Z, Ji X, Wang Y (2017) Face r-cnn. arXiv:1706.01061

Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: Unifying landmark localization with end to end object detection. arXiv:1509.04874

Li Y, Sun B, Wu T, Wang Y (2016) Face detection with end-to-end integration of a convnet and a 3d model. . In: European Conference on Computer Vision. Springer, pp 420–436

Zhang L, Lin L, Liang X, He K (2016) Is faster r-cnn doing well for pedestrian detection? . In: European conference on computer vision. Springer, pp 443–457

Tian Y, Luo P, Wang X, Tang X (2015) Deep learning strong parts for pedestrian detection. In: Proceedings of the IEEE international conference on computer vision, pp 1904–1912

Cai Z, Saberian M, Vasconcelos N (2015) Learning complexity-aware cascades for deep pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3361–3369

Reid D (1979) An algorithm for tracking multiple targets. IEEE Trans Autom Control 24 (6):843–854

Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 3645–3649

Leal-Taixé L, Canton-Ferrer C, Schindler K (2016) Learning by tracking: Siamese cnn for robust target association. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 33–40

Bae S-H, Yoon K-J (2017) Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking. IEEE Trans Pattern Anal Mach Intell 40(3):595–610

Bae S-H, Yoon K-J (2014) Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1218–1225

Wang B, Wang L, Shuai B, Zuo Z, Liu T, Luk Chan K, Wang G (2016) Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 1–8

Xiang Y, Alahi A, Savarese S (2015) Learning to track: Online multi-object tracking by decision making. In: Proceedings of the IEEE international conference on computer vision, pp 4705– 4713

Tang S, Andriluka M, Andres B, Schiele B (2017) Multiple people tracking by lifted multicut and person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3539–3548

Chen L, Ai H, Shang C, Zhuang Z, Bai B (2017) Online multi-object tracking with convolutional neural networks. In: 2017 IEEE International Conference on Image Processing (ICIP). IEEE, pp 645–649

Chu Q, Ouyang W, Li H, Wang X, Liu B, Yu N (2017) Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism. In: Proceedings of the IEEE International Conference on Computer Vision, pp 4836–4845

Son J, Baek M, Cho M, Han B (2017) Multi-object tracking with quadruplet convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5620–5629

Fang K (2016) Track-rnn: joint detection and tracking using recurrent neural networks. . In: Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona

Zhou S, Wang J, Wang J, Gong Y, Zheng N (2017) Point to set similarity based deep feature learning for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3741–3750

Xiang J, Zhang G, Hou J, Sang N, Huang R (2018) Multiple target tracking by learning feature representation and distance metric jointly. arXiv:1802.03252

Cheng D, Gong Y, Zhou S, Wang J, Zheng N (2016) Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In: Proceedings of the iEEE conference on computer vision and pattern recognition, pp 1335–1344

Ma C, Yang C, Yang F, Zhuang Y, Zhang Z, Jia H, Xie X (2018) Trajectory factory: Tracklet cleaving and re-connection by deep siamese bi-gru for multiple object tracking. In: 2018 IEEE International Conference on Multimedia and Expo (ICME). IEEE, pp 1–6

Fernando T, Denman S, Sridharan S, Fookes C (2018) Task specific visual saliency prediction with memory augmented conditional generative adversarial networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp 1539–1548

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680

Gregor K, Danihelka I, Mnih A, Blundell C, Wierstra D (2014) Deep autoregressive networks. In: International Conference on Machine Learning. PMLR, pp 1242–1250

Fang K, Xiang Y, Li X, Savarese S (2018) Recurrent autoregressive networks for online multi-object tracking. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp 466–475

Fernando T, Denman S, Sridharan S, Fookes C (2018) Tracking by prediction: A deep generative model for mutli-person localisation and tracking. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp 1122–1132

Sadeghian A, Alahi A, Savarese S (2017) Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In: Proceedings of the IEEE International Conference on Computer Vision, pp 300–311

Kim C, Li F, Rehg J M (2018) Multi-object tracking with neural gating using bilinear lstm. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 200–215

Schulter S, Vernaza P, Choi W, Chandraker M (2017) Deep network flow for multi-object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6951–6960

Tang S, Andres B, Andriluka M, Schiele B (2016) Multi-person tracking by multicut and deep matching. In: European Conference on Computer Vision. Springer, pp 100–111

Li W, Zhao R, Xiao T, Wang X (2014) Deepreid: Deep filter pairing neural network for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 152–159

Zheng L, Bie Z, Sun Y, Wang J, Su C, Wang S, Tian Q (2016) Mars: A video benchmark for large-scale person re-identification. In: European Conference on Computer Vision. Springer, pp 868–884

Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) Motchallenge 2015: Towards a benchmark for multi-target tracking. arXiv:1504.01942

Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K (2016) Mot16: A benchmark for multi-object tracking. arXiv:1603.00831

Zhu Y, Zhao C, Wang J, Zhao X, Wu Y, Lu H (2017) Couplenet: Coupling global structure with local parts for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 4126–4134

Bodla N, Singh B, Chellappa R, Davis L S (2017) Soft-nms–improving object detection with one line of code. In: Proceedings of the IEEE international conference on computer vision, pp 5561–5569

Sun S, Akhtar N, Song H, Mian A S, Shah M (2019) Deep affinity network for multiple object tracking. IEEE transactions on pattern analysis and machine intelligence

Dollár P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Trans Pattern Anal Mach Intell 36(8):1532–1545

Shen J, Liang Z, Liu J, Sun H, Shao L, Tao D (2018) Multiobject tracking by submodular optimization. IEEE Trans Cybern 49(6):1990–2001

Bochinski E, Eiselein V, Sikora T (2017) High-speed tracking-by-detection without using image information. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, pp 1–6

Pirsiavash H, Ramanan D, Fowlkes C C (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In: CVPR 2011. IEEE, pp 1201–1208

Andriyenko A, Schindler K, Roth S (2012) Discrete-continuous optimization for multi-target tracking. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 1926–1933

Wen L, Li W, Yan J, Lei Z, Yi D, Li S Z (2014) Multiple target tracking based on undirected hierarchical relation hypergraph. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1282–1289

Dicle C, Camps O I, Sznaier M (2013) The way they move: Tracking multiple targets with similar appearance. In: Proceedings of the IEEE international conference on computer vision, pp 2304–2311

Andriyenko A, Schindler K (2011) Multi-target tracking by continuous energy minimization. In: CVPR, vol 2, pp 7

Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP). IEEE, pp 3464– 3468

He R, Wu X, Sun Z, Tan T (2018) Wasserstein cnn: Learning invariant features for nir-vis face recognition. IEEE Trans Pattern Anal Mach Intell 41(7):1761–1773

Saberian M J, Vasconcelos N (2012) Learning optimal embedded cascades. IEEE Trans Pattern Anal Mach Intell 34(10):2005–2018

Datondji S R E, Dupuis Y, Subirats P, Vasseur P (2016) A survey of vision-based traffic monitoring of road intersections. IEEE Trans Intell Transp Syst 17(10):2681–2698

Cheng G, Zhou P, Han J (2016) Learning rotation-invariant convolutional neural networks for object detection in vhr optical remote sensing images. IEEE Trans Geosci Remote Sens 54(12):7405–7415

Cheng G, Han J (2016) A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens 117:11–28

Shivakumara P, Tang D, Asadzadehkaljahi M, Lu T, Pal U, Anisi M H (2018) Cnn-rnn based method for license plate recognition. CAAI Trans Intell Technol 3(3):169–175

Sarfraz M, Ahmed M J (2019) An approach to license plate recognition system using neural network. In: Exploring Critical Approaches of Evolutionary Computation. IGI Global, pp 20–36

Nair A S, Raju S, Harikrishnan KJ, Mathew A (2018) A survey of techniques for license plate detection and recognition. i-manager’s J Image Process 5(1):25

Banerjee K, Notz D, Windelen J, Gavarraju S, He M (2018) Online camera lidar fusion and object detection on hybrid data for autonomous driving. In: 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, pp 1632–1638

Arnold E, Al-Jarrah O Y, Dianati M, Fallah S, Oxtoby D, Mouzakitis A (2019) A survey on 3d object detection methods for autonomous driving applications. IEEE Trans Intell Transp Syst 20 (10):3782–3795

Li Z, Dong M, Wen S, Hu X, Zhou P, Zeng Z (2019) Clu-cnns: Object detection for medical images. Neurocomputing 350:53–59

Lu W, Zhou Y, Wan G, Hou S, Song S (2019) L3-net: Towards learning based lidar localization for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6389–6398

Altaf F, Islam Syed MS, Akhtar N, Janjua N K (2019) Going deep in medical image analysis: Concepts, methods, challenges, and future directions. IEEE Access 7:99540–99572

Naji S, Jalab H A, Kareem S A (2019) A survey on skin detection in colored images. Artif Intell Rev 52(2):1041–1087

Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L (2018) Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6077–6086

Friedman S, Stamos I (2013) Online detection of repeated structures in point clouds of urban scenes for compression and registration. Int J Comput Vis 102(1-3):112–128

Bai S, An S (2018) A survey on automatic image caption generation. Neurocomputing 311:291–304

Yang W, Tan R T, Feng J, Guo Z, Yan S, Liu J (2019) Joint rain detection and removal from a single image with contextualized deep networks. IEEE Trans Pattern Anal Mach Intell 42(6):1377–1393

Sen D, Pal S K (2008) Generalized rough sets, entropy, and image ambiguity measures. IEEE Trans Syst Man Cybern Part B (Cybern) 39(1):117–128

Ganivada A, Ray S S, Pal S K (2012) Fuzzy rough granular self-organizing map and fuzzy rough entropy. Theor Comput Sci 466:37–63

Pal S K, Mitra S (1992) Multi-layer perceptron, fuzzy sets and classification. IEEE Trans Neural Netw 3(5):683–697

Mitra S, Pal S K (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Netw 6(1):51–63

Sen D, Pal SK (2010) Gradient histogram: thresholding in a region of interest for edge detection. Image Vis Comput 28(4):677–695

Pramanik A, Pal SK, Maiti J, Mitra P (2021) Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Transactions on Emerging Topics in Computational Intelligence. https://doi.org/10.1109/TETCI.2020.3041019

Pal SK, Banerjee R, Dutta S, Sarma SS (2013) An insight into the Z-number approach to CWW. Fundamenta Informaticae 124(1–2):197–229

Banerjee R, Pal SK (2015) Z*-numbers: augmented Z-numbers for machine-subjectivity representation. Inform Sci 323:143–178

Pal SK, Mandal DP (1992) Linguistic recognition system based on approximate reasoning. Inform Sci 61(1–2):135–161