Diverse receptive field network with context aggregation for fast object detection

Shaorong Xie¹, Chang Liu¹, Jiantao Gao¹, Xiaomao Li¹, Jun Luo¹, Baojie Fan², Jiahong Chen¹, Huayan Pu¹, Yan Peng¹
¹ Shanghai University, No. 99 Shangda Road, Shanghai 200000, China
² Nanjing University of Posts and Telecommunications, No. 9 Wenyuan Road, Nanjing 210000, China
