Recent advances in convolutional neural networks
References
Hubel, 1968, Receptive fields and functional architecture of monkey striate cortex, J. Physiol., 215, 10.1113/jphysiol.1968.sp008455
Fukushima, 1982, Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition, 267
Le Cun, 1989, Handwritten digit recognition with a back-propagation network, 396
LeCun, 1998, Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278, 10.1109/5.726791
Hecht-Nielsen, 1988, Theory of the backpropagation neural network, Neural Networks, 1, 445, 10.1016/0893-6080(88)90469-8
Zhang, 1990, Parallel distributed processing model with local space-invariant interconnections and its optical architecture, Appl. Opt., 29, 4790, 10.1364/AO.29.004790
Niu, 2012, A novel hybrid CNN–SVM classifier for recognizing handwritten digits, Pattern Recognit., 45, 1318, 10.1016/j.patcog.2011.09.021
Russakovsky, 2015, Imagenet large scale visual recognition challenge, Int. J. Comput. Vis. (IJCV), 115, 211
Simonyan, 2015, Very deep convolutional networks for large-scale image recognition
Szegedy, 2015, Going deeper with convolutions, 1
Zeiler, 2014, Visualizing and understanding convolutional networks, 818
He, 2016, Deep residual learning for image recognition, 770
LeCun, 2012, Efficient backprop, 9
Nair, 2010, Rectified linear units improve restricted Boltzmann machines, 807
Wang, 2012, End-to-end text recognition with convolutional neural networks, 3304
Boureau, 2010, A theoretical analysis of feature pooling in visual recognition, 111
Hinton, 2012, Improving neural networks by preventing co-adaptation of feature detectors, CoRR abs/1207.0580
Lin, 2014, Network in network
Tang, 2013, Deep learning using linear support vector machines
Madjarov, 2012, An extensive experimental comparison of methods for multi-label learning, Pattern Recognit., 45, 3084, 10.1016/j.patcog.2012.03.004
Wijnhoven, 2010, Fast training of object detection using stochastic gradient descent, 424
Zinkevich, 2010, Parallelized stochastic gradient descent, 2595
Ngiam, 2010, Tiled convolutional neural networks, 1279
Wang, 2015, Encoding time series as images for visual inspection and classification using tiled convolutional neural networks
Zheng, 2014, Time series classification using multi-channels deep convolutional neural networks, 298
Zeiler, 2010, Deconvolutional networks, 2528
Zeiler, 2011, Adaptive deconvolutional networks for mid and high level feature learning, 2018
Long, 2017, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 39, 640, 10.1109/TPAMI.2016.2572683
Visin, 2015, Reseg: a recurrent neural network for object segmentation
Noh, 2015, Learning deconvolution network for semantic segmentation, 1520
Cao, 2015, Look and think twice: capturing top-down visual attention with feedback convolutional neural networks, 2956
Zhang, 2016, Top-down neural attention by excitation backprop, 543
Zhang, 2016, Augmenting supervised neural networks with unsupervised objectives for large-scale image classification, 612
Zhou, 2016, Learning deep features for discriminative localization, 2921
Das, 2016, Human attention in visual question answering: Do humans and deep networks look at the same regions?, 932
Dong, 2016, Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 38, 295, 10.1109/TPAMI.2015.2439281
Yu, 2016, Multi-scale context aggregation by dilated convolutions
Kalchbrenner, 2016, Neural machine translation in linear time, CoRR abs/1610.10099
van den Oord, 2016, Wavenet: a generative model for raw audio, CoRR abs/1609.03499
Sercu, 2016, Dense prediction on sequences with time-dilated convolutions for speech recognition
Szegedy, 2016, Rethinking the inception architecture for computer vision, 2818
Szegedy, 2017, Inception-v4, inception-resnet and the impact of residual connections on learning, 4278
Hyvärinen, 2007, Complex cell pooling and the statistics of natural images, Network, 18, 81, 10.1080/09548980701418942
Estrach, 2014, Signal recovery from pooling representations, 307
Wan, 2013, Regularization of neural networks using dropconnect, 1058
Yu, 2014, Mixed pooling for convolutional neural networks, 364
Zeiler, 2013, Stochastic pooling for regularization of deep convolutional neural networks
Rippel, 2015, Spectral representations for convolutional neural networks, 2449
Mathieu, 2014, Fast training of convolutional networks through FFTs
He, 2015, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 37, 1904, 10.1109/TPAMI.2015.2389824
Singh, 2012, Unsupervised discovery of mid-level discriminative patches, 73
Gong, 2014, Multi-scale orderless pooling of deep convolutional activation features, 392
Jégou, 2012, Aggregating local image descriptors into compact codes, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 34, 1704, 10.1109/TPAMI.2011.235
Maas, 2013, Rectifier nonlinearities improve neural network acoustic models, 30
Zeiler, 2013, On rectified linear units for speech processing, 3517
He, 2015, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, 1026
Xu, 2015, Empirical evaluation of rectified activations in convolutional network
Clevert, 2016, Fast and accurate deep network learning by exponential linear units (elus)
Goodfellow, 2013, Maxout networks, 1319
Springenberg, 2013, Improving deep neural networks with probabilistic maxout units, CoRR abs/1312.6116
Zhang, 2004, Solving large scale linear prediction problems using stochastic gradient descent algorithms
Deng, 2012, The mnist database of handwritten digit images for machine learning research, IEEE Signal Process. Mag., 29, 141, 10.1109/MSP.2012.2211477
Liu, 2016, Large-margin softmax loss for convolutional neural networks, 507
Bromley, 1993, Signature verification using a siamese time delay neural network, Int. J. Pattern Recognit. Artif. Intell. (IJPRAI), 7, 669, 10.1142/S0218001493000339
Chopra, 2005, Learning a similarity metric discriminatively, with application to face verification, 539
Hadsell, 2006, Dimensionality reduction by learning an invariant mapping, 1735
Shaham, 2017, Learning by coincidence: siamese networks and common variable learning, Pattern Recognit.
Lin, 2017, Deephash: getting regularization, depth and fine-tuning right, 133
Schroff, 2015, Facenet: a unified embedding for face recognition and clustering, 815
Liu, 2016, Deep relative distance learning: tell the difference between similar vehicles, 2167
Ding, 2015, Deep feature learning with relative distance comparison for person re-identification, Pattern Recognit., 48, 2993, 10.1016/j.patcog.2015.04.005
Liu, 2016, Deepfashion: powering robust clothes recognition and retrieval with rich annotations, 1096
Kingma, 2014, Auto-encoding variational bayes
Im, 2017, Denoising criterion for variational auto-encoding framework, 2059
Kingma, 2014, Semi-supervised learning with deep generative models, 3581
Goodfellow, 2014, Generative adversarial nets, 2672
Mirza, 2014, Conditional generative adversarial nets, CoRR abs/1411.1784
Vincent, 2008, Extracting and composing robust features with denoising autoencoders, 1096
Ng, 2016, Dual autoencoders features for imbalance classification problem, Pattern Recognit., 60, 875, 10.1016/j.patcog.2016.06.013
Mehta, 2017, Rodeo: robust de-aliasing autoencoder for real-time medical image reconstruction, Pattern Recognit., 63, 499, 10.1016/j.patcog.2016.09.022
Olshausen, 1996, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, 381, 607, 10.1038/381607a0
Lee, 2006, Efficient sparse coding algorithms, 801
Eslami, 2016, Attend, infer, repeat: fast scene understanding with generative models, 3225
Sohn, 2015, Learning structured output representation using deep conditional generative models, 3483
Reed, 2016, Generative adversarial text to image synthesis, 1060
Denton, 2015, Deep generative image models using a Laplacian pyramid of adversarial networks, 1486
Salimans, 2016, Improved techniques for training GANs, 2226
Dosovitskiy, 2016, Generating images with perceptual similarity metrics based on deep networks, 658
Tikhonov, 1943, On the stability of inverse problems, 39, 195
Wang, 2013, Fast dropout training, 118
Ba, 2013, Adaptive dropout for training deep neural networks, 3084
Tompson, 2015, Efficient object localization using convolutional networks, 648
Yang, 2015, Mirror, mirror on the wall, tell me, is the error small?, 4685
Xie, 2015, Holistically-nested edge detection, 1395
Salamon, 2017, Deep convolutional neural networks and data augmentation for environmental sound classification, Signal Process. Lett. (SPL), 24, 279, 10.1109/LSP.2017.2657381
Eigen, 2015, Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, 2650
Paulin, 2014, Transformation pursuit for image classification, 3646
Hauberg, 2016, Dreaming more data: Class-dependent distributions over diffeomorphisms for learned data augmentation, 342
Xie, 2015, Hyper-class augmented and regularized deep learning for fine-grained image classification, 2645
Xu, 2015, Augmenting strong supervision using web data for fine-grained categorization, 2524
Choromanska, 2015, The loss surfaces of multilayer networks
Mishkin, 2016, All you need is a good init
Sutskever, 2013, On the importance of initialization and momentum in deep learning, 1139
Glorot, 2010, Understanding the difficulty of training deep feedforward neural networks, 249
Saxe, 2014, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Doersch, 2015, Unsupervised visual representation learning by context prediction, 1422
Agrawal, 2015, Learning to see by moving, 37
Qian, 1999, On the momentum term in gradient descent learning algorithms, Neural Netw., 12, 145, 10.1016/S0893-6080(98)00116-6
Kingma, 2015, Adam: A method for stochastic optimization
Loshchilov, 2017, Sgdr: Stochastic gradient descent with warm restarts
Schaul, 2013, No more pesky learning rates, 343
Zhang, 2015, Deep learning with elastic averaging SGD, 685
Recht, 2011, Hogwild: a lock-free approach to parallelizing stochastic gradient descent, 693
Dean, 2012, Large scale distributed deep networks, 1232
Paine, 2011, GPU asynchronous stochastic gradient descent to speed up neural network training, CoRR
Zhuang, 2013, A fast parallel SGD for matrix factorization in shared memory systems, 249
Yao, 2007, On early stopping in gradient descent learning, Constructive Approx., 26, 289, 10.1007/s00365-006-0663-2
Prechelt, 2012, Early stopping - but when?, 53
Zhang, 2017, Understanding deep learning requires rethinking generalization
Ioffe, 2015, Batch normalization: accelerating deep network training by reducing internal covariate shift, J. Mach. Learn. Res. (JMLR), 448
Hochreiter, 1997, Long short-term memory, Neural Comput., 9, 1735, 10.1162/neco.1997.9.8.1735
Srivastava, 2015, Training very deep networks, 2377
He, 2016, Identity mappings in deep residual networks, 630
Shen, 2016, Weighted residuals for very deep networks, 936
Zagoruyko, 2016, Wide residual networks, 87.1
Singh, 2016, Swapout: learning an ensemble of deep architectures, 28
Targ, 2016, Resnet in resnet: generalizing residual architectures, CoRR
Zhang, 2016, Residual networks of residual networks: multilevel residual networks, IEEE Trans. Circuits Syst. Video Technol. (TCSVT), PP, 1
Huang, 2016, Densely connected convolutional networks, 4700
Chetlur, 2014, Cudnn: efficient primitives for deep learning, CoRR abs/1410.0759
Vasilache, 2015, Fast convolutional nets with fbfft: a GPU performance evaluation
Sermanet, 2014, Overfeat: integrated recognition, localization and detection using convolutional networks
Lavin, 2016, Fast algorithms for convolutional neural networks, 4013
Sainath, 2013, Low-rank matrix factorization for deep neural network training with high-dimensional output targets, 6655
Xue, 2013, Restructuring of deep neural network acoustic models with singular value decomposition, 2365
Denil, 2013, Predicting parameters in deep learning, 2148
Denton, 2014, Exploiting linear structure within convolutional networks for efficient evaluation, 1269
Jaderberg, 2014, Speeding up convolutional neural networks with low rank expansions
Novikov, 2015, Tensorizing neural networks, 442
Oseledets, 2011, Tensor-train decomposition, SIAM J. Sci. Comput., 33, 2295, 10.1137/090752286
Le, 2013, Fastfood-approximating kernel expansions in loglinear time, 85
Dasgupta, 2011, Fast locality-sensitive hashing, 1073
Yu, 2014, Circulant binary embedding, 946
Cheng, 2015, An exploration of parameter redundancy in deep networks with circulant projections, 2857
Moczulski, 2016, Acdc: a structured efficient linear layer
Han, 2016, Deep compression: compressing deep neural network with pruning, trained quantization and huffman coding
Kim, 2016, Bitwise neural networks
Rastegari, 2016, Xnor-net: imagenet classification using binary convolutional neural networks, 525
Zhou, 2016, Dorefa-net: training low bitwidth convolutional neural networks with low bitwidth gradients, CoRR
Courbariaux, 2016, Binarynet: training deep neural networks with weights and activations constrained to +1 or -1
Sullivan, 1996, Efficient scalar quantization of exponential and Laplacian random variables, IEEE Trans. Inf. Theory, 42, 1365, 10.1109/18.532878
Gong, 2014, Compressing deep convolutional networks using vector quantization, CoRR abs/1412.6115
Chen, 2010, Approximate nearest neighbor search by residual vector quantization, Sensors, 10, 11259, 10.3390/s101211259
Zhou, 2012, Scalar quantization for large scale image search, 169
Pratt, 1988, Comparing biases for minimal network construction with back-propagation, 177
Han, 2015, Learning both weights and connections for efficient neural network, 1135
Guo, 2016, Dynamic network surgery for efficient DNNs, 1379
Yang, 2016, Designing energy-efficient convolutional neural networks using energy-aware pruning, CoRR abs/1611.05128
Hu, 2016, Network trimming: a data-driven neuron pruning approach towards efficient deep architectures, CoRR abs/1607.03250
Srinivas, 2015, Data-free parameter pruning for deep neural networks
Mariet, 2015, Diversity networks
Chen, 2015, Compressing neural networks with the hashing trick, 2285
Shi, 2009, Hash kernels for structured data, J. Mach. Learn. Res. (JMLR), 10, 2615
Weinberger, 2009, Feature hashing for large scale multitask learning, 1113
Liu, 2015, Sparse convolutional neural networks, 806
Wen, 2016, Learning structured sparsity in deep neural networks, 2074
Bagherinezhad, 2017, Lcnn: Lookup-based convolutional neural network
Egmont-Petersen, 2002, Image processing with neural networks - a review, Pattern Recognit., 35, 2279, 10.1016/S0031-3203(01)00178-9
Nogueira, 2017, Towards better exploiting convolutional neural networks for remote sensing scene classification, Pattern Recognit., 61, 539, 10.1016/j.patcog.2016.07.001
Zuo, 2015, Exemplar based deep discriminative and shareable feature learning for scene image classification, Pattern Recognit., 48, 3004, 10.1016/j.patcog.2015.02.003
Lopes, 2017, Facial expression recognition with convolutional neural networks: coping with few data and the training sample order, Pattern Recognit., 61, 610, 10.1016/j.patcog.2016.07.026
Everingham, 2015, The pascal visual object classes challenge: a retrospective, Int. J. Comput. Vis. (IJCV), 111, 98
Tousch, 2012, Semantic hierarchies for image annotation: a survey, Pattern Recognit., 45, 333, 10.1016/j.patcog.2011.05.017
Srivastava, 2013, Discriminative transfer learning with tree-based priors, 2094
Wang, 2015, Learning fine-grained features via a CNN tree for large-scale classification, CoRR abs/1511.04534
Xiao, 2014, Error-driven incremental learning in deep convolutional neural network for large-scale image classification, 177
Yan, 2015, Hd-cnn: hierarchical deep convolutional neural network for image classification, Proc. Int. Conf. Comput. Vis. (ICCV), 2740
Berg, 2014, Birdsnap: large-scale fine-grained visual categorization of birds, 2019
Khosla, 2011, Novel dataset for fine-grained image categorization: stanford dogs, 2, 1
Yang, 2015, A large-scale car dataset for fine-grained categorization and verification, 3973
Minervini, 2016, Finely-grained annotated datasets for image-based plant phenotyping, Pattern Recognit. Lett., 81, 80, 10.1016/j.patrec.2015.10.013
Xie, 2017, Lg-cnn: from local parts to global discrimination for fine-grained recognition, Pattern Recognit., 71, 118, 10.1016/j.patcog.2017.06.002
Branson, 2014, Improved bird species recognition using pose normalized deep convolutional nets
Zhang, 2014, Part-based r-cnns for fine-grained category detection, 834
Uijlings, 2013, Selective search for object recognition, Int. J. Comput. Vis. (IJCV), 104, 154
Lin, 2015, Deep lac: deep localization, alignment and classification for fine-grained recognition, 1666
Pluim, 2003, Mutual-information-based registration of medical images: a survey, IEEE Trans. Med. Imaging, 22, 986, 10.1109/TMI.2003.815867
Krause, 2014, Learning features and parts for fine-grained recognition, 26
Krause, 2015, Fine-grained recognition without part annotations, 5546
Zhang, 2016, Weakly supervised fine-grained categorization with part-based image representation, IEEE Trans. Image Process., 25, 1713, 10.1109/TIP.2016.2531289
Xiao, 2015, The application of two-level attention models in deep convolutional neural network for fine-grained image classification, 842
Lin, 2015, Bilinear CNN models for fine-grained visual recognition, 1449
Nguyen, 2016, Human detection from images and videos: a survey, Pattern Recognit., 51, 148, 10.1016/j.patcog.2015.08.027
Li, 2015, Feature representation for statistical-learning-based object detection: a review, Pattern Recognit., 48, 3542, 10.1016/j.patcog.2015.04.018
Pedersoli, 2015, A coarse-to-fine approach for fast deformable object detection, Pattern Recognit., 48, 1844, 10.1016/j.patcog.2014.11.006
Nowlan, 1994, A convolutional neural network hand tracker, 901
Girshick, 2015, Deformable part models are convolutional neural networks, 437
Vaillant, 1994, Original approach for the localisation of objects in images, IEE Proc.-Vis. Image Signal Process., 141, 245, 10.1049/ip-vis:19941301
Lin, 2014, Microsoft Coco: Common Objects in Context, 740
Endres, 2014, Category independent object proposals, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 36, 222, 10.1109/TPAMI.2013.122
Gómez, 2017, Textproposals: a text-specific selective search algorithm for word spotting in the wild, Pattern Recognit., 70, 60, 10.1016/j.patcog.2017.04.027
Girshick, 2014, Rich feature hierarchies for accurate object detection and semantic segmentation, 580
Ren, 2017, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 39, 1137, 10.1109/TPAMI.2016.2577031
Gidaris, 2015, Object detection via a multi-region and semantic segmentation-aware cnn model, 1134
Yoo, 2015, Attentionnet: Aggregating weak directions for accurate object detection, 2659
Felzenszwalb, 2010, Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 32, 1627, 10.1109/TPAMI.2009.167
Simo-Serra, 2014, Fracking deep convolutional image descriptors, CoRR abs/1412.6537
Shrivastava, 2016, Training region-based object detectors with online hard example mining, 761
Redmon, 2016, You only look once: unified, real-time object detection, 779
Liu, 2016, SSD: single shot multibox detector, 21
Lu, 2016, Adaptive object detection using adjacency and zoom prediction, 2351
Zhang, 2013, Real-time visual tracking via online weighted multiple instance learning, Pattern Recognit., 46, 397, 10.1016/j.patcog.2012.07.013
Zhang, 2013, Sparse coding based visual tracking: review and experimental comparison, Pattern Recognit., 46, 1772, 10.1016/j.patcog.2012.10.006
Zhang, 2015, Multi-target tracking by learning local-to-global trajectory models, Pattern Recognit., 48, 580, 10.1016/j.patcog.2014.08.013
Fan, 2010, Human tracking using convolutional neural networks, IEEE Trans. Neural Netw. (TNN), 21, 1610, 10.1109/TNN.2010.2066286
Li, 2014, Deeptrack: learning discriminative feature representations by convolutional neural networks for visual tracking
Chen, 2016, Cnntracker: online discriminative object tracking via deep convolutional neural network, Appl. Soft Comput., 38, 1088, 10.1016/j.asoc.2015.06.048
Hong, 2015, Online tracking by learning discriminative saliency map with convolutional neural network, 597
Patacchiola, 2017, Head pose estimation in the wild using convolutional neural networks and adaptive gradient methods, Pattern Recognit., 71, 132, 10.1016/j.patcog.2017.06.009
Nishi, 2017, Generation of human depth images with body part labels for complex human pose recognition, Pattern Recognit., 10.1016/j.patcog.2017.06.006
Toshev, 2014, Deeppose: human pose estimation via deep neural networks, 1653
Jain, 2014, Learning human pose estimation features with convolutional networks
Tompson, 2014, Joint training of a convolutional network and a graphical model for human pose estimation, 1799
Chen, 2014, Articulated pose estimation by a graphical model with image dependent pairwise relations, Proc. Adv. Neural Inf. Process. Syst. (NIPS), 1736
Chen, 2015, Parsing occluded people by flexible compositions, 3945
Fan, 2015, Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation, 1347
Jain, 2014, Modeep: A deep learning framework using motion features for human pose estimation, 302
Tang, 1996, Automatic document processing: a survey, Pattern Recognit., 29, 1931, 10.1016/S0031-3203(96)00044-1
Vinciarelli, 2002, A survey on off-line cursive word recognition, Pattern Recognit., 35, 1433, 10.1016/S0031-3203(01)00129-7
Jung, 2004, Text information extraction in images and video: a survey, Pattern Recognit., 37, 977, 10.1016/j.patcog.2003.10.012
Eskenazi, 2017, A comprehensive survey of mostly textual document segmentation algorithms since 2008, Pattern Recognit., 64, 1, 10.1016/j.patcog.2016.10.023
Bai, 2017, Text/non-text image classification in the wild with convolutional neural networks, Pattern Recognit., 66, 437, 10.1016/j.patcog.2016.12.005
Gomez, 2017, Improving patch-based scene text script identification with ensembles of conjoined networks, Pattern Recognit., 67, 85, 10.1016/j.patcog.2017.01.032
Delakis, 2008, Text detection with convolutional neural networks, 290
Xu, 2015, Robust seed localization and growing with deep convolutional features for scene text detection, 387
Huang, 2014, Robust scene text detection with convolution neural network induced mser trees, 497
Zhang, 2015, Automatic discrimination of text and non-text natural images, 886
Goodfellow, 2014, Multi-digit number recognition from street view imagery using deep convolutional neural networks
Jaderberg, 2015, Deep structured output learning for unconstrained text recognition
He, 2016, Reading scene text in deep convolutional sequences, 3501
Gers, 2000, Learning to forget: continual prediction with lstm, Neural Comput., 12, 2451, 10.1162/089976600300015015
Shi, 2015, An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition, CoRR abs/1507.05717
Jaderberg, 2014, Deep features for text spotting, 512
Jaderberg, 2016, Reading text in the wild with convolutional neural networks, 116, 1
Wang, 2015, Deep networks for saliency detection via local estimation and global search, 3183
Zhao, 2015, Saliency detection by multi-context deep learning, 1265
Li, 2015, Visual saliency based on multiscale deep features, 5455
Liu, 2015, Predicting eye fixations using convolutional neural networks, 362
He, 2015, Supercnn: a superpixelwise convolutional neural network for salient object detection, Int. J. Comput. Vis. (IJCV), 115, 330, 10.1007/s11263-015-0822-0
Vig, 2014, Large-scale optimization of hierarchical features for saliency prediction in natural images, 2798
Kümmerer, 2015, Deep gaze i: boosting saliency prediction with feature maps trained on imagenet
Pan, 2015, End-to-end convolutional network for saliency prediction, CoRR abs/1507.01422
Guo, 2014, A survey on still image based human action recognition, Pattern Recognit., 47, 3343, 10.1016/j.patcog.2014.04.018
Presti, 2016, 3D skeleton-based human action classification: a survey, Pattern Recognit., 53, 130, 10.1016/j.patcog.2015.11.019
Zhang, 2016, Rgb-d-based action recognition datasets: a survey, Pattern Recognit., 60, 86, 10.1016/j.patcog.2016.05.019
Donahue, 2014, Decaf: a deep convolutional activation feature for generic visual recognition
Oquab, 2014, Learning and transferring mid-level image representations using convolutional neural networks, 1717
Gkioxari, 2015, Actions and attributes from wholes and parts, 2470
Pishchulin, 2013, Poselet conditioned pictorial structures, 588
Gkioxari, 2015, Contextual action recognition with r*CNN, 1080
Zhang, 2016, Action recognition in still images with minimum annotation efforts, IEEE Trans. Image Process., 25, 5479, 10.1109/TIP.2016.2605305
Wang, 2017, Three-stream CNNs for action recognition, Pattern Recognit. Lett., 92, 33, 10.1016/j.patrec.2017.04.004
Ji, 2013, 3D convolutional neural networks for human action recognition, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 35, 221, 10.1109/TPAMI.2012.59
Tran, 2015, Learning spatiotemporal features with 3d convolutional networks, 4489
Karpathy, 2014, Large-scale video classification with convolutional neural networks, 1725
Simonyan, 2014, Two-stream convolutional networks for action recognition in videos, 568
Chéron, 2015, P-CNN: pose-based CNN features for action recognition, 3218
Donahue, 2017, Long-term recurrent convolutional networks for visual recognition and description, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 39, 677, 10.1109/TPAMI.2016.2599174
Fu, 1981, A survey on image segmentation, Pattern Recognit., 13, 3, 10.1016/0031-3203(81)90028-5
Zhou, 2016, Multi-scale context for scene labeling via flexible segmentation graph, Pattern Recognit., 59, 312, 10.1016/j.patcog.2016.03.023
Liu, 2015, CRF learning with CNN features for image segmentation, Pattern Recognit., 48, 2983, 10.1016/j.patcog.2015.04.019
Bu, 2016, Scene parsing using inference embedded deep networks, Pattern Recognit., 59, 188, 10.1016/j.patcog.2016.01.027
Peng, 2013, A survey of graph theoretical approaches to image segmentation, Pattern Recognit., 46, 1020, 10.1016/j.patcog.2012.09.015
Farabet, 2013, Learning hierarchical features for scene labeling, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 35, 1915, 10.1109/TPAMI.2012.231
Couprie, 2013, Indoor semantic segmentation using depth information
Pinheiro, 2014, Recurrent convolutional neural networks for scene labeling, 82
Shuai, 2015, Integrating parametric and non-parametric models for scene labeling, 4249
Shuai, 2015, Quaddirectional 2d-recurrent neural networks for image labeling, 22, 1990
Shuai, 2016, Dag-recurrent neural networks for scene labeling, 3620
Mostajabi, 2015, Feedforward semantic segmentation with zoom-out features, 3376
Chen, 2015, Semantic image segmentation with deep convolutional nets and fully connected crfs
El Ayadi, 2011, Survey on speech emotion recognition: features, classification schemes, and databases, Pattern Recognit., 44, 572, 10.1016/j.patcog.2010.09.020
Deng, 1991, Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition, IEEE Trans. Signal Process., 39, 1677, 10.1109/78.134406
Hinton, 2012, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag., 29, 82, 10.1109/MSP.2012.2205597
Deng, 2013, Recent advances in deep learning for speech research at Microsoft, 8604
Yao, 2012, Adaptation of context-dependent deep neural networks for automatic speech recognition, 366
Abdel-Hamid, 2012, Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition, 4277
Abdel-Hamid, 2014, Convolutional neural networks for speech recognition
Palaz, 2013, Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, 1766
Hoshen, 2015, Speech acoustic modeling from raw multichannel waveforms, 4624
Amodei, 2016, Deep speech 2: End-to-end speech recognition in english and mandarin, 173
Sercu, 2016, Advances in very deep convolutional neural networks for LVCSR, 3429
Tóth, 2014, Convolutional deep maxout networks for phone recognition, 1078
Sainath, 2013, Improvements to deep convolutional neural networks for LVCSR, 315
Yu, 2016, Deep convolutional neural networks with layer-wise context expansion and attention, 17
Waibel, 1989, Phoneme recognition using time-delay neural networks, IEEE Trans. Acoustics, Speech, Signal Process., 37, 328, 10.1109/29.21701
Chen, 2014, Dnn-based stochastic postfilter for hmm-based speech synthesis, 1954
Uria, 2015, Modelling acoustic feature dependencies with artificial neural networks: Trajectory-rnade, 4465
Huang, 2017, Hierarchical bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation, Pattern Recognit. Lett., 10.1016/j.patrec.2017.08.001
van den Oord, 2016, Pixel recurrent neural networks, 1747
Jozefowicz, 2016, Exploring the limits of language modeling
Kim, 2016, Character-aware neural language models, 2741
Gu, 2017, Stack-captioning: coarse-to-fine learning for image captioning, CoRR abs/1709.03376
Wang, 2015, genCNN: a convolutional architecture for word sequence prediction, 1567
Gu, 2017, An empirical study of language CNN for image captioning
Dauphin, 2017, Language modeling with gated convolutional networks, 933
Collobert, 2008, A unified architecture for natural language processing: deep neural networks with multitask learning, 160
Yu, 2014, Deep learning for answer sentence selection
Kalchbrenner, 2014, A convolutional neural network for modelling sentences, 655
Kim, 2014, Convolutional neural networks for sentence classification, 1746
Yin, 2015, Multichannel variable-size convolution for sentence classification, 204
Collobert, 2011, Natural language processing (almost) from scratch, J. Mach. Learn. Res. (JMLR), 12, 2493
Conneau, 2016, Very deep convolutional networks for natural language processing, CoRR abs/1606.01781
Huang, 2016, Deep networks with stochastic depth, 646