Representation Learning: A Review and New Perspectives

IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 8, pp. 1798-1828, 2013
Yoshua Bengio1, Aaron Courville2, Pascal Vincent2
1Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec H3C 3J7, Canada.
2Dept. of Comput. Sci. & Oper. Res., Univ. de Montreal, Montreal, QC, Canada
