Langevin Algorithms for Very Deep Neural Networks with Application to Image Classification
References
Anirudh Bhardwaj, 2019, Adaptively Preconditioned Stochastic Gradient Langevin Dynamics, arXiv e-prints
Bras, 2021, Convergence of Langevin-Simulated Annealing algorithms with multiplicative noise, arXiv e-prints
Dauphin, 2014, Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization, Advances in Neural Information Processing Systems, 27, 2933
Duchi, 2011, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Journal of Machine Learning Research, 12, 2121
Glorot, 2010, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 249
Gulcehre, 2016, Noisy activation functions, Proceedings of the 33rd International Conference on Machine Learning, 48, 3059
Hanin, 2018, Which neural net architectures give rise to exploding and vanishing gradients?, Advances in Neural Information Processing Systems, 580
He, 2016, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770
Hochreiter, 1991, Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis, Technische Universität München
Huang, 2017, Densely connected convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2261
Jarrett, 2009, What is the best multi-stage architecture for object recognition?, Proceedings of the IEEE International Conference on Computer Vision, 2146
Kingma, 2015, Adam: A method for stochastic optimization, International Conference on Learning Representations
Krizhevsky, 2009, Learning Multiple Layers of Features from Tiny Images, Technical report, University of Toronto
Krizhevsky, 2012, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems
van Laarhoven, 1987, Simulated Annealing: Theory and Applications, Mathematics and its Applications, Vol. 37, D. Reidel Publishing Co.
LeCun, 2015, Deep learning, Nature, 521, 436, 10.1038/nature14539
LeCun, 1998, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86, 2278, 10.1109/5.726791
Lee, 2015, Deeply-Supervised Nets, Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, 562
Li, 2016, Preconditioned stochastic gradient Langevin dynamics for deep neural networks, Proceedings of the 30th AAAI Conference on Artificial Intelligence, 1788
Ma, 2015, A Complete Recipe for Stochastic Gradient MCMC, Advances in Neural Information Processing Systems
Marceau-Caron, 2017, Natural Langevin Dynamics for Neural Networks, arXiv e-prints
Montúfar, 2014, On the number of linear regions of deep neural networks, Advances in Neural Information Processing Systems, 27, 2924
Neelakantan, 2015, Adding Gradient Noise Improves Learning for Very Deep Networks, arXiv e-prints
Patterson, 2013, Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex, Advances in Neural Information Processing Systems
Shridhar, 2019, ProbAct: A Probabilistic Activation Function for Deep Neural Networks, arXiv e-prints
Simonyan, 2015, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations
Simsekli, 2016, Stochastic Quasi-Newton Langevin Monte Carlo, Proceedings of the 33rd International Conference on Machine Learning, 48, 642
Srivastava, 2015, Training very deep networks, Advances in Neural Information Processing Systems
Szegedy, 2015, Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1
Tieleman, 2012, Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude, Coursera: Neural Networks for Machine Learning
Valentin Jospin, 2020, Hands-on Bayesian Neural Networks – a Tutorial for Deep Learning Users, arXiv e-prints
Welling, 2011, Bayesian Learning via Stochastic Gradient Langevin Dynamics, Proceedings of the 28th International Conference on Machine Learning, 681
Yu, 2021, Simple and effective stochastic neural networks, Proceedings of the AAAI Conference on Artificial Intelligence, 35, 3252
Zeiler, 2012, ADADELTA: An Adaptive Learning Rate Method, arXiv e-prints