Langevin Algorithms for Very Deep Neural Networks with Application to Image Classification

Elsevier BV - Volume 222 - Pages 303-310 - 2023
Pierre Bras

References

Anirudh Bhardwaj, 2019, Adaptively Preconditioned Stochastic Gradient Langevin Dynamics, arXiv e-prints
Bras, 2021, Convergence of Langevin-Simulated Annealing algorithms with multiplicative noise, arXiv e-prints
Dauphin, 2014, Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization, 2, 2933
Duchi, 2011, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Journal of Machine Learning Research, 12, 2121
Glorot, 2010, Understanding the difficulty of training deep feedforward neural networks, 249
Gulcehre, 2016, Noisy activation functions, 48, 3059
Hanin, 2018, Which neural net architectures give rise to exploding and vanishing gradients?, NeurIPS, 580
He, 2016, Deep residual learning for image recognition, 770
Hochreiter, 1991
Huang, 2017, Densely connected convolutional networks, 2261
Jarrett, 2009, What is the best multi-stage architecture for object recognition?, 2146
Kingma, 2015, Adam: A method for stochastic optimization
Krizhevsky, 2009
Krizhevsky, 2012, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems
van Laarhoven, 1987, Mathematics and its Applications, vol. 37, D. Reidel Publishing Co.
LeCun, 2015, Deep learning, Nature, 10.1038/nature14539
LeCun, 1998, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86, 2278, 10.1109/5.726791
Lee, 2015, Deeply-Supervised Nets, 562
Li, 2016, Preconditioned stochastic gradient Langevin dynamics for deep neural networks, 1788
Ma, 2015, A Complete Recipe for Stochastic Gradient MCMC, Neural Information Processing Systems
Marceau-Caron, 2017, Natural Langevin Dynamics for Neural Networks, arXiv e-prints
Montúfar, 2014, On the number of linear regions of deep neural networks, 2, 2924
Neelakantan, 2015, Adding Gradient Noise Improves Learning for Very Deep Networks, arXiv e-prints
Patterson, 2013, Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex, Advances in Neural Information Processing Systems
Shridhar, 2019, ProbAct: A Probabilistic Activation Function for Deep Neural Networks, arXiv e-prints
Simonyan, 2015, Very deep convolutional networks for large-scale image recognition
Simsekli, 2016, Stochastic Quasi-Newton Langevin Monte Carlo, 48, 642
Srivastava, 2015, Training very deep networks, Advances in Neural Information Processing Systems
Szegedy, 2015, Going deeper with convolutions, 1
Tieleman, 2012, Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, Coursera: Neural Networks for Machine Learning
Valentin Jospin, 2020, Hands-on Bayesian Neural Networks – a Tutorial for Deep Learning Users, arXiv e-prints
Welling, 2011, Bayesian Learning via Stochastic Gradient Langevin Dynamics, 681
Yu, 2021, Simple and effective stochastic neural networks, 35, 3252
Zeiler, 2012, ADADELTA: An Adaptive Learning Rate Method, arXiv e-prints