Learning long-term dependencies with gradient descent is difficult

IEEE Transactions on Neural Networks, Vol. 5, No. 2, pp. 157-166, 1994
Yoshua Bengio (1), P. Simard (2), Paolo Frasconi (3)
(1) Dept. d'inf. et de Recherche Oper., Montreal Univ., Que., Canada
(2) AT&T Bell Laboratories, Inc., Holmdel, NJ, USA
(3) Dipartimento di Sistemi e Informatica, Università di Firenze, Italy
