Long Short-Term Memory

Neural Computation - Volume 9, Issue 8 - Pages 1735-1780 - 1997
Sepp Hochreiter¹, Jürgen Schmidhuber²
¹ Fakultät für Informatik, Technische Universität München, 80290 München, Germany
² IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland

Abstract

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
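The gating idea in the abstract can be illustrated with a minimal sketch of the forward pass of a single memory cell. It assumes the form without a forget gate: the internal state has a fixed self-connection of weight 1.0 (the constant error carousel), an input gate scales what is written, and an output gate scales what is read. The function name `memory_cell_step`, the scalar inputs `net_c`, `net_in`, `net_out`, and the squashing ranges are illustrative assumptions, not taken from this page. Training and gradient truncation are not shown.

```python
import math


def sigmoid(x):
    """Logistic function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(state, net_c, net_in, net_out):
    """One forward step of a single scalar memory cell (hypothetical helper).

    state   -- internal cell state from the previous time step
    net_c   -- net input to the cell
    net_in  -- net input to the input gate
    net_out -- net input to the output gate
    """
    y_in = sigmoid(net_in)            # input gate activation in (0, 1)
    y_out = sigmoid(net_out)          # output gate activation in (0, 1)
    g = 4.0 * sigmoid(net_c) - 2.0    # squashed cell input, assumed range (-2, 2)
    # Constant error carousel: the old state is carried over with weight 1.0,
    # and the input gate decides how much new information is added.
    new_state = state + y_in * g
    h = 2.0 * sigmoid(new_state) - 1.0  # squashed state, assumed range (-1, 1)
    return new_state, y_out * h


# Write a value once with the input gate open, then keep both gates closed
# for 1000 steps while the cell input tries to overwrite the state.
state = 0.0
state, _ = memory_cell_step(state, net_c=3.0, net_in=20.0, net_out=-20.0)
stored = state
for _ in range(1000):
    state, _ = memory_cell_step(state, net_c=-3.0, net_in=-20.0, net_out=-20.0)
drift = abs(state - stored)
print(f"stored={stored:.4f} drift after 1000 steps={drift:.2e}")
```

With the input gate closed, the state changes only by the tiny residual gate activation at each step, so the stored value survives the 1000-step lag almost unchanged. This is the behavior the abstract attributes to the constant error carousel and its gates.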
