Long Short-Term Memory

Neural Computation - Volume 9, Issue 8 - Pages 1735-1780 - 1997

Sepp Hochreiter¹, Jürgen Schmidhuber²

¹Fakultät für Informatik, Technische Universität München, 80290 München, Germany

²IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland

Abstract

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
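The mechanism described in the abstract can be illustrated with a minimal sketch of a single memory cell. It is not the authors' implementation: it assumes scalar inputs, tanh squashing functions, and hand-set gate activations instead of learned weights, and the names `memory_cell_step` and `hold_value` are hypothetical. The cell state feeds back into itself with a fixed weight of 1.0 (the constant error carousel), while the input and output gates multiplicatively control what is written to and read from that state.

```python
import math


def sigmoid(x):
    """Logistic function used for the gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(state, net_in, net_input_gate, net_output_gate):
    """One time step of a single memory cell with input and output gates.

    Assumption: tanh is used for both squashing functions, and there is
    no forget gate, so the state can only accumulate.
    """
    input_gate = sigmoid(net_input_gate)
    output_gate = sigmoid(net_output_gate)
    candidate = math.tanh(net_in)
    # Constant error carousel: the old state is carried over with weight 1.0.
    new_state = state + input_gate * candidate
    output = output_gate * math.tanh(new_state)
    return new_state, output


def hold_value(steps=1000):
    """Write a value once, keep both gates closed for `steps`, then read it."""
    # Step 0: input gate open (write), output gate closed.
    state, _ = memory_cell_step(0.0, net_in=2.0,
                                net_input_gate=50.0, net_output_gate=-50.0)
    stored = state
    # Distractor steps: both gates closed, so the input barely touches the state.
    for _ in range(steps):
        state, _ = memory_cell_step(state, net_in=-1.0,
                                    net_input_gate=-50.0, net_output_gate=-50.0)
    # Final step: output gate open (read), input gate closed.
    _, output = memory_cell_step(state, net_in=0.0,
                                 net_input_gate=-50.0, net_output_gate=50.0)
    return stored, state, output


if __name__ == "__main__":
    stored, state, output = hold_value()
    print(stored, state, output)
```

Because the self-connection weight is exactly 1.0, the stored value is neither amplified nor attenuated across the closed-gate steps. The same property keeps the backpropagated error from decaying or blowing up along that path.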

References

Baldi P., 1991, Neural Computation, 3, 526, 10.1162/neco.1991.3.4.526

Bengio Y., 1994, IEEE Transactions on Neural Networks, 5, 157, 10.1109/72.279181

Cleeremans A., 1989, Neural Computation, 1, 372, 10.1162/neco.1989.1.3.372

Doya K., 1989, Neural Networks, 2, 375, 10.1016/0893-6080(89)90022-1

Lang K., 1990, Neural Networks, 3, 23, 10.1016/0893-6080(90)90044-L

Lin T., 1996, IEEE Transactions on Neural Networks, 7, 1329, 10.1109/72.548162

Miller C. B., 1993, International Journal of Pattern Recognition and Artificial Intelligence, 7, 849, 10.1142/S0218001493000431

Mozer M. C., 1989, Complex Systems, 3, 349

Pearlmutter B. A., 1989, Neural Computation, 1, 263, 10.1162/neco.1989.1.2.263

Pearlmutter B. A., 1995, IEEE Transactions on Neural Networks, 6, 1212, 10.1109/72.410363

Pineda F. J., 1987, Physical Review Letters, 59, 2229, 10.1103/PhysRevLett.59.2229

Pineda F. J., 1988, Journal of Complexity, 4, 216, 10.1016/0885-064X(88)90021-0

Puskorius G. V., 1994, IEEE Transactions on Neural Networks, 5, 279, 10.1109/72.279191

Schmidhuber J., 1989, Connection Science, 1, 403, 10.1080/09540098908915650

Schmidhuber J., 1992, Neural Computation, 4, 243, 10.1162/neco.1992.4.2.243

Schmidhuber J., 1992, Neural Computation, 4, 234, 10.1162/neco.1992.4.2.234

Smith A. W., 1989, International Journal of Neural Systems, 1, 125, 10.1142/S0129065789000037

Watrous R. L., 1992, Neural Computation, 4, 406, 10.1162/neco.1992.4.3.406

Werbos P. J., 1988, Neural Networks, 1, 339, 10.1016/0893-6080(88)90007-X

Williams R. J., 1990, Neural Computation, 2, 491
