Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network

Physica D: Nonlinear Phenomena - Volume 404 - Page 132306 - 2020
A. Sherstinsky

Abstract

Keywords


References

Hochreiter, 1997, Long short-term memory, Neural Comput., 9, 1735, 10.1162/neco.1997.9.8.1735

Lin, 2017, Criticality in formal languages and statistical physics, Entropy, 19, 299, 10.3390/e19070299

Sanger, 1989, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Netw., 2, 459, 10.1016/0893-6080(89)90044-0

Sherstinsky, 1994

Liao, 2016, Bridging the gaps between residual learning, recurrent neural networks and visual cortex, CoRR, abs/1604.03640

Haber, 2017, Stable architectures for deep neural networks, Inverse Problems, 34, 10.1088/1361-6420/aa9a90

Lu, 2018, Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations, vol. 80, 3282

Ruthotto, 2018, Deep neural networks motivated by partial differential equations, CoRR, abs/1804.04272

Sherstinsky, 2018

Sherstinsky, 2018, Deriving the recurrent neural network definition and RNN unrolling using signal processing, vol. 31

Ciccone, 2018, NAIS-Net: Stable deep networks from non-autonomous differential equations, 3029

Chang, 2018, Reversible architectures for arbitrarily deep residual neural networks, 2811

Chang, 2019, AntisymmetricRNN: A dynamical system view on recurrent neural networks, International Conference on Learning Representations

Chen, 2018, Neural ordinary differential equations, 6572

Rubanova, 2019

Greff, 2015, LSTM: A search space Odyssey, CoRR, abs/1503.04069

Graves, 2005, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Netw., 18, 602, 10.1016/j.neunet.2005.06.042

Graves, 2005, Framewise phoneme classification with bidirectional LSTM networks, Proc. Int. Joint Conf. on Neural Networks (IJCNN 2005)

Graves, 2008

Sutskever, 2011, Generating text with recurrent neural networks, 1017

Sundermeyer, 2012, LSTM neural networks for language modeling, Interspeech, 194

Graves, 2013, Generating sequences with recurrent neural networks, CoRR, abs/1308.0850

Sutskever, 2014, Sequence to sequence learning with neural networks, 3104

Sak, 2014, Long short-term memory recurrent neural network architectures for large scale acoustic modeling, 338

Lipton, 2015, A critical review of recurrent neural networks for sequence learning, CoRR, abs/1506.00019

Karpathy, 2015

Olah, 2015

Palangi, 2015, Deep sentence embedding using the long short term memory network: Analysis and application to information retrieval, CoRR, abs/1502.06922

Kannan, 2016, Smart reply: Automated response suggestion for email, CoRR, abs/1606.04870

Zhou, 2016, Deep recurrent models with fast-forward connections for neural machine translation, CoRR, abs/1606.04199

Renvoisé, 2017

Chen, 2017

Mallya, 2017

Mallya, 2017

Mallya, 2017

Jayasiri, 2017

Salehinejad, 2018, Recent advances in recurrent neural networks, CoRR, abs/1801.01078

Strogatz, 1994

Wang, 2017, A new concept using LSTM neural networks for dynamic system identification, 5324

Grossberg, 2013, Recurrent neural networks, Scholarpedia, 8, 1888, 10.4249/scholarpedia.1888

Hopfield, 1984, Neurons with graded response have collective computational properties like those of two-state neurons, Proc. Natl. Acad. Sci., 81, 3088, 10.1073/pnas.81.10.3088

Grossberg, 1988, Nonlinear neural networks: Principles, mechanisms, and architectures, Neural Netw., 1, 17, 10.1016/0893-6080(88)90021-4

Sherstinsky, 1996, M-lattice: from morphogenesis to image processing, IEEE Trans. Image Process., 5, 1137, 10.1109/83.502393

Kyrychko, 2010, On the use of delay equations in engineering applications, 16, 943

Metropolis, 1953, Equation of state calculations by fast computing machines, J. Chem. Phys., 21, 1087, 10.1063/1.1699114

Sherstinsky, 1998, On stability and equilibria of the M-Lattice, IEEE Trans. Circuit Syst. I, 45, 408, 10.1109/81.669063

Ostroverkhyi, 2010

Jordan, 1986

Pineda, 1987, Generalization of backpropagation to recurrent neural networks, Phys. Rev. Lett., 59, 2229, 10.1103/PhysRevLett.59.2229

Pineda, 1987, Generalization of backpropagation to recurrent and higher order neural networks, 602

Pearlmutter, 1989, Learning state space trajectories in recurrent neural networks, Neural Comput., 1, 263, 10.1162/neco.1989.1.2.263

Elman, 1990, Finding structure in time, Cogn. Sci., 14, 179, 10.1207/s15516709cog1402_1

Pearlmutter, 1990

de Vries, 1991, A theory for neural networks with time delays, 162

Chua, 1988, Cellular neural networks: Theory, IEEE Trans. Circuits Syst., 35, 1257, 10.1109/31.7600

Chua, 1988, Cellular neural networks: Applications, IEEE Trans. Circuits Syst., 35, 1273, 10.1109/31.7601

Trefethen, 1996, Finite difference and spectral methods for ordinary and partial differential equations, unpublished text, Cambridge, MA

Oppenheim, 1989

Vinyals, 2013

Sherstinsky, 1996, On the efficiency of the orthogonal least squares training method for radial basis function networks, IEEE Trans. Neural Netw., 7, 195, 10.1109/72.478404

Bose, 1956

Hochreiter, 2001, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies

Pascanu, 2013, On the difficulty of training recurrent neural networks, International Conference on Machine Learning, 1310

Werbos, 1988, Generalization of backpropagation with application to a recurrent gas market model, Neural Netw., 1, 10.1016/0893-6080(88)90007-X

Werbos, 1990, Backpropagation through time: what does it do and how to do it, vol. 78, 1550

Sutskever, 2012

Pascanu, 2014

Rumelhart, 1985

1986

Minsky, 1990

Williams, 1989, A learning algorithm for continually running fully recurrent neural networks, Neural Comput., 1, 270, 10.1162/neco.1989.1.2.270

Schuster, 1997, Bidirectional recurrent neural networks, IEEE Trans. Signal Process., 45, 2673, 10.1109/78.650093

Levy, 2018, Long short-term memory as a dynamically computed element-wise weighted sum, CoRR, abs/1805.03716

Gers, 2001

Rabiner, 1971, Techniques for designing finite-duration impulse-response digital filters, IEEE Trans. Commun. Technol., 19, 188, 10.1109/TCOM.1971.1090625

McClellan, 1973, A computer program for designing optimum FIR linear phase digital filters, IEEE Trans. Audio Electroacoust., 21, 506, 10.1109/TAU.1973.1162525

Yamamoto, 2003, Optimizing FIR approximation for discrete-time IIR filters, IEEE Signal Process. Lett., 10, 273, 10.1109/LSP.2003.815615

Zaremba, 2014