On the momentum term in gradient descent learning algorithms
Abstract
Keywords
References
Anderson, J.A., Pellionisz, A., & Rosenfeld, E. (Eds.). (1990). Neurocomputing 2: directions for research. Cambridge, MA: MIT Press.
Jacobs, R.A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295–307. https://doi.org/10.1016/0893-6080(88)90003-2
Kleppner, D., & Kolenkow, R.J. (1973). An introduction to mechanics. New York: McGraw-Hill.
LeCun, Y., Denker, J.S., & Solla, S.A. (1990). Optimal brain damage. In D.S. Touretzky (Ed.), Advances in neural information processing systems 2 (NIPS*89) (pp. 598–605). Denver, CO: Morgan Kaufmann.
McClelland, J.L., & Rumelhart, D.E. (1986). Parallel distributed processing (Vol. 2). Cambridge, MA: MIT Press.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (1992). Numerical recipes in C. Cambridge, UK: Cambridge University Press.
Qian, N., & Sejnowski, T.J. (1988). Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology, 202, 865–884. https://doi.org/10.1016/0022-2836(88)90564-5
Qian, N., & Sejnowski, T.J. (1989). Learning to solve random-dot stereograms of dense and transparent surfaces with recurrent backpropagation. In D.S. Touretzky, G.E. Hinton, & T.J. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School (pp. 435–443). San Mateo, CA: Morgan Kaufmann.
Rumelhart, D.E., & McClelland, J.L. (1986). Parallel distributed processing (Vol. 1). Cambridge, MA: MIT Press.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart, & J.L. McClelland (Eds.), Parallel distributed processing (Vol. 1, pp. 318–362). Cambridge, MA: MIT Press.