Long Short-Term Memory. Volume 9, Issue 8, pp. 1735-1780, 1997.
Sepp Hochreiter, Jürgen Schmidhuber
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do …
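A minimal numpy sketch of the kind of gated memory cell the abstract introduces. The gate ordering, weight shapes, and sizes here are illustrative choices, not from the paper, and the forget gate is a later addition (Gers et al.) to the original 1997 design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (input, forget, output gates + candidate).

    W: (4*H, D+H) stacked gate weights, b: (4*H,) stacked biases.
    Gate order (i, f, g, o) is a common convention, not prescribed by the paper.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])        # input gate: how much new content to write
    f = sigmoid(z[H:2*H])      # forget gate (added later by Gers et al.)
    g = np.tanh(z[2*H:3*H])    # candidate cell content
    o = sigmoid(z[3*H:4*H])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g     # additive cell update: the path that keeps
                               # error backflow from decaying
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
W = 0.1 * rng.standard_normal((4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, b)
```

The additive form of the cell update `c = f * c_prev + i * g` is what lets gradients flow over long intervals without the multiplicative decay the abstract describes.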
A Fast Learning Algorithm for Deep Belief Nets. Volume 18, Issue 7, pp. 1527-1554, 2006.
Geoffrey E. Hinton, Simon Osindero, Yee-Whye Teh
We show how to use “complementary priors” to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm …
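The layer-at-a-time scheme rests on training each layer as a restricted Boltzmann machine. A rough numpy sketch of one contrastive-divergence (CD-1) update for a single binary RBM layer, with illustrative learning rate and sizes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, rng, lr=0.1):
    """One CD-1 update for a binary RBM (one layer of the greedy stack).

    v0: visible data vector; W: (nv, nh) weights; a, b: visible/hidden biases.
    """
    # Up pass: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down-up pass: one step of Gibbs sampling (the "reconstruction").
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate log-likelihood gradient: data term minus reconstruction term.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return W, a, b

rng = np.random.default_rng(1)
nv, nh = 6, 3
W = 0.01 * rng.standard_normal((nv, nh))
a, b = np.zeros(nv), np.zeros(nh)
v = (rng.random(nv) < 0.5).astype(float)
W, a, b = cd1_update(v, W, a, b, rng)
```

In the full algorithm, the hidden activities of a trained layer become the "data" for the next RBM in the stack.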
Backpropagation Applied to Handwritten Zip Code Recognition. Volume 1, Issue 4, pp. 541-551, 1989.
Yann LeCun, Bernhard E. Boser, John S. Denker, D. Henderson, Richard Howard, W. Hubbard, L. D. Jackel
The ability of learning networks to generalize can be greatly enhanced by providing constraints from the task domain. This paper demonstrates how such constraints can be integrated into a backpropagation network through the architecture of the network. This approach has been successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service. A single …
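The architectural constraint at the heart of the paper is weight sharing: every image position is filtered by the same small kernel. A toy numpy sketch of that shared-weight ("valid" convolution) operation, with an illustrative filter:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D correlation with a single shared kernel.

    The same few weights are applied at every position, which cuts the
    parameter count and builds shift invariance into the architecture.
    """
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # toy horizontal-difference filter
fmap = conv2d_valid(image, edge)
```

In the paper's network, several such feature maps feed into further shared-weight layers before the fully connected output; here only the single-map primitive is shown.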
An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Volume 7, Issue 6, pp. 1129-1159, 1995.
Anthony J. Bell, Terrence J. Sejnowski
We derive a new self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units. The algorithm does not assume any knowledge of the input distributions, and is defined here for the zero-noise limit. Under these conditions, information maximization has extra properties not found in the linear case (Linsker 1989). The nonlinearities in the trans…
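A compact numpy sketch of the infomax separation rule for the logistic nonlinearity, written in the natural-gradient form later given by Amari (the learning rate, source model, and mixing matrix are illustrative assumptions):

```python
import numpy as np

def infomax_batch_update(W, X, lr=0.005):
    """One batch update of the Bell-Sejnowski infomax rule (logistic
    nonlinearity), in natural-gradient form.

    X: (n, T) mixed signals; W: (n, n) unmixing matrix.
    """
    n, T = X.shape
    U = W @ X                        # current source estimates
    Y = 1.0 / (1.0 + np.exp(-U))     # logistic squashing
    # Natural-gradient ascent on the output entropy.
    W += lr * (np.eye(n) + (1.0 - 2.0 * Y) @ U.T / T) @ W
    return W

rng = np.random.default_rng(2)
S = rng.laplace(size=(2, 2000))           # super-Gaussian independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])    # unknown mixing matrix
X = A @ S
W = np.eye(2)
for _ in range(100):
    W = infomax_batch_update(W, X)
```

The logistic nonlinearity matches super-Gaussian sources (hence Laplacian samples here); a fixed point is reached when the outputs carry maximal joint entropy, i.e. when they are statistically independent.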
Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Volume 10, Issue 5, pp. 1299-1319, 1998.
Bernhard Schölkopf, Alexander J. Smola, Klaus‐Robert Müller
A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and p…
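The key computation is an eigendecomposition of the centered kernel matrix in place of the covariance matrix in the (implicit) feature space. A numpy sketch, using a Gaussian kernel as an illustrative choice (the paper's polynomial kernels, e.g. pixel products, work identically):

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA: diagonalize the centered kernel matrix rather than the
    covariance in the high-dimensional feature space."""
    # Pairwise squared distances and RBF kernel matrix.
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Center the kernel matrix in feature space.
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecomposition; np.linalg.eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    # Normalize expansion coefficients so feature-space eigenvectors have unit norm.
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    # Projections of the training points onto the nonlinear components.
    return Kc @ alphas

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
Z = kernel_pca(X, n_components=2)
```

The feature space is never constructed explicitly; all that is needed is the N x N matrix of kernel evaluations between training points.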
Fast Learning in Networks of Locally-Tuned Processing Units. Volume 1, Issue 2, pp. 281-294, 1989.
John Moody, Christian J. Darken
We propose a network architecture which uses a single internal layer of locally-tuned processing units to learn both classification tasks and real-valued function approximations (Moody and Darken 1988). We consider training such networks in a completely supervised manner, but abandon this approach in favor of a more computationally efficient hybrid learning method which combines self-orga…
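A numpy sketch of the hybrid scheme: unsupervised placement of the locally tuned units (plain k-means here), followed by a supervised least-squares fit of the output weights. The fixed Gaussian width is an illustrative guess; Moody and Darken set widths from nearest-neighbor heuristics:

```python
import numpy as np

def kmeans(X, k, iters=20, rng=None):
    """Plain k-means: the self-organizing stage that places the unit centers."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(d, axis=1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def rbf_fit_predict(Xtr, ytr, Xte, k=8, width=0.5, rng=None):
    """Hybrid RBF training: unsupervised centers, supervised linear output."""
    centers = kmeans(Xtr, k, rng=rng)
    def design(X):
        d2 = np.sum((X[:, None, :] - centers[None, :, :])**2, axis=2)
        return np.exp(-d2 / (2 * width**2))   # locally-tuned unit responses
    w, *_ = np.linalg.lstsq(design(Xtr), ytr, rcond=None)
    return design(Xte) @ w

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
pred = rbf_fit_predict(X, y, X, k=10, width=0.7, rng=rng)
err = np.max(np.abs(pred - y))
```

Because only the linear output layer is trained supervisedly, the fit reduces to a single least-squares solve, which is the source of the speedup over full backpropagation.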
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Volume 1, Issue 2, pp. 270-280, 1989.
Ronald J. Williams, David Zipser
The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks. These algorithms have (1) the advantage that they do not require a precisely defined training interval, operating while the network runs; and (2) the disadvantage th…
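A numpy sketch of one step of the algorithm (real-time recurrent learning), with illustrative sizes: the sensitivity tensor p[k, i, j] = dy_k / dW_ij is carried forward as the network runs, which is why no fixed training interval is needed, and its size is also the source of the algorithm's heavy per-step cost:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rtrl_step(W, y, x, p):
    """One forward step plus the RTRL sensitivity update.

    W: (n, m+n) weights over [input; state];
    p: (n, n, m+n) sensitivities dy_k / dW_ij, updated online.
    """
    n, m = y.shape[0], x.shape[0]
    z = np.concatenate([x, y])        # unit inputs at time t
    s = W @ z
    y_new = sigmoid(s)
    fprime = y_new * (1.0 - y_new)
    # Recurrent part: old-state sensitivities propagated through W.
    rec = np.einsum('kl,lij->kij', W[:, m:], p)
    # Direct part: delta_{ki} * z_j.
    direct = np.zeros_like(p)
    direct[np.arange(n), np.arange(n), :] = z
    p_new = fprime[:, None, None] * (rec + direct)
    return y_new, p_new

rng = np.random.default_rng(5)
n, m = 3, 2
W = 0.5 * rng.standard_normal((n, m + n))
y = np.zeros(n)
p = np.zeros((n, n, m + n))
for t in range(10):
    y, p = rtrl_step(W, y, rng.standard_normal(m), p)
# Gradient of an instantaneous error e = y - target via the sensitivities:
grad = np.einsum('k,kij->ij', y - np.ones(n), p)
```

Every step touches all n * n * (m+n) sensitivities, which is the computational disadvantage the abstract alludes to.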
Bayesian Interpolation. Volume 4, Issue 3, pp. 415-447, 1992.
David MacKay
Although Bayesian analysis has been in use since Laplace, the Bayesian method of model-comparison has only recently been developed in depth. In this paper, the Bayesian approach to regularization and model-comparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other data mode…
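Model comparison in this framework ranks models by their evidence (marginal likelihood). A numpy sketch for a linear-in-parameters model with Gaussian prior and noise; the precisions alpha and beta are fixed, illustrative values, whereas MacKay optimizes them by maximizing this same quantity:

```python
import numpy as np

def log_evidence(Phi, y, alpha=1.0, beta=25.0):
    """Log evidence log p(y | model) for y = Phi w + noise, with Gaussian
    prior precision alpha on w and Gaussian noise precision beta."""
    N, M = Phi.shape
    A = beta * Phi.T @ Phi + alpha * np.eye(M)     # posterior precision
    w_mp = beta * np.linalg.solve(A, Phi.T @ y)    # most probable weights
    E = 0.5 * beta * np.sum((y - Phi @ w_mp)**2) + 0.5 * alpha * w_mp @ w_mp
    _, logdetA = np.linalg.slogdet(A)
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
            - E - 0.5 * logdetA - 0.5 * N * np.log(2 * np.pi))

rng = np.random.default_rng(6)
x = np.linspace(-1, 1, 40)
y = 1.0 + x - 2.0 * x**2 + 0.2 * rng.standard_normal(40)
# Compare polynomial interpolants of increasing order: the evidence embodies
# Occam's razor, penalizing orders the data do not support.
evidences = [log_evidence(np.vander(x, d + 1, increasing=True), y)
             for d in range(6)]
```

On data generated from a quadratic, the evidence for the quadratic model exceeds that of the underfitting lower-order models, and the determinant term automatically penalizes needless extra parameters.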
Adaptive Mixtures of Local Experts. Volume 3, Issue 1, pp. 79-87, 1991.
Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, Geoffrey E. Hinton
We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently diffe…
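A numpy sketch of the forward pass of such a mixture: a gating network softmax-weights the outputs of the expert networks. Linear experts and a linear gate are simplifying assumptions to keep the sketch minimal (the paper allows arbitrary networks), and the competitive training loss is omitted:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_forward(x, expert_W, gate_W):
    """Combine expert outputs with gating-network mixing proportions.

    expert_W: list of K (d_out, d_in) expert weight matrices;
    gate_W: (K, d_in) gating weights.
    """
    expert_outputs = np.array([W @ x for W in expert_W])   # (K, d_out)
    gates = softmax(gate_W @ x)                            # (K,), sums to 1
    return gates @ expert_outputs, gates

rng = np.random.default_rng(7)
K, d_in, d_out = 3, 4, 2
expert_W = [rng.standard_normal((d_out, d_in)) for _ in range(K)]
gate_W = rng.standard_normal((K, d_in))
y, gates = mixture_forward(rng.standard_normal(d_in), expert_W, gate_W)
```

During training, the error function is chosen so that experts compete for cases rather than cooperate on every case, which is what drives each expert to specialize on a subset of the data.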
Universal Approximation Using Radial-Basis-Function Networks. Volume 3, Issue 2, pp. 246-257, 1991.
Jooyoung Park, Irwin W. Sandberg
There have been several recent studies concerning feedforward networks and the problem of approximating arbitrary functionals of a finite number of real variables. Some of these studies deal with cases in which the hidden-layer nonlinearity is not a sigmoid. This was motivated by successful applications of feedforward networks with nonsigmoidal hidden-layer units. This p…