Long Short-Term Memory - Vol. 9, No. 8, pp. 1735-1780, 1997
Sepp Hochreiter, Jürgen Schmidhuber
Learning to store information over extended time intervals by recurrent
backpropagation takes a very long time, mostly because of insufficient, decaying
error backflow. We briefly review Hochreiter's (1991) analysis of this problem,
then address it by introducing a novel, efficient, gradient based method called
long short-term memory (LSTM). Truncating the gradient where this does not do
harm, LSTM...
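To make the memory-cell mechanism concrete, here is a minimal numpy sketch of one forward step of an LSTM layer in its original 1997 form (input and output gates only, no forget gate); the names, shapes, and step counts are illustrative, not taken from the paper.

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def lstm_cell_step(x, h, c, W, b):
        """One step of an original-style LSTM layer: the cell state c is a
        'constant error carousel' with a unit-weight self-loop, so stored
        error signals neither decay nor blow up across time steps."""
        n = c.shape[0]
        z = W @ np.concatenate([x, h]) + b   # joint linear map, shape (3n,)
        i = sigmoid(z[:n])                   # input gate
        o = sigmoid(z[n:2 * n])              # output gate
        g = np.tanh(z[2 * n:])               # candidate cell input
        c = c + i * g                        # additive state update: no decay
        h = o * np.tanh(c)                   # gated output of the cell block
        return h, c

    rng = np.random.default_rng(0)
    n, d = 4, 3                              # hidden size, input size
    W, b = 0.1 * rng.standard_normal((3 * n, d + n)), np.zeros(3 * n)
    h, c = np.zeros(n), np.zeros(n)
    for t in range(5):                       # run the cell over a short sequence
        h, c = lstm_cell_step(rng.standard_normal(d), h, c, W, b)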
A Fast Learning Algorithm for Deep Belief Nets - Vol. 18, No. 7, pp. 1527-1554, 2006
Geoffrey E. Hinton, Simon Osindero, Yee-Whye Teh
We show how to use “complementary priors” to eliminate the explaining-away
effects that make inference difficult in densely connected belief nets that have
many hidden layers. Using complementary priors, we derive a fast, greedy
algorithm that can learn deep, directed belief networks one layer at a time,
provided the top two layers form an undirected associative memory. The fast,
greedy algorithm ...
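A rough numpy sketch of the greedy recipe the abstract describes: train a restricted Boltzmann machine on the data with one step of contrastive divergence (CD-1), then treat its hidden activations as data for the next layer. Biases are omitted for brevity and all sizes are made up.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def cd1_step(v0, W, lr=0.05):
        """One CD-1 update for a binary RBM with weight matrix W."""
        ph0 = sigmoid(v0 @ W)                        # P(h=1 | data)
        h0 = (rng.random(ph0.shape) < ph0) * 1.0     # sample hidden units
        pv1 = sigmoid(h0 @ W.T)                      # one-step reconstruction
        ph1 = sigmoid(pv1 @ W)
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        return W

    data = (rng.random((100, 784)) < 0.5).astype(float)
    reps = data
    for n_in, n_out in [(784, 500), (500, 250)]:     # one layer at a time
        W = 0.01 * rng.standard_normal((n_in, n_out))
        for _ in range(20):
            W = cd1_step(reps, W)
        reps = sigmoid(reps @ W)                     # input to the next RBM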
Backpropagation Applied to Handwritten Zip Code Recognition - Vol. 1, No. 4, pp. 541-551, 1989
Yann LeCun, Bernhard E. Boser, John S. Denker, D. Henderson, Richard Howard, W. Hubbard, L. D. Jackel
The ability of learning networks to generalize can be greatly enhanced by
providing constraints from the task domain. This paper demonstrates how such
constraints can be integrated into a backpropagation network through the
architecture of the network. This approach has been successfully applied to the
recognition of handwritten zip code digits provided by the U.S. Postal Service.
A single network...
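The architectural constraint in question is local connectivity with shared weights. A minimal sketch of one such feature map (hypothetical names; the actual network also used subsampling and multiple maps):

    import numpy as np

    def conv2d_valid(image, kernel):
        """Scan one shared kernel over the image: every output unit uses
        the same weights at a different location, which is the constraint
        that ties the architecture to the task domain."""
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return np.tanh(out)                  # squashing nonlinearity

    rng = np.random.default_rng(0)
    digit = rng.standard_normal((16, 16))    # a 16x16 input, as in the paper
    feature_map = conv2d_valid(digit, 0.1 * rng.standard_normal((5, 5)))
    print(feature_map.shape)                 # (12, 12)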
An Information-Maximization Approach to Blind Separation and Blind Deconvolution - Vol. 7, No. 6, pp. 1129-1159, 1995
Anthony J. Bell, Terrence J. Sejnowski
We derive a new self-organizing learning algorithm that maximizes the
information transferred in a network of nonlinear units. The algorithm does not
assume any knowledge of the input distributions, and is defined here for the
zero-noise limit. Under these conditions, information maximization has extra
properties not found in the linear case (Linsker 1989). The nonlinearities in
the transfer funct...
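A sketch of the rule's flavor: maximize output entropy through logistic units by adapting a square unmixing matrix. Shown here in the later natural-gradient form (Amari), which avoids the matrix inverse in the paper's original stochastic rule; batch size and step size are arbitrary.

    import numpy as np

    def infomax_step(W, X, lr=0.001):
        """One infomax update for a square unmixing matrix W.
        X holds one mixed sample per row; y = logistic(W x) and the
        update pushes toward maximal transmitted information."""
        U = X @ W.T                           # candidate source signals
        Y = 1.0 / (1.0 + np.exp(-U))          # logistic transfer function
        n = len(X)
        # natural-gradient form: dW ~ (I + (1 - 2y) u^T) W
        dW = (np.eye(W.shape[0]) + (1.0 - 2.0 * Y).T @ U / n) @ W
        return W + lr * dW

    rng = np.random.default_rng(0)
    S = rng.laplace(size=(2000, 2))           # two super-Gaussian sources
    X = S @ rng.standard_normal((2, 2)).T     # unknown linear mixing
    W = np.eye(2)
    for _ in range(500):
        W = infomax_step(W, X)                # W should converge to an unmixer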
Nonlinear Component Analysis as a Kernel Eigenvalue Problem - Vol. 10, No. 5, pp. 1299-1319, 1998
Bernhard Schölkopf, Alexander J. Smola, Klaus-Robert Müller
A new method for performing a nonlinear form of principal component analysis is
proposed. By the use of integral operator kernel functions, one can efficiently
compute principal components in high-dimensional feature spaces, related to
input space by some nonlinear map—for instance, the space of all possible
five-pixel products in 16 × 16 images. We give the derivation of the method and
present ex...
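The computation reduces to an eigenvalue problem on the kernel (Gram) matrix. A sketch using the degree-5 polynomial kernel that corresponds to the abstract's five-pixel-product example; normalization details are simplified.

    import numpy as np

    def kernel_pca(X, n_components=2):
        """Kernel PCA: double-center the Gram matrix and read nonlinear
        principal components off its leading eigenvectors."""
        K = (X @ X.T) ** 5                    # k(x, y) = (x . y)^5
        n = len(X)
        J = np.eye(n) - np.ones((n, n)) / n
        Kc = J @ K @ J                        # centering in feature space
        vals, vecs = np.linalg.eigh(Kc)       # ascending eigenvalues
        idx = np.argsort(vals)[::-1][:n_components]
        alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
        return Kc @ alphas                    # projections of the training set

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 256))        # e.g. 16x16 images, flattened
    Z = kernel_pca(X)                         # shape (50, 2)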
Universal Approximation Using Radial-Basis-Function Networks - Vol. 3, No. 2, pp. 246-257, 1991
Jooyoung Park, Irwin W. Sandberg
There have been several recent studies concerning feedforward networks and the
problem of approximating arbitrary functionals of a finite number of real
variables. Some of these studies deal with cases in which the hidden-layer
nonlinearity is not a sigmoid. This was motivated by successful applications of
feedforward networks with nonsigmoidal hidden-layer units. This paper reports on
a related s...
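The networks under study compute a weighted sum of radial basis functions. A toy illustration of the approximation power at issue (not the paper's proof technique): fit a smooth 1-D function with Gaussian units, solving only the linear output weights.

    import numpy as np

    def rbf_design(x, centers, width):
        """Gaussian RBF activations: one column per hidden unit."""
        return np.exp(-((x[:, None] - centers[None, :]) ** 2)
                      / (2.0 * width ** 2))

    x = np.linspace(-3.0, 3.0, 200)
    target = np.sin(2.0 * x)
    centers = np.linspace(-3.0, 3.0, 20)       # fixed grid of unit centers
    Phi = rbf_design(x, centers, width=0.4)
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # linear output layer
    max_err = np.max(np.abs(Phi @ w - target)) # shrinks as units are added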
Fast Learning in Networks of Locally-Tuned Processing Units - Vol. 1, No. 2, pp. 281-294, 1989
John Moody, Christian J. Darken
We propose a network architecture which uses a single internal layer of
locally-tuned processing units to learn both classification tasks and
real-valued function approximations (Moody and Darken 1988). We consider
training such networks in a completely supervised manner, but abandon this
approach in favor of a more computationally efficient hybrid learning method
which combines self-organized and...
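The hybrid scheme amounts to a self-organized stage that places the locally tuned units, followed by a supervised stage that fits only the linear output weights. A 1-D sketch with k-means for the first stage (parameter choices are arbitrary):

    import numpy as np

    def kmeans_1d(x, k, iters=25, seed=0):
        """Self-organized stage: place unit centers by plain k-means."""
        rng = np.random.default_rng(seed)
        centers = rng.choice(x, size=k, replace=False)
        for _ in range(iters):
            assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            for j in range(k):
                if np.any(assign == j):
                    centers[j] = x[assign == j].mean()
        return centers

    x = np.linspace(0.0, 2.0 * np.pi, 200)
    y = np.sin(x)
    centers = kmeans_1d(x, k=12)
    width = (x.max() - x.min()) / 12.0          # heuristic unit width
    Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None) # supervised stage: linear only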
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks - Vol. 1, No. 2, pp. 270-280, 1989
Ronald J. Williams, David Zipser
The exact form of a gradient-following learning algorithm for completely
recurrent networks running in continually sampled time is derived and used as
the basis for practical algorithms for temporal supervised learning tasks. These
algorithms have (1) the advantage that they do not require a precisely defined
training interval, operating while the network runs; and (2) the disadvantage
that they r...
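A compact numpy sketch of the algorithm's core (commonly called real-time recurrent learning): alongside the state, carry a sensitivity tensor p[k, i, j] = dy_k/dW_ij and advance it with the same recurrence as the network, so the weights can be adjusted at every time step with no fixed training interval. Names, the tanh nonlinearity, and the toy target are illustrative.

    import numpy as np

    def rtrl_step(W, x, y, p, target, lr=0.1):
        """One online step.  W maps z = (input, previous output) to the
        next output; p carries dy_k/dW_ij and is updated by the chain
        rule together with the state itself."""
        m, n = x.shape[0], y.shape[0]
        z = np.concatenate([x, y])
        y_new = np.tanh(W @ z)
        fprime = 1.0 - y_new ** 2
        W_rec = W[:, m:]                          # weights on recurrent units
        p_new = np.einsum('kl,lij->kij', W_rec, p)
        p_new[np.arange(n), np.arange(n), :] += z # direct dependence on W_ij
        p_new *= fprime[:, None, None]
        W = W + lr * np.einsum('k,kij->ij', target - y_new, p_new)
        return W, y_new, p_new

    rng = np.random.default_rng(0)
    m, n = 2, 3
    W = 0.1 * rng.standard_normal((n, m + n))
    y, p = np.zeros(n), np.zeros((n, n, m + n))
    for t in range(100):                          # learns while the net runs
        x = rng.standard_normal(m)
        W, y, p = rtrl_step(W, x, y, p, target=np.tanh(x).mean() * np.ones(n))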
Bayesian Interpolation - Vol. 4, No. 3, pp. 415-447, 1992
David J. C. MacKay
Although Bayesian analysis has been in use since Laplace, the Bayesian method of
model-comparison has only recently been developed in depth. In this paper, the
Bayesian approach to regularization and model-comparison is demonstrated by
studying the inference problem of interpolating noisy data. The concepts and
methods described are quite general and can be applied to many other data
modeling problems...
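The practical core is that regularization constants and noise levels can be inferred rather than hand-tuned, by maximizing the evidence. A sketch of the standard re-estimation loop for a linear-in-the-parameters interpolation model (the basis matrix Phi and all settings here are made up):

    import numpy as np

    def evidence_loop(Phi, y, alpha=1.0, beta=1.0, iters=30):
        """Re-estimate the regularizer alpha and noise precision beta by
        maximizing the evidence, instead of cross-validating them."""
        N, M = Phi.shape
        H = Phi.T @ Phi
        for _ in range(iters):
            A = alpha * np.eye(M) + beta * H          # posterior precision
            w = beta * np.linalg.solve(A, Phi.T @ y)  # posterior mean
            lam = beta * np.linalg.eigvalsh(H)
            gamma = np.sum(lam / (lam + alpha))       # well-determined params
            alpha = gamma / (w @ w)
            beta = (N - gamma) / np.sum((y - Phi @ w) ** 2)
        return w, alpha, beta

    x = np.linspace(0.0, 1.0, 40)
    Phi = np.vander(x, 8)                             # polynomial basis
    y = np.sin(6.0 * x) + 0.1 * np.random.default_rng(0).standard_normal(40)
    w, alpha, beta = evidence_loop(Phi, y)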
Adaptive Mixtures of Local Experts - Vol. 3, No. 1, pp. 79-87, 1991
Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, Geoffrey E. Hinton
We present a new supervised learning procedure for systems composed of many
separate networks, each of which learns to handle a subset of the complete set
of training cases. The new procedure can be viewed either as a modular version
of a multilayer supervised network, or as an associative version of competitive
learning. It therefore provides a new link between these two apparently
different approaches...
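In this architecture a gating network assigns soft responsibilities and each expert specializes on the cases it wins. A forward-pass sketch with linear experts and a softmax gate (shapes and names are illustrative):

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    def moe_forward(x, expert_Ws, gate_W):
        """Blend expert predictions with gating probabilities g; in
        training, g also scales each expert's error signal, which is
        what drives the experts to specialize."""
        g = softmax(gate_W @ x)                       # responsibilities
        outs = np.stack([We @ x for We in expert_Ws]) # one row per expert
        return g @ outs, g

    rng = np.random.default_rng(0)
    d_in, d_out, n_experts = 5, 2, 3
    expert_Ws = [rng.standard_normal((d_out, d_in)) for _ in range(n_experts)]
    gate_W = rng.standard_normal((n_experts, d_in))
    y, g = moe_forward(rng.standard_normal(d_in), expert_Ws, gate_W)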