Neural Computation
Featured scientific publications
* Data is for reference only
We propose that many human behaviors can be accurately described as a set of dynamic models (e.g., Kalman filters) sequenced together by a Markov chain. We then use these dynamic Markov models to recognize human behaviors from sensory data and to predict human behaviors over a few seconds' time. To test the power of this modeling approach, we report an experiment in which we were able to achieve 95% accuracy at predicting automobile drivers' subsequent actions from their initial preparatory movements.
In this letter, we investigate the sampled-data state feedback control (SDSFC) problem of Boolean control networks (BCNs). Some necessary and sufficient conditions are obtained for the global stabilization of BCNs by SDSFC. Unlike conventional state feedback control, new phenomena are observed in the study of SDSFC. Based on the controllability matrix, we derive some necessary and sufficient conditions under which the trajectories of BCNs can be stabilized to a fixed point by piecewise constant control (PCC). It is proved that the global stabilization of BCNs under SDSFC is equivalent to that by PCC. Moreover, algorithms are given to construct the sampled-data state feedback controllers. Numerical examples are given to illustrate the efficiency of the obtained results.
Recurrent neural architectures having oscillatory dynamics use rhythmic network activity to represent patterns stored in short-term memory. Multiple stored patterns can be retained in memory over the same neural substrate because the network's state persistently switches between them. Here we present a simple oscillatory memory that extends the dynamic threshold approach of Horn and Usher (1991) by including weight decay. The modified model is able to match behavioral data from human subjects performing a running memory span task simply by assuming appropriate weight decay rates. The results suggest that simple oscillatory memories incorporating weight decay capture at least some key properties of human short-term memory. We examine the implications of the results for theories about the relative role of interference and decay in forgetting, and hypothesize that adjustments of activity decay rate may be an important aspect of human attentional mechanisms.
Neural associative memories are perceptron-like single-layer networks with fast synaptic learning typically storing discrete associations between pairs of neural activity patterns. Previous work optimized the memory capacity for various models of synaptic learning: linear Hopfield-type rules, the Willshaw model employing binary synapses, or the BCPNN rule of Lansner and Ekeberg, for example. Here I show that all of these previous models are limit cases of a general optimal model where synaptic learning is determined by probabilistic Bayesian considerations. Asymptotically, for large networks and very sparse neuron activity, the Bayesian model becomes identical to an inhibitory implementation of the Willshaw and BCPNN-type models. For less sparse patterns, the Bayesian model becomes identical to Hopfield-type networks employing the covariance rule. For intermediate sparseness or finite networks, the optimal Bayesian learning rule differs from the previous models and can significantly improve memory performance. I also provide a unified analytical framework to determine memory capacity at a given output noise level that links approaches based on mutual information, Hamming distance, and signal-to-noise ratio.
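As a rough illustration of one of the models named in this abstract, the sketch below implements a Willshaw-style memory with binary synapses and one-step threshold retrieval. It is not the Bayesian rule the abstract proposes. The pattern sizes, the number of stored pairs, and the helper `sparse_patterns` are assumptions chosen only for the demonstration.

```python
import numpy as np

def willshaw_store(inputs, outputs):
    """Clipped Hebbian learning: a binary synapse is switched on if its
    input and output units were ever active together in a stored pair."""
    W = np.zeros((inputs.shape[1], outputs.shape[1]), dtype=bool)
    for x, y in zip(inputs, outputs):
        W |= np.outer(x, y).astype(bool)
    return W

def willshaw_recall(W, x):
    """One-step retrieval: an output unit fires if every active input
    unit has a switched-on synapse onto it."""
    potentials = x.astype(int) @ W.astype(int)
    return potentials >= x.sum()

def sparse_patterns(rng, count, size, active):
    """Hypothetical helper: random binary patterns with `active` ones each."""
    patterns = np.zeros((count, size), dtype=bool)
    for row in patterns:  # each row is a view, so the assignment is in place
        row[rng.choice(size, size=active, replace=False)] = True
    return patterns

# Assumed toy setting: 20 pairs of very sparse patterns over 200 units.
rng = np.random.default_rng(0)
inputs = sparse_patterns(rng, count=20, size=200, active=5)
outputs = sparse_patterns(rng, count=20, size=200, active=5)

W = willshaw_store(inputs, outputs)
recalled = np.array([willshaw_recall(W, x) for x in inputs])

missed = np.sum(outputs & ~recalled)    # stored bits that failed to fire
spurious = np.sum(recalled & ~outputs)  # extra bits caused by synapse overlap
print("missed:", missed, "spurious:", spurious)
```

With this threshold, a noise-free cue can never miss a stored output bit; errors show up only as spurious extra bits, which become more frequent as more pairs are stored or the patterns become less sparse.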
Feedforward neural networks trained by error backpropagation are examples of nonparametric regression estimators. We present a tutorial on nonparametric inference and its relation to neural networks, and we use the statistical viewpoint to highlight strengths and weaknesses of neural models. We illustrate the main points with some recognition experiments involving artificial data as well as handwritten numerals. By way of conclusion, we suggest that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues. Furthermore, we suggest that the fundamental challenges in neural modeling are about representation rather than learning per se. This last point is supported by additional experiments with handwritten numerals.
We had previously shown that regularization principles lead to approximation schemes that are equivalent to networks with one layer of hidden units, called regularization networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well-known radial basis functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends radial basis functions (RBF) to hyper basis functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, some forms of projection pursuit regression, and several types of neural networks. We propose to use the term generalized regularization networks for this broad class of approximation schemes that follow from an extension of regularization. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In summary, different multilayer networks with one hidden layer, which we collectively call generalized regularization networks, correspond to different classes of priors and associated smoothness functionals in a classical regularization principle. Three broad classes are (1) radial basis functions that can be generalized to hyper basis functions, (2) some tensor product splines, and (3) additive splines that can be generalized to schemes of the type of ridge approximation, hinge functions, and several perceptron-like neural networks with one hidden layer.
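The radial basis function case of a regularization network can be sketched in a few lines. The example below assumes a Gaussian basis function centered on each training point and the standard regularized linear system (G + λI)c = y for the coefficients. The kernel width, the value of λ, and the sine-wave data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, width):
    """Matrix of Gaussian basis functions G[i, j] = exp(-(a_i - b_j)^2 / (2 width^2))."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * width ** 2))

def fit_regularization_network(x_train, y_train, width, lam):
    """Solve (G + lam * I) c = y for the hidden-to-output coefficients c."""
    G = gaussian_kernel(x_train, x_train, width)
    return np.linalg.solve(G + lam * np.eye(len(x_train)), y_train)

def predict(x_new, x_train, coeffs, width):
    """Network output: a weighted sum of basis functions centered on the data."""
    return gaussian_kernel(x_new, x_train, width) @ coeffs

# Assumed toy data: samples of one period of a sine wave.
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2.0 * np.pi * x_train)

lam = 1e-3    # regularization strength (assumed)
width = 0.1   # Gaussian width (assumed)
coeffs = fit_regularization_network(x_train, y_train, width, lam)
y_fit = predict(x_train, x_train, coeffs, width)
print("max training residual:", np.max(np.abs(y_train - y_fit)))
```

Larger values of `lam` trade fidelity to the data for smoothness; in the probabilistic reading given in the abstract, that corresponds to a stronger prior on the approximating function.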
This letter describes algorithms for nonnegative matrix factorization (NMF) with the β-divergence (β-NMF). The β-divergence is a family of cost functions parameterized by a single shape parameter β that takes the Euclidean distance, the Kullback-Leibler divergence, and the Itakura-Saito divergence as special cases (β = 2, 1, 0 respectively). The proposed algorithms are based on a surrogate auxiliary function (a local majorization of the criterion function). We first describe a majorization-minimization algorithm that leads to multiplicative updates, which differ from standard heuristic multiplicative updates by a β-dependent power exponent. The monotonicity of the heuristic algorithm can, however, be proven for β ∈ (0, 1) using the proposed auxiliary function. Then we introduce the concept of the majorization-equalization (ME) algorithm, which produces updates that move along constant level sets of the auxiliary function and lead to larger steps than MM. Simulations on synthetic and real data illustrate the faster convergence of the ME approach. The letter also describes how the proposed algorithms can be adapted to two common variants of NMF: penalized NMF (when a penalty function of the factors is added to the criterion function) and convex NMF (when the dictionary is assumed to belong to a known subspace).
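A minimal sketch of the β-divergence and of the heuristic multiplicative updates mentioned above is given here. The `exponent` argument is a placeholder for the β-dependent power that distinguishes the majorization-minimization variant; its exact form is not stated in the abstract, so the demo keeps it at 1 (the heuristic update). Matrix sizes, rank, and the random data are assumptions.

```python
import numpy as np

def beta_divergence(V, V_hat, beta):
    """Sum of elementwise beta-divergences d_beta(V | V_hat) for positive arrays."""
    if beta == 1:  # Kullback-Leibler
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0:  # Itakura-Saito
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1.0)
    return np.sum((V ** beta + (beta - 1.0) * V_hat ** beta
                   - beta * V * V_hat ** (beta - 1.0)) / (beta * (beta - 1.0)))

def multiplicative_step(V, W, H, beta, exponent=1.0):
    """One update of H, then of W. exponent=1 gives the heuristic update;
    other values are left to the caller (assumed parameter)."""
    V_hat = W @ H
    H = H * ((W.T @ (V_hat ** (beta - 2.0) * V))
             / (W.T @ V_hat ** (beta - 1.0))) ** exponent
    V_hat = W @ H
    W = W * (((V_hat ** (beta - 2.0) * V) @ H.T)
             / (V_hat ** (beta - 1.0) @ H.T)) ** exponent
    return W, H

# Assumed toy problem: a strictly positive 8 x 10 matrix factored at rank 3.
rng = np.random.default_rng(0)
V = rng.random((8, 10)) + 0.1
W = rng.random((8, 3)) + 0.1
H = rng.random((3, 10)) + 0.1

beta = 1.0  # Kullback-Leibler case
costs = [beta_divergence(V, W @ H, beta)]
for _ in range(30):
    W, H = multiplicative_step(V, W, H, beta)
    costs.append(beta_divergence(V, W @ H, beta))
print("first cost:", costs[0], "last cost:", costs[-1])
```

With β = 2 the cost reduces to half the squared Euclidean distance, which gives a quick sanity check of the divergence function.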
This article considers high-order measures of independence for the independent component analysis problem and discusses the class of Jacobi algorithms for their optimization. Several implementations are discussed. We compare the proposed approaches with gradient-based techniques from the algorithmic point of view and also on a set of biomedical data.
We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square noiseless mixing but also the general case where the number of mixtures differs from the number of sources and the data are noisy. IFA is a two-step procedure. In the first step, the source densities, mixing matrix, and noise covariance are estimated from the observed data by maximum likelihood. For this purpose we present an expectation-maximization (EM) algorithm, which performs unsupervised learning of an associated probabilistic model of the mixing situation. Each source in our model is described by a mixture of gaussians; thus, all the probabilistic calculations can be performed analytically. In the second step, the sources are reconstructed from the observed data by an optimal nonlinear estimator. A variational approximation of this algorithm is derived for cases with a large number of sources, where the exact algorithm becomes intractable. Our IFA algorithm reduces to the one for ordinary FA when the sources become gaussian, and to an EM algorithm for PCA in the zero-noise limit. We derive an additional EM algorithm specifically for noiseless IFA. This algorithm is shown to be superior to ICA since it can learn arbitrary source densities from the data. Beyond blind separation, IFA can be used for modeling multidimensional data by a highly constrained mixture of gaussians and as a tool for nonlinear signal encoding.
Several recent papers (Gardner and Derrida 1989; Györgyi 1990; Sompolinsky et al. 1990) have found, using methods of statistical physics, that a transition to perfect generalization occurs in training a simple perceptron whose weights can only take values ±1. We give a rigorous proof of such a phenomenon. That is, we show, for α = 2.0821, that if at least αn examples are drawn from the uniform distribution on {+1, −1}^n and classified according to a target perceptron w_t ∈ {+1, −1}^n as positive or negative according to whether w_t·x is nonnegative or negative, then the probability is 2^(−√n) that there is any other such perceptron consistent with the examples. Numerical results indicate further that perfect generalization holds for α as low as 1.5.
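The setting in this abstract can be explored by brute force for small n. The sketch below draws ⌈αn⌉ uniform ±1 examples, labels them with a random target perceptron, and counts how many of the 2^n weight vectors agree with all labels. The choice n = 10 and the random seed are assumptions. The theorem is asymptotic, so at this size other consistent perceptrons may well remain.

```python
import itertools
import math
import random

ALPHA = 2.0821  # threshold quoted in the abstract

def classify(w, x):
    """Positive if w.x is nonnegative, negative otherwise."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def count_consistent(n, num_examples, rng):
    """Count the +/-1 weight vectors that reproduce every label given by a
    random target perceptron on uniformly drawn +/-1 examples."""
    target = [rng.choice((-1, 1)) for _ in range(n)]
    examples = [[rng.choice((-1, 1)) for _ in range(n)]
                for _ in range(num_examples)]
    labels = [classify(target, x) for x in examples]
    count = 0
    for w in itertools.product((-1, 1), repeat=n):
        if all(classify(w, x) == y for x, y in zip(examples, labels)):
            count += 1
    return count  # always at least 1, because the target itself is consistent

rng = random.Random(0)
n = 10                     # small enough to enumerate all 2**n perceptrons
m = math.ceil(ALPHA * n)   # number of training examples
consistent = count_consistent(n, m, rng)
print(f"n={n}, examples={m}, consistent perceptrons={consistent}")
```

Repeating the count over many seeds and increasing n gives an empirical estimate of how quickly the probability of a second consistent perceptron falls.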