Stochastic Gradient Algorithm Under (h,φ)-Entropy Criterion

Circuits, Systems, and Signal Processing - Volume 26 - Pages 941-960 - 2008
B. Chen1, J. Hu1, L. Pu1, Z. Sun1
1Department of Computer Science and Technology, Tsinghua University, Beijing, People’s Republic of China

Abstract

Motivated by the work of Erdogmus and Principe, we use the error (h,φ)-entropy as the supervised adaptation criterion. Several properties of the (h,φ)-entropy criterion and its connections with traditional error criteria are investigated. Using a kernel estimation approach, we obtain a nonparametric estimator of the instantaneous (h,φ)-entropy. We then develop the general stochastic information gradient algorithm and derive an approximate upper bound on the step size for adaptive linear neuron training. Moreover, the (h,φ) pair is optimized to improve the performance of the proposed algorithm. For finite impulse response identification with white Gaussian input and noise, the exact optimum φ function is derived. Finally, simulation experiments verify the results and demonstrate the noticeable performance improvement that can be achieved by the optimum (h,φ)-entropy criterion.
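
For orientation, the defining quantities can be sketched as follows. This reading follows the standard (h,φ)-entropy form of Salicru et al. (cited below); the window length L, kernel width σ, and kernel κ_σ are notation introduced here, not fixed by the abstract:

```latex
% (h,phi)-entropy of an error density f (Salicru et al. 1993):
%   Shannon:  h(x) = x,                     phi(x) = -x log x
%   Renyi:    h(x) = (1/(1-alpha)) log x,   phi(x) = x^alpha
H_{h,\varphi}(f) = h\!\left(\int_{\mathbb{R}} \varphi\bigl(f(e)\bigr)\,\mathrm{d}e\right)

% Parzen estimate of the error density from the last L errors, and the
% instantaneous Shannon-case entropy descended by the stochastic
% information gradient (SIG) of Erdogmus et al.:
\hat{f}_k(e) = \frac{1}{L}\sum_{i=1}^{L}\kappa_\sigma\bigl(e - e_{k-i}\bigr),
\qquad
\hat{H}_k = -\log \hat{f}_k(e_k)
```

The sketch below is a minimal, runnable illustration of the Shannon special case: a SIG update for FIR (adaptive linear neuron) identification that descends the instantaneous kernel-estimated error entropy. The function name `sig_fir`, the Gaussian kernel, and all hyperparameter defaults are illustrative assumptions; the general (h,φ) update and the optimized (h,φ) pair from the paper are not reproduced here, and in practice the step size must respect a bound like the one the paper derives.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """Parzen kernel kappa_sigma(u): a Gaussian of width sigma."""
    return np.exp(-u**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def gaussian_kernel_deriv(u, sigma):
    """Derivative of kappa_sigma with respect to its argument."""
    return -(u / sigma**2) * gaussian_kernel(u, sigma)

def sig_fir(x, d, n_taps=4, eta=0.1, L=8, sigma=1.0, seed=0):
    """Stochastic information gradient, Shannon case, for FIR identification.

    At step k the error density is estimated from the last L errors,
        f_hat(e) = (1/L) * sum_i kappa_sigma(e - e_{k-i}),
    and the weights descend the instantaneous entropy -log f_hat(e_k).
    Hyperparameter defaults are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal(n_taps)
    past_e, past_u = [], []                  # sliding window of errors / inputs

    for k in range(n_taps - 1, len(x)):
        u_k = x[k - n_taps + 1 : k + 1][::-1]   # tapped delay line
        e_k = d[k] - w @ u_k
        if len(past_e) == L:
            diffs = e_k - np.asarray(past_e)    # e_k - e_{k-i}
            f_hat = np.mean(gaussian_kernel(diffs, sigma))
            # Since e = d - w.u gives de/dw = -u, the gradient of f_hat is
            # (1/L) * sum_i kappa'(e_k - e_i) * (u_i - u_k).
            grad_f = gaussian_kernel_deriv(diffs, sigma) @ (np.asarray(past_u) - u_k) / L
            # Descend the instantaneous entropy: grad(-log f_hat) = -grad_f / f_hat.
            w -= eta * (-grad_f / max(f_hat, 1e-12))
        past_e.append(e_k)
        past_u.append(u_k)
        if len(past_e) > L:
            past_e.pop(0)
            past_u.pop(0)
    return w

if __name__ == "__main__":
    # Toy FIR identification with white Gaussian input and additive noise.
    rng = np.random.default_rng(1)
    x = rng.standard_normal(5000)
    w_true = np.array([0.8, -0.4, 0.2, 0.1])
    d = np.convolve(x, w_true)[: len(x)] + 0.05 * rng.standard_normal(len(x))
    print(sig_fir(x, d))                     # should move close to w_true
```

Since the error entropy is invariant to a constant shift of the error, minimum-entropy training determines the filter weights but not an additive bias; with zero-mean noise, as in this toy example, the distinction is immaterial.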

References

Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)
Douglas, S.C., Meng, H.Y.: Stochastic gradient adaptation under general error criteria. IEEE Trans. Signal Process. 42, 1335–1351 (1994)
Erdogmus, D., Principe, J.C.: Comparison of entropy and mean square error criteria in adaptive system training using higher order statistics. In: Second International Workshop on Independent Component Analysis and Blind Signal Separation, pp. 75–80 (2000)
Erdogmus, D., Principe, J.C.: Generalized information potential criterion for adaptive system training. IEEE Trans. Neural Netw. 13, 1035–1044 (2002)
Erdogmus, D., Principe, J.C.: Convergence properties and data efficiency of the minimum error entropy criterion in Adaline training. IEEE Trans. Signal Process. 51, 1966–1978 (2003)
Erdogmus, D., Hild, K.E. II, Principe, J.C.: Online entropy manipulation: stochastic information gradient. IEEE Signal Process. Lett. 10, 242–245 (2003)
Feng, X., Loparo, K.A., Fang, Y.: Optimal state estimation for stochastic systems: an information theoretic approach. IEEE Trans. Autom. Control 42(6), 771–785 (1997)
Gibson, J.D., Gray, S.D.: MVSE adaptive filtering subject to a constraint on MSE. IEEE Trans. Circuits Syst. 35(5), 603–608 (1988)
Haykin, S.: Adaptive Filter Theory, 3rd edn. Prentice-Hall, Upper Saddle River (1996)
Kaplan, D., Glass, L.: Understanding Nonlinear Dynamics. Springer, New York (1995)
Lo, J.T., Wanner, T.: Existence and uniqueness of risk-sensitive estimates. IEEE Trans. Autom. Control 47(11), 1945–1948 (2002)
Menendez, M.L., Pardo, J.A., Pardo, M.C.: Estimators based on sample quantiles using (h,φ)-entropy measures. Appl. Math. Lett. 11, 99–104 (1998)
Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC, Boca Raton (2006)
Salicru, M., Menendez, M.L., Morales, D., Pardo, L.: Asymptotic distribution of (h,φ)-entropies. Commun. Stat. Theory Methods 22, 2015–2031 (1993)
Sherman, S.: Non-mean-square error criteria. IRE Trans. Inf. Theory IT-4, 125–126 (1958)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman & Hall, New York (1986)
Walach, E., Widrow, B.: The least mean fourth (LMF) adaptive algorithm and its family. IEEE Trans. Inf. Theory IT-30(2), 275–283 (1984)