Divergence measures for statistical data processing—An annotated bibliography

Signal Processing, vol. 93, pp. 621–633, 2013
Michèle Basseville
IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France

References

J. Aczél, Lectures on Functional Equations and Their Applications, Mathematics in Science and Engineering, vol. 19, Academic Press, 1966.
Aczél, 1984, Measuring information beyond communication theory—Why some generalized information measures may be useful, others not, Aequationes Mathematicae, 27, 1, 10.1007/BF02192655
J. Aczél, Z. Daróczy, On Measures of Information and Their Characterizations, Mathematics in Science and Engineering, vol. 115, Academic Press, 1975.
Agarwal, 2010, A geometric view of conjugate priors, Machine Learning, 81, 99, 10.1007/s10994-010-5203-x
Akaike, 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19, 716, 10.1109/TAC.1974.1100705
Ali, 1966, A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society—Series B Methodological, 28, 131, 10.1111/j.2517-6161.1966.tb00626.x
Altun, 2006, Unifying divergence minimization and statistical inference via convex duality, vol. 4005, 139
S.-I. Amari, Differential–Geometrical Methods in Statistics, Lecture Notes in Statistics, vol. 28, Springer-Verlag, New York, NY, USA, 1985.
Amari, 2001, Information geometry on hierarchy of probability distributions, IEEE Transactions on Information Theory, 47, 1701, 10.1109/18.930911
Amari, 2007, Integration of stochastic models by minimizing α-divergence, Neural Computation, 19, 2780, 10.1162/neco.2007.19.10.2780
Amari, 2009, α-divergence is unique, belonging to both f-divergence and Bregman divergence classes, IEEE Transactions on Information Theory, 55, 4925, 10.1109/TIT.2009.2030485
S.-I. Amari, Information geometry and its applications: convex function and dually flat manifold, in: Emerging Trends in Visual Computing—LIX Colloquium, November 2008, Lecture Notes in Computer Science, vol. 5416, Springer-Verlag, 2009, pp. 75–102.
S.-I. Amari, Information geometry derived from divergence functions, in: 3rd International Symposium on Information Geometry and its Applications, Leipzig, FRG, August 2–6, 2010.
Amari, 2000, vol. 191
Anantharam, 1990, A large deviations approach to error exponents in source coding and hypothesis testing, IEEE Transactions on Information Theory, 36, 938, 10.1109/18.53762
Arikan, 1996, An inequality on guessing and its application to sequential decoding, IEEE Transactions on Information Theory, 42, 99, 10.1109/18.481781
Arimoto, 1971, Information-theoretical considerations on estimation problems, Information and Control, 19, 181, 10.1016/S0019-9958(71)90065-9
S. Arimoto, Information measures and capacity of order α for discrete memoryless channels, in: Topics in Information Theory—2nd Colloquium, Keszthely, HU, 1975, Colloquia Mathematica Societatis János Bolyai, vol. 16, North Holland, Amsterdam, NL, 1977, pp. 41–52.
Arsigny, 2007, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM Journal on Matrix Analysis and Applications, 29, 328, 10.1137/050637996
K.A. Arwini, C.T.J. Dodson, Information Geometry—Near Randomness and Near Independence, Lecture Notes in Mathematics, vol. 1953, Springer, 2008.
J.A. Aslam, V. Pavlu, Query hardness estimation using Jensen–Shannon divergence among multiple scoring functions, in: G. Amati, C. Carpineto, G. Romano (Eds.), Advances in Information Retrieval—29th European Conference on IR Research (ECIR'07), Rome, Italy, April 2–5, 2007, Lecture Notes in Computer Science, vol. 4425, Springer-Verlag, Berlin Heidelberg, FRG, pp. 198–209.
Aviyente, 2004, Characterization of event related potentials using information theoretic distance measures, IEEE Transactions on Biomedical Engineering, 51, 737, 10.1109/TBME.2004.824133
Bahr, 1990, Asymptotic analysis of error probabilities for the nonzero-mean Gaussian hypothesis testing problem, IEEE Transactions on Information Theory, 36, 597, 10.1109/18.54905
A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, An information theoretic analysis of maximum likelihood mixture estimation for exponential families, in: C.E. Brodley (Ed.), Proceedings of the 21st International Conference on Machine Learning (ICML'04), Banff, Alberta, Canada, July 4–8, 2004, ACM International Conference Proceeding Series, vol. 69, New York, NY, USA.
Banerjee, 2007, A generalized maximum entropy approach to Bregman co-clustering and matrix approximation, Journal of Machine Learning Research, 8, 1919
Banerjee, 2005, Clustering with Bregman divergences, Journal of Machine Learning Research, 6, 1705
Barndorff-Nielsen, 1986, The role of differential geometry in statistical theory, International Statistical Review, 54, 83, 10.2307/1403260
Basseville, 1989, Distance measures for signal processing and pattern recognition, Signal Processing, 18, 349, 10.1016/0165-1684(89)90079-0
M. Basseville, Information: entropies, divergences et moyennes, Research Report 1020, IRISA, 〈hal.archives-ouvertes.fr/inria-00490399/〉, May 1996 (in French).
Basseville, 1997, Information criteria for residual generation and fault detection and isolation, Automatica, 33, 783, 10.1016/S0005-1098(97)00004-6
M. Basseville, J.-F. Cardoso, On entropies, divergences, and mean values, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'95), Whistler, British Columbia, Canada, September 1995, p. 330.
Basu, 1998, Robust and efficient estimation by minimising a density power divergence, Biometrika, 85, 549, 10.1093/biomet/85.3.549
Basu, 1994, Minimum disparity estimation for continuous models, Annals of the Institute of Statistical Mathematics, 46, 683, 10.1007/BF00773476
Basu, 2004, The iteratively reweighted estimating equation in minimum distance problems, Computational Statistics and Data Analysis, 45, 105, 10.1016/S0167-9473(02)00326-2
Basu, 2011
Bauschke, 2003, Duality for Bregman projections onto translated cones and affine subspaces, Journal of Approximation Theory, 121, 1
Bekara, 2006, A model selection approach to signal denoising using Kullback's symmetric divergence, Signal Processing, 86, 1400, 10.1016/j.sigpro.2005.03.023
Ben-Tal, 1989, Entropic means, Journal of Mathematical Analysis and Applications, 139, 537, 10.1016/0022-247X(89)90128-5
Bercher, 2008, On some entropy functionals derived from Rényi information divergence, Information Sciences, 178, 2489, 10.1016/j.ins.2008.02.003
Bhattacharyya, 1943, On a measure of divergence between two statistical populations defined by their probability distributions, Bulletin of the Calcutta Mathematical Society, 35, 99
Birgé, 2005, A new lower bound for multiple hypothesis testing, IEEE Transactions on Information Theory, 51, 1611, 10.1109/TIT.2005.844101
Blahut, 1974, Hypothesis testing and information theory, IEEE Transactions on Information Theory, 20, 405, 10.1109/TIT.1974.1055254
Blahut, 1987
J. Boets, K. De Cock, B. De Moor, A mutual information based distance for multivariate Gaussian processes, in: A. Chiuso, A. Ferrante, S. Pinzoni (Eds.), Modeling, Estimation and Control, Festschrift in Honor of Giorgio Picci on the Occasion of his Sixty-Fifth Birthday, Lecture Notes in Control and Information Sciences, vol. 364, Springer-Verlag, Berlin, FRG, October 2007, pp. 15–33.
Bougerol, 1993, Kalman filtering with random coefficients and contractions, SIAM Journal on Control and Optimization, 31, 942, 10.1137/0331041
Bregman, 1967, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7, 200, 10.1016/0041-5553(67)90040-7
Broniatowski, 2006, Minimization of φ-divergences on sets of signed measures, Studia Scientiarum Mathematicarum Hungarica, 43, 403, 10.1556/SScMath.43.2006.4.2
Broniatowski, 2009, Parametric estimation and tests through divergences and the duality technique, Journal of Multivariate Analysis, 100, 16, 10.1016/j.jmva.2008.03.011
Broniatowski, 2012, Divergences and duality for estimation and test under moment condition models, Journal of Statistical Planning and Inference, 142, 2554, 10.1016/j.jspi.2012.03.013
M. Broniatowski, I. Vajda, Several applications of divergence criteria in continuous families, Kybernetika, 48, arXiv:0911.0937, in press.
Burbea, 1982, Entropy differential metric, distance and divergence measures in probability spaces, Journal of Multivariate Analysis, 12, 575, 10.1016/0047-259X(82)90065-3
Burbea, 1982, On the convexity of higher order Jensen differences based on entropy functions, IEEE Transactions on Information Theory, 28, 961, 10.1109/TIT.1982.1056573
Burbea, 1982, On the convexity of some divergence measures based on entropy functions, IEEE Transactions on Information Theory, 28, 489, 10.1109/TIT.1982.1056497
Burg, 1982, Estimation of structured covariance matrices, Proceedings of the IEEE, 70, 963, 10.1109/PROC.1982.12427
Byrnes, 2001, A generalized entropy criterion for Nevanlinna–Pick interpolation with degree constraint, IEEE Transactions on Automatic Control, 46, 822, 10.1109/9.928584
M.A. Carreira-Perpiñán, G.E. Hinton, On contrastive divergence learning, in: R. Cowell, Z. Ghahramani (Eds.), Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS'05), Barbados, January 6–8, 2005, pp. 59–66.
L. Cayton, Fast nearest neighbor retrieval for Bregman divergences, in: W.W. Cohen, A. McCallum, S.T. Roweis (Eds.), Proceedings of the 25th International Conference on Machine Learning (ICML'08), Helsinki, Finland, June 2008, pp. 112–119.
L. Cayton, Efficient Bregman range search, in: Y. Bengio, D. Schuurmans, J. Lafferty, C.K.I. Williams, A. Culotta (Eds.), Advances in Neural Information Processing Systems 22, Vancouver, British Columbia, Canada, NIPS Foundation, December 7–10, 2009, pp. 243–251.
Chernoff, 1952, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics, 23, 493, 10.1214/aoms/1177729330
Cichocki, 2010, Families of alpha-, beta- and gamma-divergences, Entropy, 12, 1532, 10.3390/e12061532
A. Cichocki, R. Zdunek, S.-I. Amari, Csiszár's divergences for non-negative matrix factorization: family of new multiplicative algorithms, in: J.P. Rosca, D. Erdogmus, J.C. Príncipe, S. Haykin (Eds.), Proceedings of the 6th International Conference on Independent Component Analysis and Blind Source Separation (ICA'06), Charleston, South Carolina, USA, March 5–8, 2006, Lecture Notes in Computer Science, vol. 3889, Springer-Verlag, Berlin Heidelberg, FRG, pp. 32–39.
Cichocki, 2008, Nonnegative matrix and tensor factorization, IEEE Signal Processing Magazine, 25, 142, 10.1109/MSP.2008.4408452
Cichocki, 2009
Collins, 2002, Logistic regression, AdaBoost and Bregman distances, Machine Learning, 48, 253, 10.1023/A:1013912006537
Coursol, 1979, Sur la formule de Chernoff pour deux processus Gaussiens stationnaires, Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, 288, 769
Cover, 1991, 10.1002/0471200611
Cover, 2006
Csiszár, 1963, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei, 8, 85
Csiszár, 1967, Information-type measures of difference of probability distributions and indirect observation, Studia Scientiarum Mathematicarum Hungarica, 2, 299
Csiszár, 1967, On topological properties of f-divergences, Studia Scientiarum Mathematicarum Hungarica, 2, 329
Csiszár, 1975, I-divergence geometry of probability distributions and minimization problems, Annals of Probability, 3, 146, 10.1214/aop/1176996454
I. Csiszár, Information measures: a critical survey, in: J. Kozesnik (Ed.), Transactions of the 7th Conference on Information Theory, Statistical Decision Functions, Random Processes, Prague, August 18–23, 1974, vol. B, Academia, Prague, pp. 73–86.
Csiszár, 1991, Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems, Annals of Statistics, 19, 2032, 10.1214/aos/1176348385
Csiszár, 1995, Generalized cutoff rates and Rényi's information measures, IEEE Transactions on Information Theory, 41, 26, 10.1109/18.370121
Csiszár, 1995, Generalized projections for non-negative functions, Acta Mathematica Hungarica, 68, 161, 10.1007/BF01874442
Csiszár, 2008, Axiomatic characterizations of information measures, Entropy, 10, 261, 10.3390/e10030261
Csiszár, 2003, Information projections revisited, IEEE Transactions on Information Theory, 49, 1474, 10.1109/TIT.2003.810633
I. Csiszár, F. Matus, On minimization of multivariate entropy functionals, in: V. Anantharam, I. Kontoyiannis (Eds.), Proceedings of the IEEE Information Theory Workshop on Networking and Information Theory (ITW'09), Volos, Greece, June 10–12, 2009, pp. 96–100.
I. Csiszár, F. Matus, Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities, arXiv:1202.0666, February 2012.
M. Das Gupta, T.S. Huang, Bregman distance to l1 regularized logistic regression, arXiv:1004.3814, April 2010.
S. Della Pietra, V. Della Pietra, J. Lafferty, Duality and Auxiliary Functions for Bregman Distances, Technical Report Collection CMU-CS-01-109R, School of Computer Science, Carnegie Mellon University, February 2002.
Dembo, 1997, Information inequalities and concentration of measure, Annals of Probability, 25, 927, 10.1214/aop/1024404424
Dembo, 1991, Information theoretic inequalities, IEEE Transactions on Information Theory, 37, 1501, 10.1109/18.104312
Dembo, 1998, vol. 38
Devroye, 1996, vol. 31
Dhillon, 2003, A divisive information-theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, 3, 1265
Dhillon, 2006, Generalized nonnegative matrix approximations with Bregman divergences, 283
Dhillon, 2008, Matrix nearness problems with Bregman divergences, SIAM Journal on Matrix Analysis and Applications, 29, 1120, 10.1137/060649021
Donoho, 2004, When does non-negative matrix factorization give a correct decomposition into parts?
Donsker, 1975, Asymptotic evaluation of certain Markov process expectations for large time, II, Communications on Pure and Applied Mathematics, 28, 279, 10.1002/cpa.3160280206
Dryden, 2009, Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging, Annals of Applied Statistics, 3, 1102, 10.1214/09-AOAS249
Eguchi, 2010, Entropy and divergence associated with power function and the statistical application, Entropy, 12, 262, 10.3390/e12020262
Endres, 2003, A new metric for probability distributions, IEEE Transactions on Information Theory, 49, 1858, 10.1109/TIT.2003.813506
Esteban, 1997, A general class of entropy statistics, Applications of Mathematics, 42, 161, 10.1023/A:1022447020419
Fedotov, 2003, Refinements of Pinsker's inequality, IEEE Transactions on Information Theory, 49, 1491, 10.1109/TIT.2003.811927
Ferrante, 2008, Hellinger versus Kullback–Leibler multivariable spectrum approximation, IEEE Transactions on Automatic Control, 53, 954, 10.1109/TAC.2008.920238
Ferrari, 2010, Maximum Lq-likelihood estimation, Annals of Statistics, 38, 753, 10.1214/09-AOS687
Finesso, 2006, Nonnegative matrix factorization and I-divergence alternating minimization, Linear Algebra and its Applications, 416, 270, 10.1016/j.laa.2005.11.012
Fischer, 2010, Quantization and clustering with Bregman divergences, Journal of Multivariate Analysis, 101, 2207, 10.1016/j.jmva.2010.05.008
Frigyik, 2008, Functional Bregman divergence and Bayesian estimation of distributions, IEEE Transactions on Information Theory, 54, 5130, 10.1109/TIT.2008.929943
Fujimoto, 2007, A modified EM algorithm for mixture models based on Bregman divergence, Annals of the Institute of Statistical Mathematics, 59, 3, 10.1007/s10463-006-0097-x
Févotte, 2009, Nonnegative matrix factorization with the Itakura–Saito divergence. With application to music analysis, Neural Computation, 21, 793
Févotte, 2011, Algorithms for nonnegative matrix factorization with the β-divergence, Neural Computation, 23, 2421, 10.1162/NECO_a_00168
Georgiou, 2006, Relative entropy and the multivariable multidimensional moment problem, IEEE Transactions on Information Theory, 52, 1052, 10.1109/TIT.2005.864422
Georgiou, 2007, Distances and Riemannian metrics for spectral density functions, IEEE Transactions on Signal Processing, 55, 3995, 10.1109/TSP.2007.896119
Georgiou, 2009, Metrics for power spectra, IEEE Transactions on Signal Processing, 57, 859, 10.1109/TSP.2008.2010009
Georgiou, 2003, Kullback–Leibler approximation of spectral density functions, IEEE Transactions on Information Theory, 49, 2910, 10.1109/TIT.2003.819324
Georgiou, 2008, A convex optimization approach to ARMA modeling, IEEE Transactions on Automatic Control, 53, 1108, 10.1109/TAC.2008.923684
Gilardoni, 2010, On Pinsker's and Vajda's type inequalities for Csiszár's f-divergences, IEEE Transactions on Information Theory, 56, 5377, 10.1109/TIT.2010.2068710
Gray, 1976, Distance measures for speech processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, 24, 380, 10.1109/TASSP.1976.1162849
R.M. Gray, Entropy and Information Theory, Springer-Verlag, New York, NY, USA, 1990; online corrected version, 2009, 〈http://ee.stanford.edu/gray/it.html〉.
Gray, 2010
Gray, 1980, Distortion measures for speech processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, 28, 367, 10.1109/TASSP.1980.1163421
Grünwald, 2004, Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory, Annals of Statistics, 32, 1367, 10.1214/009053604000000553
Guntuboyina, 2011, Lower bounds for the minimax risk using f-divergences and applications, IEEE Transactions on Information Theory, 57, 2386, 10.1109/TIT.2011.2110791
Györfi, 1978, f-Dissimilarity, Annals of the Institute of Statistical Mathematics, 30, 105, 10.1007/BF02480206
P. Harremoës, I. Vajda, On Bahadur efficiency of power divergence statistics, arXiv:1002.1493, February 2010.
Harremoës, 2011, On pairs of f-divergences and their joint range, IEEE Transactions on Information Theory, 57, 3230, 10.1109/TIT.2011.2137353
P. Harremoës, C. Vignat, Rényi entropies of projections, in: A. Barg, R.W. Yeung (Eds.), Proceedings of the IEEE International Symposium on Information Theory (ISIT'06), Seattle, WA, USA, July 9–14, 2006, pp. 1827–1830.
Havrda, 1967, Quantification method of classification processes, Kybernetika, 3, 30
He, 2003, A generalized divergence measure for robust image registration, IEEE Transactions on Signal Processing, 51, 1211
A.O. Hero, B. Ma, O. Michel, J. Gorman, Alpha-Divergence for Classification, Indexing and Retrieval, Research Report CSPL-328, University of Michigan, Communications and Signal Processing Laboratory, May 2001.
Hinton, 2002, Training products of experts by minimizing contrastive divergence, Neural Computation, 14, 1771, 10.1162/089976602760128018
Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527, 10.1162/neco.2006.18.7.1527
Hoeffding, 1965, Asymptotically optimal tests for multinomial distributions, Annals of Mathematical Statistics, 36, 369, 10.1214/aoms/1177700150
Hyvärinen, 2005, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research, 6, 695
Hyvärinen, 2007, Some extensions of score matching, Computational Statistics and Data Analysis, 51, 2499, 10.1016/j.csda.2006.09.003
James, 1961, Estimation with quadratic loss, vol. 1, 361
Jiang, 2012, Geometric methods for spectral analysis, IEEE Transactions on Signal Processing, 60, 1064, 10.1109/TSP.2011.2178601
Jiang, 2012, Distances and Riemannian metrics for multivariate spectral densities, IEEE Transactions on Automatic Control, 57, 1723, 10.1109/TAC.2012.2183171
Johnson, 2004, Fisher information inequalities and the central limit theorem, Probability Theory and Related Fields, 129, 391, 10.1007/s00440-004-0344-0
Johnson, 1979, Axiomatic characterization of the directed divergences and their linear combinations, IEEE Transactions on Information Theory, 25, 709, 10.1109/TIT.1979.1056113
Jones, 1990, General entropy criteria for inverse problems, with applications to data compression, pattern classification, and cluster analysis, IEEE Transactions on Information Theory, 36, 23, 10.1109/18.50370
Jones, 2001, A comparison of related density-based minimum divergence estimators, Biometrika, 88, 865, 10.1093/biomet/88.3.865
Kagan, 2008, Some inequalities related to the Stam inequality, Applications of Mathematics, 53, 195, 10.1007/s10492-008-0004-2
T. Kanamori, A. Ohara, A Bregman extension of quasi-Newton updates II: convergence and robustness properties, arXiv:1010.2846, October 2010.
T. Kanamori, A. Ohara, A Bregman extension of quasi-Newton updates I: an information geometrical framework, Optimization Methods and Software, 27, doi:10.1080/10556788.2011.613073, in press.
Kanamori, 2012, f-divergence estimation and two-sample homogeneity test under semiparametric density-ratio models, IEEE Transactions on Information Theory, 58, 708, 10.1109/TIT.2011.2163380
Karagrigoriou, 2010, Measures of divergence in model selection, 51
Karagrigoriou, 2008, On measures of information and divergence and model selection criteria, 503
Karlsson, 2010, The inverse problem of analytic interpolation with degree constraint and weight selection for control synthesis, IEEE Transactions on Automatic Control, 55, 405, 10.1109/TAC.2009.2037280
Kass, 1997
Kazakos, 1980, On resolution and exponential discrimination between Gaussian stationary vector processes and dynamic models, IEEE Transactions on Automatic Control, 25, 294, 10.1109/TAC.1980.1102275
Kazakos, 1982, Spectral distance measures between continuous-time vector Gaussian processes, IEEE Transactions on Information Theory, 28, 679, 10.1109/TIT.1982.1056521
Kazakos, 1980, Spectral distance measures between Gaussian processes, IEEE Transactions on Automatic Control, 25, 950, 10.1109/TAC.1980.1102475
Kazakos, 1990
Kim, 2008, Estimation of a tail index based on minimum density power divergence, Journal of Multivariate Analysis, 99, 2453, 10.1016/j.jmva.2008.02.031
J. Kivinen, M.K. Warmuth, Boosting as entropy projection, in: Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT'99), Santa Cruz, CA, USA, ACM, July 7–9, 1999, pp. 134–144.
Kivinen, 2006, The p-norm generalization of the LMS algorithm for adaptive filtering, IEEE Transactions on Signal Processing, 54, 1782, 10.1109/TSP.2006.872551
Knockaert, 1993, A class of statistical and spectral distance measures based on Bose–Einstein statistics, IEEE Transactions on Signal Processing, 41, 3171, 10.1109/78.257248
L. Knockaert, Statistical thermodynamics and natural f-divergences, unpublished paper, 〈users.ugent.be/lknockae/〉, 1994.
Knockaert, 2003, On scale and concentration invariance in entropies, Information Sciences, 152, 139, 10.1016/S0020-0255(03)00058-6
Kompass, 2007, A generalized divergence measure for nonnegative matrix factorization, Neural Computation, 19, 780, 10.1162/neco.2007.19.3.780
Kulis, 2009, Low-rank kernel learning with Bregman matrix divergences, Journal of Machine Learning Research, 10, 341
S. Kullback, J.C. Keegel, J.H. Kullback, Topics in Statistical Information Theory, Lecture Notes in Statistics, vol. 42, Springer-Verlag, New York, NY, USA, 1987.
J.D. Lafferty, Statistical learning algorithms based on Bregman distances, in: Proceedings of the Canadian Workshop on Information Theory, Toronto, Canada, June 3–6, 1997, pp. 77–80.
J.D. Lafferty, Additive models, boosting, and inference for generalized divergences, in: Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT'99), Santa Cruz, CA, USA, ACM, July 7–9, 1999, pp. 125–133.
Lawson, 2007, A Birkhoff contraction formula with application to Riccati equations, SIAM Journal on Control and Optimization, 46, 930, 10.1137/050637637
Le Besnerais, 1999, A new look at entropy for solving linear inverse problems, IEEE Transactions on Information Theory, 45, 1565, 10.1109/18.771159
G. Lebanon, J. Lafferty, Boosting and maximum likelihood for exponential models, in: T.G. Dietterich, S. Becker, Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14, Vancouver, British Columbia, Canada, December 3–8, 2001, MIT Press, Cambridge, MA.
Lee, 2008, Invariant metrics, contractions and nonlinear matrix equations, Nonlinearity, 21, 857, 10.1088/0951-7715/21/4/011
A. Lefevre, F. Bach, C. Févotte, Online algorithms for nonnegative matrix factorization with the Itakura–Saito divergence, in: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'11), New Paltz, NY, USA, October 16–19, 2011, pp. 313–316.
Leonenko, 2010, Statistical inference for the ϵ-entropy and the quadratic Rényi entropy, Journal of Multivariate Analysis, 101, 1981, 10.1016/j.jmva.2010.05.009
Levy, 2004, Robust least-squares estimation with a relative entropy constraint, IEEE Transactions on Information Theory, 50, 89, 10.1109/TIT.2003.821992
Li, 2009, Effective metric for detecting distributed denial-of-service attacks based on information divergence, IET Communications, 3, 1851, 10.1049/iet-com.2008.0586
F. Liese, I. Vajda, Convex Statistical Distances, Texte zur Mathematik, vol. 95, Teubner, Leipzig, 1987.
Liese, 2006, On divergences and informations in statistics and information theory, IEEE Transactions on Information Theory, 52, 4394, 10.1109/TIT.2006.881731
Lin, 1991, Divergence measures based on the Shannon entropy, IEEE Transactions on Information Theory, 37, 145, 10.1109/18.61115
Lindsay, 1994, Efficiency versus robustness, Annals of Statistics, 22, 1081, 10.1214/aos/1176325512
Lutwak, 2005, Cramér–Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information, IEEE Transactions on Information Theory, 51, 473, 10.1109/TIT.2004.840871
Ma, 2011, Fixed point and Bregman iterative methods for matrix rank minimization, Mathematical Programming, Series A, 128, 321, 10.1007/s10107-009-0306-5
MacKay, 2003
Maji, 2009, f-Information measures for efficient selection of discriminative genes from microarray data, IEEE Transactions on Biomedical Engineering, 56, 1063, 10.1109/TBME.2008.2004502
Maji, 2010, Feature selection using f-information measures in fuzzy approximation spaces, IEEE Transactions on Knowledge and Data Engineering, 22, 854, 10.1109/TKDE.2009.124
Mantalos, 2010, An improved divergence information criterion for the determination of the order of an AR process, Communications in Statistics—Simulation and Computation, 39, 865, 10.1080/03610911003650391
Markatou, 1998, Weighted likelihood equations with bootstrap root search, Journal of the American Statistical Association, 93, 740, 10.1080/01621459.1998.10473726
Martín, 2011, A new class of minimum power divergence estimators with applications to cancer surveillance, Journal of Multivariate Analysis, 102, 1175, 10.1016/j.jmva.2011.03.011
Mathai, 1975
Y. Matsuyama, Non-logarithmic information measures, α-weighted EM algorithms and speedup of learning, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'98), Cambridge, MA, USA, August 16–21, 1998, p. 385.
Y. Matsuyama, The α-EM algorithm and its applications, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), vol. 1, Istanbul, Turkey, June 5–9, 2000, pp. 592–595.
Matsuyama, 2003, The α-EM algorithm, IEEE Transactions on Information Theory, 49, 692, 10.1109/TIT.2002.808105
Y. Matsuyama, N. Katsumata, S. Imahara, Convex divergence as a surrogate function for independence: the f-divergence, in: T.-W. Lee, T.-P. Jung, S. Makeig, T.J. Sejnowski (Eds.), Proceedings of the 3rd International Conference on Independent Component Analysis and Blind Signal Separation, San Diego, CA, USA, December 2001, pp. 31–36.
Mattheou, 2009, A model selection criterion based on the BHHJ measure of divergence, Journal of Statistical Planning and Inference, 139, 228, 10.1016/j.jspi.2008.04.022
Matus, 2009, Divergence from factorizable distributions and matroid representations by partitions, IEEE Transactions on Information Theory, 55, 5375, 10.1109/TIT.2009.2032806
Matusita, 1973, Discrimination and the affinity of distributions, 213
Merhav, 2011, Data processing theorems and the second law of thermodynamics, IEEE Transactions on Information Theory, 57, 4926, 10.1109/TIT.2011.2159052
Minami, 2002, Robust blind source separation by beta divergence, Neural Computation, 14, 1859
T. Minka, Divergence Measures and Message Passing, Technical Report MSR-TR-2005-173, Microsoft Research Ltd, 2005.
A. Mnih, G. Hinton, Learning nonlinear constraints with contrastive backpropagation, in: D.V. Prokhorov (Ed.), Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'05), vol. 2, Montréal, Québec, Canada, July 31–August 4, 2005, pp. 1302–1307.
Moakher, 2006, Symmetric positive-definite matrices, vol. 17, 285
Mollah, 2006, Exploring latent structure of mixture ICA models by the minimum β-divergence method, Neural Computation, 18, 166, 10.1162/089976606774841549
Morimoto, 1963, Markov processes and the H-theorem, Journal of the Physical Society of Japan, 18, 328, 10.1143/JPSJ.18.328
Murata, 2004, Information geometry of U-Boost and Bregman divergence, Neural Computation, 16, 1437, 10.1162/089976604323057452
Nascimento, 2010, Hypothesis testing in speckled data with stochastic distances, IEEE Transactions on Geoscience and Remote Sensing, 48, 373, 10.1109/TGRS.2009.2025498
Nason, 2001, Robust projection indices, Journal of the Royal Statistical Society—Series B Methodological, 63, 551, 10.1111/1467-9868.00298
Natarajan, 1985, Large deviations, hypotheses testing, and source coding for finite Markov chains, IEEE Transactions on Information Theory, 31, 360, 10.1109/TIT.1985.1057036
Nath, 1975, On a coding theorem connected with Rényi's entropy, Information and Control, 29, 234, 10.1016/S0019-9958(75)90404-0
Nguyen, 2009, On surrogate loss functions and f-divergences, Annals of Statistics, 37, 876, 10.1214/08-AOS595
Nguyen, 2010, Estimating divergence functionals and the likelihood ratio by convex risk minimization, IEEE Transactions on Information Theory, 56, 5847, 10.1109/TIT.2010.2068870
Nielsen, 2011, The Burbea–Rao and Bhattacharyya centroids, IEEE Transactions on Information Theory, 57, 5455, 10.1109/TIT.2011.2159046
Nielsen, 2009, Sided and symmetrized Bregman centroids, IEEE Transactions on Information Theory, 55, 2882, 10.1109/TIT.2009.2018176
F. Nielsen, P. Piro, M. Barlaud, Bregman vantage point trees for efficient nearest neighbor queries, in: Q. Sun, Y. Rui (Eds.), Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'09), New York, NY, USA, June 28–July 3, 2009, pp. 878–881.
Nishimura, 2008, The information geometric structure of generalized empirical likelihood estimators, Communications in Statistics—Theory and Methods, 37, 1867, 10.1080/03610920801893657
Nock, 2009, Bregman divergences and surrogates for learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 2048, 10.1109/TPAMI.2008.225
Pardo, 2006
Pardo, 1995, Divergence measures based on entropy functions and statistical inference, Sankhyā, 57, 315
Pardo, 2003, On asymptotic properties of information-theoretic divergences, IEEE Transactions on Information Theory, 49, 1860, 10.1109/TIT.2003.813509
Patra, 2008, Minimum Hellinger distance estimation with inlier modification, Sankhyā, 70, 310
Pavon, 2006, On the Georgiou–Lindquist approach to constrained Kullback–Leibler approximation of spectral densities, IEEE Transactions on Automatic Control, 51, 639, 10.1109/TAC.2006.872755
M. Pavon, A. Ferrante, On the geometry of maximum entropy problems. ArXiv:1112.5529, December 2011.
Pelletier, 2005, Informative barycentres in statistics, Annals of the Institute of Statistical Mathematics, 57, 767, 10.1007/BF02915437
Pelletier, 2011, Inference in ϕ-families of distributions, Statistics—A Journal of Theoretical and Applied Statistics, 45, 223
Perez, 1984, Barycenter of a set of probability measures and its application in statistical decision, 154
Petz, 1996, Monotone metrics on matrix spaces, Linear Algebra and its Applications, 244, 81, 10.1016/0024-3795(94)00211-8
Petz, 2005, Means of positive numbers and matrices, SIAM Journal on Matrix Analysis and Applications, 27, 712, 10.1137/050621906
Pham, 2008, On the risk of using Rényi's entropy for blind source separation, IEEE Transactions on Signal Processing, 56, 4611, 10.1109/TSP.2008.928109
Pluim, 2004, f-Information measures in medical image registration, IEEE Transactions on Medical Imaging, 23, 1508, 10.1109/TMI.2004.836872
B. Poczos, L. Xiong, J. Schneider, Nonparametric divergence estimation with applications to machine learning on distributions. ArXiv:1202.3758, February 2012.
Principe, 2008
Qiao, 2010, A study on invariance of f-divergence and its application to speech recognition, IEEE Transactions on Signal Processing, 58, 3884, 10.1109/TSP.2010.2047340
Ramponi, 2009, A globally convergent matricial algorithm for multivariate spectral estimation, IEEE Transactions on Automatic Control, 54, 2376, 10.1109/TAC.2009.2028977
Rao, 1945, Information and accuracy attainable in the estimation of statistical parameters, Bulletin of the Calcutta Mathematical Society, 37, 81
Rao, 1982, Diversity and dissimilarity coefficients, Theoretical Population Biology, 21, 24, 10.1016/0040-5809(82)90004-1
Rao, 1982, Diversity, Sankhyā, 44, 1
Rao, 1986, Rao's axiomatization of diversity measures, vol. 7, 614
Rao, 1987, Differential metrics in probability spaces, vol. 10, 217
Rao, 1985, Cross entropy, dissimilarity measures, and characterizations of quadratic entropy, IEEE Transactions on Information Theory, 31, 589, 10.1109/TIT.1985.1057082
Rauh, 2011, Finding the maximizers of the information divergence from an exponential family, IEEE Transactions on Information Theory, 57, 3236, 10.1109/TIT.2011.2136230
Ravikumar, 2010, Message-passing for graph-structured linear programs, Journal of Machine Learning Research, 11, 1043
Read, 1988
Reid, 2010, Composite binary losses, Journal of Machine Learning Research, 11, 2387
Reid, 2011, Information, divergence and risk for binary experiments, Journal of Machine Learning Research, 12, 731
Rényi, 1961, On measures of information and entropy, vol. 1, 547
Rényi, 1967, On some basic problems of statistics from the point of view of information theory, vol. 1, 531
A. Roman, S. Jolad, M.C. Shastry, Bounded divergence measures based on Bhattacharyya coefficient. ArXiv:1201.0418, January 2012.
Sander, 2002, Measures of information, vol. 2, 1523
R. Santos-Rodriguez, D. Garcia-Garcia, J. Cid-Sueiro, Cost-sensitive classification based on Bregman divergences for medical diagnosis, in: M.A. Wani (Ed.), Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA'09), Miami Beach, FL, USA, December 13–15, 2009, pp. 551–556.
M.P. Schützenberger, Contribution aux applications statistiques de la théorie de l'information. Thèse d'État, Inst. Stat. Univ. Paris, 1953 (in French).
Schweppe, 1967, On the Bhattacharyya distance and the divergence between Gaussian processes, Information and Control, 11, 373, 10.1016/S0019-9958(67)90610-9
Schweppe, 1967, State space evaluation of the Bhattacharyya distance between two Gaussian processes, Information and Control, 11, 352, 10.1016/S0019-9958(67)90609-2
Shore, 1981, Properties of cross-entropy minimization, IEEE Transactions on Information Theory, 27, 472, 10.1109/TIT.1981.1056373
Shore, 1982, Minimum cross-entropy pattern classification and cluster analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 4, 11, 10.1109/TPAMI.1982.4767189
Si, 2010, Bregman divergence-based regularization for transfer subspace learning, IEEE Transactions on Knowledge and Data Engineering, 22, 929, 10.1109/TKDE.2009.126
Sibson, 1969, Information radius, Probability Theory and Related Fields, 14, 149
B.K. Sriperumbudur, A. Gretton, K. Fukumizu, G.R.G. Lanckriet, B. Schölkopf, On integral probability metrics, ϕ-divergences and binary classification. ArXiv:0901.2698, January 2009.
Srivastava, 2007, Bayesian quadratic discriminant analysis, Journal of Machine Learning Research, 8, 1277
Österreicher, 2003, A new class of metric divergences on probability spaces and its applicability in statistics, Annals of the Institute of Statistical Mathematics, 55, 639, 10.1007/BF02517812
Stoorvogel, 1998, Approximation problems with the divergence criterion for Gaussian variables and Gaussian processes, Systems and Control Letters, 35, 207, 10.1016/S0167-6911(98)00053-X
Stummer, 2010, On divergences of finite measures and their applicability in statistics and information theory, Statistics—A Journal of Theoretical and Applied Statistics, 44, 169
Stummer, 2012, On Bregman distances and divergences of probability measures, IEEE Transactions on Information Theory, 58, 1277, 10.1109/TIT.2011.2178139
M. Sugiyama, T. Suzuki, T. Kanamori, Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation, Annals of the Institute of Statistical Mathematics 64 (2) (2012), 1009–1044.
Sung, 2006, Neyman–Pearson detection of Gauss–Markov signals in noise, IEEE Transactions on Information Theory, 52, 1354, 10.1109/TIT.2006.871599
I. Sutskever, T. Tieleman, On the convergence properties of contrastive divergence, in: Y.W. Teh, M. Titterington (Eds.), Proceedings of the 13th International Workshop on Artificial Intelligence and Statistics (AISTATS'10), Chia Laguna, Sardinia, Italy, May 13–15, 2010, pp. 789–795.
Taneja, 1989, On generalized information measures and their applications, Advances in Electronics and Electron Physics, 76, 327, 10.1016/S0065-2539(08)60580-6
I.J. Taneja, Generalized Information Measures and Their Applications. 〈www.mtm.ufsc.br/taneja/book/book.html〉, 2001.
Taskar, 2006, Structured prediction, dual extragradient and Bregman projections, Journal of Machine Learning Research, 7, 1627
Teboulle, 2007, A unified continuous optimization framework for center-based clustering methods, Journal of Machine Learning Research, 8, 65
Teboulle, 2006, Clustering with entropy-like k-means algorithms, 127
Toma, 2011, Dual divergence estimators and tests, Journal of Multivariate Analysis, 102, 20, 10.1016/j.jmva.2010.07.010
Topsoe, 2000, Some inequalities for information divergence and related measures of discrimination, IEEE Transactions on Information Theory, 46, 1602, 10.1109/18.850703
Torgersen, 1991, vol. 36
Touboul, 2010, Projection pursuit through ϕ-divergence minimisation, Entropy, 12, 1581, 10.3390/e12061581
Tsuda, 2005, Matrix exponentiated gradient updates for on-line learning and Bregman projection, Journal of Machine Learning Research, 6, 995
M. Tsukada, H. Suyari, Tsallis differential entropy and divergences derived from the generalized Shannon–Khinchin axioms, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'09), Seoul, Korea, June 28–July 3, 2009, pp. 149–153.
J. Vachery, A. Dukkipati, On Shore and Johnson properties for a special case of Csiszár f-divergences. ArXiv:1201.4285, January 2012.
Vajda, 1973, χα-divergence and generalized Fisher's information, 873
Vajda, 1989, vol. 11
I. Vajda, Modifications of Divergence Criteria for Applications in Continuous Families, Research Report 2230, Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation, November 2008.
Vajda, 2009, On metric divergences of probability measures, Kybernetika, 45, 885
Vemuri, 2011, Total Bregman divergence and its applications to DTI analysis, IEEE Transactions on Medical Imaging, 30, 475, 10.1109/TMI.2010.2086464
C. Vignat, A.O. Hero, J.A. Costa, A geometric characterization of maximum Rényi entropy distributions, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'06), Seattle, Washington, USA, July 2006, pp. 1822–1826.
F. Vrins, D.-T. Pham, M. Verleysen, Is the general form of Renyi's entropy a contrast for source separation?, in: M.E. Davies, C.J. James, S.A. Abdallah, M.D. Plumbley (Eds.), Proceedings of the 7th International Conference on Independent Component Analysis and Blind Source Separation (ICA'07), London, UK, September 9–12, 2007, Lecture Notes in Computer Science, vol. 4666, Springer-Verlag, Berlin, Heidelberg, FRG, 2007, pp. 129–136.
Wang, 2009, Divergence estimation for multidimensional densities via k-nearest-neighbor distances, IEEE Transactions on Information Theory, 55, 2392, 10.1109/TIT.2009.2016060
S. Wang, D. Schuurmans, Learning continuous latent variable models with Bregman divergences, in: R. Gavaldà, K.P. Jantke, E. Takimoto (Eds.), Proceedings of the 14th International Conference on Algorithmic Learning Theory (ALT'03), Sapporo, Japan, Lecture Notes in Artificial Intelligence, vol. 2842, Springer-Verlag, Berlin Heidelberg, October 17–19, 2003, pp. 190–204.
L. Wu, R. Jin, S.C.-H. Hoi, J. Zhu, N. Yu, Learning Bregman distance functions and its application for semi-supervised clustering, in: Y. Bengio, D. Schuurmans, J. Lafferty, C.K.I. Williams, A. Culotta (Eds.), Advances in Neural Information Processing Systems 22, Vancouver, British Columbia, Canada, NIPS Foundation, December 7–10, 2009, pp. 2089–2097.
Wu, 2009, Model selection in loglinear models using ϕ-divergence measures and MϕEs, Sankhyā, 71, 260
Yeung, 2002
Yeung, 2008
Yin, 2008, Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences, 1, 143, 10.1137/070703983
Yu, 2010, The Kullback–Leibler rate pseudo-metric for comparing dynamical systems, IEEE Transactions on Automatic Control, 55, 1585
R.G. Zaripov, New Measures and Methods in Information Theory. A. N. Tupolev State Technical University Press, Kazan, Tatarstan, 〈www.imm.knc.ru/zaripov-measures.html〉, 2005 (in Russian).
Zhang, 2004, Divergence function, duality, and convex analysis, Neural Computation, 16, 159, 10.1162/08997660460734047
Ziv, 1973, On functionals satisfying a data-processing theorem, IEEE Transactions on Information Theory, 19, 275, 10.1109/TIT.1973.1055015