Divergence measures for statistical data processing—An annotated bibliography
References
J. Aczél, Lectures on Functional Equations and Their Applications, Mathematics in Science and Engineering, vol. 19, Academic Press, 1966.
Aczél, 1984, Measuring information beyond communication theory—Why some generalized information measures may be useful, others not, Aequationes Mathematicae, 27, 1, 10.1007/BF02192655
J. Aczél, Z. Daróczy, On Measures of Information and Their Characterizations, Mathematics in Science and Engineering, vol. 115, Academic Press, 1975.
Agarwal, 2010, A geometric view of conjugate priors, Machine Learning, 81, 99, 10.1007/s10994-010-5203-x
Akaike, 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19, 716, 10.1109/TAC.1974.1100705
Ali, 1966, A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society—Series B Methodological, 28, 131, 10.1111/j.2517-6161.1966.tb00626.x
Altun, 2006, Unifying divergence minimization and statistical inference via convex duality, vol. 4005, 139
S.-I. Amari, Differential–Geometrical Methods in Statistics, Lecture Notes In Statistics, vol. 28, Springer-Verlag, New York, NY, USA, 1985.
Amari, 2001, Information geometry on hierarchy of probability distributions, IEEE Transactions on Information Theory, 47, 1701, 10.1109/18.930911
Amari, 2007, Integration of stochastic models by minimizing α-divergence, Neural Computation, 19, 2780, 10.1162/neco.2007.19.10.2780
Amari, 2009, α-divergence is unique belonging to both f-divergence and Bregman divergence classes, IEEE Transactions on Information Theory, 55, 4925, 10.1109/TIT.2009.2030485
S.-I. Amari, Information geometry and its applications: convex function and dually flat manifold, in: Emerging Trends in Visual Computing - LIX Colloquium, November 2008, Lecture Notes in Computer Science, vol. 5416, Springer-Verlag, 2009, pp. 75–102.
S.-I. Amari, Information geometry derived from divergence functions, in: 3rd International Symposium on Information Geometry and its Applications, Leipzig, FRG, August 2–6, 2010.
Amari, 2000, vol. 191
Anantharam, 1990, A large deviations approach to error exponents in source coding and hypothesis testing, IEEE Transactions on Information Theory, 36, 938, 10.1109/18.53762
Arikan, 1996, An inequality on guessing and its application to sequential decoding, IEEE Transactions on Information Theory, 42, 99, 10.1109/18.481781
Arimoto, 1971, Information-theoretical considerations on estimation problems, Information and Control, 19, 181, 10.1016/S0019-9958(71)90065-9
S. Arimoto, Information measures and capacity of order α for discrete memoryless channels, in: Topics in Information Theory—2nd Colloquium, Keszthely, HU, 1975, Colloquia Mathematica Societatis János Bolyai, vol. 16, North Holland, Amsterdam, NL, 1977, pp. 41–52.
Arsigny, 2007, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM Journal on Matrix Analysis and Applications, 29, 328, 10.1137/050637996
K.A. Arwini, C.T.J. Dodson, Information Geometry - Near Randomness and Near Independence, Lecture Notes in Mathematics, vol. 1953, Springer, 2008.
J.A. Aslam, V. Pavlu, Query hardness estimation using Jensen–Shannon divergence among multiple scoring functions, in: G. Amati, C. Carpineto, G. Romano (Eds.), Advances in Information Retrieval—29th European Conference on IR Research, ECIR'07, Rome, Italy, Lecture Notes in Computer Science, vol. 4425, Springer-Verlag, Berlin Heidelberg, FRG, April 2–5, 2007, pp. 198–209.
Aviyente, 2004, Characterization of event related potentials using information theoretic distance measures, IEEE Transactions on Biomedical Engineering, 51, 737, 10.1109/TBME.2004.824133
Bahr, 1990, Asymptotic analysis of error probabilities for the nonzero-mean Gaussian hypothesis testing problem, IEEE Transactions on Information Theory, 36, 597, 10.1109/18.54905
A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, An information theoretic analysis of maximum likelihood mixture estimation for exponential families, in: C.E. Brodley (Ed.), Proceedings of the 21st International Conference on Machine Learning (ICML'04), Banff, Alberta, Canada, ACM International Conference Proceeding Series, vol. 69, New York, NY, USA, July 4–8, 2004.
Banerjee, 2007, A generalized maximum entropy approach to Bregman co-clustering and matrix approximation, Journal of Machine Learning Research, 8, 1919
Banerjee, 2005, Clustering with Bregman divergences, Journal of Machine Learning Research, 6, 1705
Barndorff-Nielsen, 1986, The role of differential geometry in statistical theory, International Statistical Review, 54, 83, 10.2307/1403260
Basseville, 1989, Distance measures for signal processing and pattern recognition, Signal Processing, 18, 349, 10.1016/0165-1684(89)90079-0
M. Basseville, Information: entropies, divergences et moyennes. Research Report 1020, IRISA, 〈hal.archives-ouvertes.fr/inria-00490399/〉, May 1996 (in French).
Basseville, 1997, Information criteria for residual generation and fault detection and isolation, Automatica, 33, 783, 10.1016/S0005-1098(97)00004-6
M. Basseville, J.-F. Cardoso, On entropies, divergences, and mean values, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'95), Whistler, British Columbia, Canada, September 1995, p. 330.
Basu, 1998, Robust and efficient estimation by minimising a density power divergence, Biometrika, 85, 549, 10.1093/biomet/85.3.549
Basu, 1994, Minimum disparity estimation for continuous models, Annals of the Institute of Statistical Mathematics, 46, 683, 10.1007/BF00773476
Basu, 2004, The iteratively reweighted estimating equation in minimum distance problems, Computational Statistics and Data Analysis, 45, 105, 10.1016/S0167-9473(02)00326-2
Basu, 2011
Bauschke, 2003, Duality for Bregman projections onto translated cones and affine subspaces, Journal of Approximation Theory, 121, 1
Bekara, 2006, A model selection approach to signal denoising using Kullback's symmetric divergence, Signal Processing, 86, 1400, 10.1016/j.sigpro.2005.03.023
Ben-Tal, 1989, Entropic means, Journal of Mathematical Analysis and Applications, 139, 537, 10.1016/0022-247X(89)90128-5
Bercher, 2008, On some entropy functionals derived from Rényi information divergence, Information Sciences, 178, 2489, 10.1016/j.ins.2008.02.003
Bhattacharyya, 1943, On a measure of divergence between two statistical populations defined by their probability distributions, Bulletin of the Calcutta Mathematical Society, 35, 99
Birgé, 2005, A new lower bound for multiple hypothesis testing, IEEE Transactions on Information Theory, 51, 1611, 10.1109/TIT.2005.844101
Blahut, 1974, Hypothesis testing and information theory, IEEE Transactions on Information Theory, 20, 405, 10.1109/TIT.1974.1055254
Blahut, 1987
J. Boets, K. De Cock, B. De Moor, A mutual information based distance for multivariate Gaussian processes, in: A. Chiuso, A. Ferrante, S. Pinzoni (Eds.), Modeling, Estimation and Control, Festschrift in Honor of Giorgio Picci on the Occasion of his Sixty-Fifth Birthday, Lecture Notes in Control and Information Sciences, vol. 364, Springer-Verlag, Berlin, FRG, October 2007, pp. 15–33.
Bougerol, 1993, Kalman filtering with random coefficients and contraction, SIAM Journal on Control and Optimization, 31, 942, 10.1137/0331041
Bregman, 1967, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7, 200, 10.1016/0041-5553(67)90040-7
Broniatowski, 2006, Minimization of φ-divergences on sets of signed measures, Studia Scientiarum Mathematicarum Hungarica, 43, 403, 10.1556/SScMath.43.2006.4.2
Broniatowski, 2009, Parametric estimation and tests through divergences and the duality technique, Journal of Multivariate Analysis, 100, 16, 10.1016/j.jmva.2008.03.011
Broniatowski, 2012, Divergences and duality for estimation and test under moment condition models, Journal of Statistical Planning and Inference, 142, 2554, 10.1016/j.jspi.2012.03.013
M. Broniatowski, I. Vajda, Several applications of divergence criteria in continuous families, Kybernetika, vol. 48, arXiv:0911.0937, in press.
Burbea, 1982, Entropy differential metric, distance and divergence measures in probability spaces, Journal of Multivariate Analysis, 12, 575, 10.1016/0047-259X(82)90065-3
Burbea, 1982, On the convexity of higher order Jensen differences based on entropy functions, IEEE Transactions on Information Theory, 28, 961, 10.1109/TIT.1982.1056573
Burbea, 1982, On the convexity of some divergence measures based on entropy functions, IEEE Transactions on Information Theory, 28, 489, 10.1109/TIT.1982.1056497
Burg, 1982, Estimation of structured covariance matrices, Proceedings of the IEEE, 70, 963, 10.1109/PROC.1982.12427
Byrnes, 2001, A generalized entropy criterion for Nevanlinna–Pick interpolation with degree constraint, IEEE Transactions on Automatic Control, 46, 822, 10.1109/9.928584
M.A. Carreira-Perpiñán, G.E. Hinton, On contrastive divergence learning, in: R. Cowell, Z. Ghahramani (Eds.), Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS'05), Barbados, January 6–8, 2005, pp. 59–66.
L. Cayton, Fast nearest neighbor retrieval for Bregman divergences, in: W.W. Cohen, A. McCallum, S.T. Roweis (Eds.), Proceedings of the 25th International Conference on Machine Learning (ICML'08), Helsinki, Finland, June 2008, pp. 112–119.
L. Cayton, Efficient Bregman range search, in: Y. Bengio, D. Schuurmans, J. Lafferty, C.K.I. Williams, A. Culotta (Eds.), Advances in Neural Information Processing Systems 22, Vancouver, British Columbia, Canada, NIPS Foundation, December 7–10, 2009, pp. 243–251.
Chernoff, 1952, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics, 23, 493, 10.1214/aoms/1177729330
Cichocki, 2010, Families of alpha- beta- and gamma-divergences, Entropy, 12, 1532, 10.3390/e12061532
A. Cichocki, R. Zdunek, S.-I. Amari, Csiszár's divergences for non-negative matrix factorization: family of new multiplicative algorithm, in: J.P. Rosca, D. Erdogmus, J.C. Príncipe, S. Haykin (Eds.), Proceedings of the 6th International Conference on Independent Component Analysis and Blind Source Separation (ICA'06), Charleston, South Carolina, USA, Lecture Notes in Computer Science, vol. 3889, Springer-Verlag, Berlin Heidelberg, FRG, March 5–8, 2006, pp. 32–39.
Cichocki, 2008, Nonnegative matrix and tensor factorization, IEEE Signal Processing Magazine, 25, 142, 10.1109/MSP.2008.4408452
Cichocki, 2009
Collins, 2002, Logistic regression, AdaBoost and Bregman distances, Machine Learning, 48, 253, 10.1023/A:1013912006537
Coursol, 1979, Sur la formule de Chernoff pour deux processus Gaussiens stationnaires, Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, 288, 769
Cover, 1991, 10.1002/0471200611
Cover, 2006
Csiszár, 1963, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tudományos Akadémia Matematikai Kutató Intezetenek Kozlemenyei, 8, 85
Csiszár, 1967, Information-type measures of difference of probability distributions and indirect observation, Studia Scientiarum Mathematicarum Hungarica, 2, 299
Csiszár, 1967, On topological properties of f-divergence, Studia Scientiarum Mathematicarum Hungarica, 2, 329
Csiszár, 1975, I-divergence geometry of probability distributions and minimization problems, Annals of Probability, 3, 146, 10.1214/aop/1176996454
I. Csiszár, Information measures: a critical survey, in: J. Kozesnik (Ed.), Transactions of the 7th Conference on Information Theory, Statistical Decision Functions, Random Processes, Prague, vol. B, Academia, Prague, August 18–23, 1974, pp. 73–86.
Csiszár, 1991, Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems, Annals of Statistics, 19, 2032, 10.1214/aos/1176348385
Csiszár, 1995, Generalized cutoff rates and Rényi's information measures, IEEE Transactions on Information Theory, 41, 26, 10.1109/18.370121
Csiszár, 1995, Generalized projections for non-negative functions, Acta Mathematica Hungarica, 68, 161, 10.1007/BF01874442
Csiszár, 2008, Axiomatic characterizations of information measures, Entropy, 10, 261, 10.3390/e10030261
Csiszár, 2003, Information projections revisited, IEEE Transactions on Information Theory, 49, 1474, 10.1109/TIT.2003.810633
I. Csiszár, F. Matus, On minimization of multivariate entropy functionals, in: V. Anantharam, I. Kontoyiannis (Eds.), Proceedings of the IEEE Information Theory Workshop on Networking and Information Theory (ITW'09), Volos, Greece, June 10–12, 2009, pp. 96–100.
I. Csiszár, F. Matus, Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities. ArXiv:1202.0666, February 2012.
M. Das Gupta, T.S. Huang, Bregman distance to l1 regularized logistic regression. ArXiv:1004.3814, April 2010.
S. Della Pietra, V. Della Pietra, J. Lafferty, Duality and Auxiliary Functions for Bregman Distances, Technical Report Collection CMU-CS-01-109R, School of Computer Science, Carnegie Mellon University, February 2002.
Dembo, 1997, Information inequalities and concentration of measure, Annals of Probability, 25, 927, 10.1214/aop/1024404424
Dembo, 1991, Information theoretic inequalities, IEEE Transactions on Information Theory, 37, 1501, 10.1109/18.104312
Dembo, 1998, vol. 38
Devroye, 1996, vol. 31
Dhillon, 2003, A divisive information-theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, 3, 1265
Dhillon, 2006, Generalized nonnegative matrix approximations with Bregman divergences, 283
Dhillon, 2008, Matrix nearness problems with Bregman divergences, SIAM Journal on Matrix Analysis and Applications, 29, 1120, 10.1137/060649021
Donoho, 2004, When does non-negative matrix factorization give a correct decomposition into parts?
Donsker, 1975, Asymptotic evaluation of certain Markov process expectations for large time, II, Communications on Pure and Applied Mathematics, 28, 279, 10.1002/cpa.3160280206
Dryden, 2009, Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging, Annals of Applied Statistics, 3, 1102, 10.1214/09-AOAS249
Eguchi, 2010, Entropy and divergence associated with power function and the statistical application, Entropy, 12, 262, 10.3390/e12020262
Endres, 2003, A new metric for probability distributions, IEEE Transactions on Information Theory, 49, 1858, 10.1109/TIT.2003.813506
Esteban, 1997, A general class of entropy statistics, Applications of Mathematics, 42, 161, 10.1023/A:1022447020419
Fedotov, 2003, Refinements of Pinsker's inequality, IEEE Transactions on Information Theory, 49, 1491, 10.1109/TIT.2003.811927
Ferrante, 2008, Hellinger versus Kullback–Leibler multivariable spectrum approximation, IEEE Transactions on Automatic Control, 53, 954, 10.1109/TAC.2008.920238
Ferrari, 2010, Maximum Lq-likelihood estimation, Annals of Statistics, 38, 753, 10.1214/09-AOS687
Finesso, 2006, Nonnegative matrix factorization and I-divergence alternating minimization, Linear Algebra and its Applications, 416, 270, 10.1016/j.laa.2005.11.012
Fischer, 2010, Quantization and clustering with Bregman divergences, Journal of Multivariate Analysis, 101, 2207, 10.1016/j.jmva.2010.05.008
Frigyik, 2008, Functional Bregman divergence and Bayesian estimation of distributions, IEEE Transactions on Information Theory, 54, 5130, 10.1109/TIT.2008.929943
Fujimoto, 2007, A modified EM algorithm for mixture models based on Bregman divergence, Annals of the Institute of Statistical Mathematics, 59, 3, 10.1007/s10463-006-0097-x
Févotte, 2009, Nonnegative matrix factorization with the Itakura–Saito divergence: with application to music analysis, Neural Computation, 21, 793
Févotte, 2011, Algorithms for nonnegative matrix factorization with the β-divergence, Neural Computation, 23, 2421, 10.1162/NECO_a_00168
Georgiou, 2006, Relative entropy and the multivariable multidimensional moment problem, IEEE Transactions on Information Theory, 52, 1052, 10.1109/TIT.2005.864422
Georgiou, 2007, Distances and Riemannian metrics for spectral density functions, IEEE Transactions on Signal Processing, 55, 3995, 10.1109/TSP.2007.896119
Georgiou, 2009, Metrics for power spectra, IEEE Transactions on Signal Processing, 57, 859, 10.1109/TSP.2008.2010009
Georgiou, 2003, Kullback–Leibler approximation of spectral density functions, IEEE Transactions on Information Theory, 49, 2910, 10.1109/TIT.2003.819324
Georgiou, 2008, A convex optimization approach to ARMA modeling, IEEE Transactions on Automatic Control, 53, 1108, 10.1109/TAC.2008.923684
2010
Gilardoni, 2010, On Pinsker's and Vajda's type inequalities for Csiszár's f-divergences, IEEE Transactions on Information Theory, 56, 5377, 10.1109/TIT.2010.2068710
Gray, 1976, Distance measures for speech processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, 24, 380, 10.1109/TASSP.1976.1162849
R.M. Gray, Entropy and Information Theory, Springer-Verlag, New York, NY, USA, 1990, online corrected version, 2009, 〈http://ee.stanford.edu/gray/it.html〉.
Gray, 2010
Gray, 1980, Distortion measures for speech processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, 28, 367, 10.1109/TASSP.1980.1163421
Grünwald, 2004, Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory, Annals of Statistics, 32, 1367, 10.1214/009053604000000553
Guntuboyina, 2011, Lower bounds for the minimax risk using f-divergences and applications, IEEE Transactions on Information Theory, 57, 2386, 10.1109/TIT.2011.2110791
Györfi, 1978, f-Dissimilarity, Annals of the Institute of Statistical Mathematics, 30, 105, 10.1007/BF02480206
P. Harremoës, I. Vajda, On Bahadur efficiency of power divergence statistics. ArXiv:1002.1493, February 2010.
Harremoës, 2011, On pairs of f-divergences and their joint range, IEEE Transactions on Information Theory, 57, 3230, 10.1109/TIT.2011.2137353
P. Harremoës, C. Vignat, Rényi entropies of projections, in: A. Barg, R.W. Yeung (Eds.), Proceedings of the IEEE International Symposium on Information Theory (ISIT'06), Seattle, WA, USA, July 9–14, 2006, pp. 1827–1830.
Havrda, 1967, Quantification method of classification processes, Kybernetika, 3, 30
He, 2003, A generalized divergence measure for robust image registration, IEEE Transactions on Signal Processing, 51, 1211
A.O. Hero, B. Ma, O. Michel, J. Gorman, Alpha-divergence for Classification, Indexing and Retrieval, Research Report CSPL-328, University of Michigan, Communications and Signal Processing Laboratory, May 2001.
Hinton, 2002, Training products of experts by minimizing contrastive divergence, Neural Computation, 14, 1771, 10.1162/089976602760128018
Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527, 10.1162/neco.2006.18.7.1527
Hoeffding, 1965, Asymptotically optimal tests for multinomial distributions, Annals of Mathematical Statistics, 36, 369, 10.1214/aoms/1177700150
Hyvärinen, 2005, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research, 6, 695
Hyvärinen, 2007, Some extensions of score matching, Computational Statistics and Data Analysis, 51, 2499, 10.1016/j.csda.2006.09.003
James, 1961, Estimation with quadratic loss, vol. 1, 361
Jiang, 2012, Geometric methods for spectral analysis, IEEE Transactions on Signal Processing, 60, 1064, 10.1109/TSP.2011.2178601
Jiang, 2012, Distances and Riemannian metrics for multivariate spectral densities, IEEE Transactions on Automatic Control, 57, 1723, 10.1109/TAC.2012.2183171
Johnson, 2004, Fisher information inequalities and the central limit theorem, Probability Theory and Related Fields, 129, 391, 10.1007/s00440-004-0344-0
Johnson, 1979, Axiomatic characterization of the directed divergences and their linear combinations, IEEE Transactions on Information Theory, 25, 709, 10.1109/TIT.1979.1056113
Jones, 1990, General entropy criteria for inverse problems, with applications to data compression, pattern classification, and cluster analysis, IEEE Transactions on Information Theory, 36, 23, 10.1109/18.50370
Jones, 2001, A comparison of related density-based minimum divergence estimators, Biometrika, 88, 865, 10.1093/biomet/88.3.865
Kagan, 2008, Some inequalities related to the Stam inequality, Applications of Mathematics, 53, 195, 10.1007/s10492-008-0004-2
T. Kanamori, A. Ohara, A Bregman extension of quasi-Newton updates II: convergence and robustness properties. ArXiv:1010.2846, October 2010.
T. Kanamori, A. Ohara, A Bregman extension of quasi-Newton updates I: an information geometrical framework, Optimization Methods and Software 27, doi:10.1080/10556788.2011.613073, in press.
Kanamori, 2012, f-divergence estimation and two-sample homogeneity test under semiparametric density-ratio models, IEEE Transactions on Information Theory, 58, 708, 10.1109/TIT.2011.2163380
Karagrigoriou, 2010, Measures of divergence in model selection, 51
Karagrigoriou, 2008, On measures of information and divergence and model selection criteria, 503
Karlsson, 2010, The inverse problem of analytic interpolation with degree constraint and weight selection for control synthesis, IEEE Transactions on Automatic Control, 55, 405, 10.1109/TAC.2009.2037280
Kass, 1997
Kazakos, 1980, On resolution and exponential discrimination between Gaussian stationary vector processes and dynamic models, IEEE Transactions on Automatic Control, 25, 294, 10.1109/TAC.1980.1102275
Kazakos, 1982, Spectral distance measures between continuous-time vector Gaussian processes, IEEE Transactions on Information Theory, 28, 679, 10.1109/TIT.1982.1056521
Kazakos, 1980, Spectral distance measures between Gaussian processes, IEEE Transactions on Automatic Control, 25, 950, 10.1109/TAC.1980.1102475
Kazakos, 1990
Kim, 2008, Estimation of a tail index based on minimum density power divergence, Journal of Multivariate Analysis, 99, 2453, 10.1016/j.jmva.2008.02.031
J. Kivinen, M.K. Warmuth, Boosting as entropy projection, in: Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT'99), Santa Cruz, CA, USA, ACM, July 7–9, 1999, pp. 134–144.
Kivinen, 2006, The p-norm generalization of the LMS algorithm for adaptive filtering, IEEE Transactions on Signal Processing, 54, 1782, 10.1109/TSP.2006.872551
Knockaert, 1993, A class of statistical and spectral distance measures based on Bose–Einstein statistics, IEEE Transactions on Signal Processing, 41, 3171, 10.1109/78.257248
L. Knockaert, Statistical thermodynamics and natural f-divergences, unpublished paper, 〈users.ugent.be/lknockae/〉, 1994.
Knockaert, 2003, On scale and concentration invariance in entropies, Information Sciences, 152, 139, 10.1016/S0020-0255(03)00058-6
Kompass, 2007, A generalized divergence measure for nonnegative matrix factorization, Neural Computation, 19, 780, 10.1162/neco.2007.19.3.780
Kulis, 2009, Low-rank kernel learning with Bregman matrix divergences, Journal of Machine Learning Research, 10, 341
S. Kullback, J.C. Keegel, J.H. Kullback, Topics in Statistical Information Theory, Lecture Notes in Statistics, vol. 42, Springer-Verlag, New York, NY, USA, 1987.
J.D. Lafferty, Statistical learning algorithms based on Bregman distances, in: Proceedings of the Canadian Workshop on Information Theory, Toronto, Canada, June 3–6, 1997, pp. 77–80.
J.D. Lafferty, Additive models, boosting, and inference for generalized divergences, in: Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT'99), Santa Cruz, CA, USA, ACM, July 7–9, 1999, pp. 125–133.
Lawson, 2007, A Birkhoff contraction formula with application to Riccati equations, SIAM Journal on Control and Optimization, 46, 930, 10.1137/050637637
Le Besnerais, 1999, A new look at entropy for solving linear inverse problems, IEEE Transactions on Information Theory, 45, 1565, 10.1109/18.771159
G. Lebanon, J. Lafferty, Boosting and maximum likelihood for exponential models, in: T.G. Dietterich, S. Becker, Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14, Vancouver, British Columbia, Canada, MIT Press, Cambridge, MA, December 3–8, 2001.
Lee, 2008, Invariant metrics, contractions and nonlinear matrix equations, Nonlinearity, 21, 857, 10.1088/0951-7715/21/4/011
A. Lefevre, F. Bach, C. Fevotte, Online algorithms for nonnegative matrix factorization with the Itakura–Saito divergence, in: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'11), New Paltz, NY, USA, October 16–19, 2011, pp. 313–316.
Leonenko, 2010, Statistical inference for the ϵ-entropy and the quadratic Rényi entropy, Journal of Multivariate Analysis, 101, 1981, 10.1016/j.jmva.2010.05.009
Levy, 2004, Robust least-squares estimation with a relative entropy constraint, IEEE Transactions on Information Theory, 50, 89, 10.1109/TIT.2003.821992
Li, 2009, Effective metric for detecting distributed denial-of-service attacks based on information divergence, IET Communications, 3, 1851, 10.1049/iet-com.2008.0586
F. Liese, I. Vajda, Convex Statistical Distances, Texte zur Mathematick, vol. 95, Teubner, Leipzig, 1987.
Liese, 2006, On divergences and informations in statistics and information theory, IEEE Transactions on Information Theory, 52, 4394, 10.1109/TIT.2006.881731
Lin, 1991, Divergence measures based on the Shannon entropy, IEEE Transactions on Information Theory, 37, 145, 10.1109/18.61115
Lindsay, 1994, Efficiency versus robustness, Annals of Statistics, 22, 1081, 10.1214/aos/1176325512
Lutwak, 2005, Cramér–Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information, IEEE Transactions on Information Theory, 51, 473, 10.1109/TIT.2004.840871
Ma, 2011, Fixed point and Bregman iterative methods for matrix rank minimization, Mathematical Programming, Series A, 128, 321, 10.1007/s10107-009-0306-5
MacKay, 2003
Maji, 2009, f-Information measures for efficient selection of discriminative genes from microarray data, IEEE Transactions on Biomedical Engineering, 56, 1063, 10.1109/TBME.2008.2004502
Maji, 2010, Feature selection using f-information measures in fuzzy approximation spaces, IEEE Transactions on Knowledge and Data Engineering, 22, 854, 10.1109/TKDE.2009.124
Mantalos, 2010, An improved divergence information criterion for the determination of the order of an AR process, Communications in Statistics—Simulation and Computation, 39, 865, 10.1080/03610911003650391
Markatou, 1998, Weighted likelihood equations with bootstrap root search, Journal of the American Statistical Association, 93, 740, 10.1080/01621459.1998.10473726
Martín, 2011, A new class of minimum power divergence estimators with applications to cancer surveillance, Journal of Multivariate Analysis, 102, 1175, 10.1016/j.jmva.2011.03.011
Mathai, 1975
Y. Matsuyama, Non-logarithmic information measures, α-weighted EM algorithms and speedup of learning, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'98), Cambridge, MA, USA, August 16–21, 1998, p. 385.
Y. Matsuyama, The α-EM algorithm and its applications, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), vol. 1, Istanbul, Turkey, June 5–9, 2000, pp. 592–595.
Matsuyama, 2003, The α-EM algorithm, IEEE Transactions on Information Theory, 49, 692, 10.1109/TIT.2002.808105
Y. Matsuyama, N. Katsumata, S. Imahara, Convex divergence as a surrogate function for independence: the f-divergence, in: T.-W. Lee, T.-P. Jung, S. Makeig, T.J. Sejnowski (Eds.), Proceedings of the 3rd International Conference on Independent Component Analysis and Blind Signal Separation, San Diego, CA, USA, December 2001, pp. 31–36.
Mattheou, 2009, A model selection criterion based on the BHHJ measure of divergence, Journal of Statistical Planning and Inference, 139, 228, 10.1016/j.jspi.2008.04.022
Matus, 2009, Divergence from factorizable distributions and matroid representations by partitions, IEEE Transactions on Information Theory, 55, 5375, 10.1109/TIT.2009.2032806
Matusita, 1973, Discrimination and the affinity of distributions, 213
Merhav, 2011, Data processing theorems and the second law of thermodynamics, IEEE Transactions on Information Theory, 57, 4926, 10.1109/TIT.2011.2159052
Minami, 2002, Robust blind source separation by beta divergence, Neural Computation, 14, 1859
T. Minka, Divergence Measures and Message Passing, Technical Report MSR-TR-2005-173, Microsoft Research Ltd, 2005.
A. Mnih, G. Hinton, Learning nonlinear constraints with contrastive backpropagation, in: D.V. Prokhorov (Ed.), Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'05), vol. 2, Montréal, Québec, Canada, July 31–August 4, 2005, pp. 1302–1307.
Moakher, 2006, Symmetric positive-definite matrices, vol. 17, 285
Mollah, 2006, Exploring latent structure of mixture ICA models by the minimum β-divergence method, Neural Computation, 18, 166, 10.1162/089976606774841549
Morimoto, 1963, Markov processes and the H-theorem, Journal of the Physical Society of Japan, 18, 328, 10.1143/JPSJ.18.328
Murata, 2004, Information geometry of U-Boost and Bregman divergence, Neural Computation, 16, 1437, 10.1162/089976604323057452
Nascimento, 2010, Hypothesis testing in speckled data with stochastic distances, IEEE Transactions on Geoscience and Remote Sensing, 48, 373, 10.1109/TGRS.2009.2025498
Nason, 2001, Robust projection indices, Journal of the Royal Statistical Society—Series B Methodological, 63, 551, 10.1111/1467-9868.00298
Natarajan, 1985, Large deviations, hypotheses testing, and source coding for finite Markov chains, IEEE Transactions on Information Theory, 31, 360, 10.1109/TIT.1985.1057036
Nath, 1975, On a coding theorem connected with Rényi's entropy, Information and Control, 29, 234, 10.1016/S0019-9958(75)90404-0
Nguyen, 2009, On surrogate loss functions and f-divergences, Annals of Statistics, 37, 876, 10.1214/08-AOS595
Nguyen, 2010, Estimating divergence functionals and the likelihood ratio by convex risk minimization, IEEE Transactions on Information Theory, 56, 5847, 10.1109/TIT.2010.2068870
Nielsen, 2011, The Burbea–Rao and Bhattacharyya centroids, IEEE Transactions on Information Theory, 57, 5455, 10.1109/TIT.2011.2159046
Nielsen, 2009, Sided and symmetrized Bregman centroids, IEEE Transactions on Information Theory, 55, 2882, 10.1109/TIT.2009.2018176
F. Nielsen, P. Piro, M. Barlaud, Bregman vantage point trees for efficient nearest neighbor queries, in: Q. Sun, Y. Rui (Eds.), Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'09), New York, NY, USA, June 28–July 3, 2009, pp. 878–881.
Nishimura, 2008, The information geometric structure of generalized empirical likelihood estimators, Communications in Statistics—Theory and Methods, 37, 1867, 10.1080/03610920801893657
Nock, 2009, Bregman divergences and surrogates for learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 2048, 10.1109/TPAMI.2008.225
Pardo, 2006
Pardo, 1995, Divergence measures based on entropy functions and statistical inference, Sankhyā, 57, 315
Pardo, 2003, On asymptotic properties of information-theoretic divergences, IEEE Transactions on Information Theory, 49, 1860, 10.1109/TIT.2003.813509
Patra, 2008, Minimum Hellinger distance estimation with inlier modification, Sankhyā, 70, 310
Pavon, 2006, On the Georgiou–Lindquist approach to constrained Kullback–Leibler approximation of spectral densities, IEEE Transactions on Automatic Control, 51, 639, 10.1109/TAC.2006.872755
M. Pavon, A. Ferrante, On the geometry of maximum entropy problems. ArXiv:1112.5529, December 2011.
Pelletier, 2005, Informative barycentres in statistics, Annals of the Institute of Statistical Mathematics, 57, 767, 10.1007/BF02915437
Pelletier, 2011, Inference in ϕ-families of distributions, Statistics—A Journal of Theoretical and Applied Statistics, 45, 223
Perez, 1984, Barycenter of a set of probability measures and its application in statistical decision, 154
Petz, 1996, Monotone metrics on matrix spaces, Linear Algebra and its Applications, 244, 81, 10.1016/0024-3795(94)00211-8
Petz, 2005, Means of positive numbers and matrices, SIAM Journal on Matrix Analysis and Applications, 27, 712, 10.1137/050621906
Pham, 2008, On the risk of using Rényi's entropy for blind source separation, IEEE Transactions on Signal Processing, 56, 4611, 10.1109/TSP.2008.928109
Pluim, 2004, f-Information measures in medical image registration, IEEE Transactions on Medical Imaging, 23, 1508, 10.1109/TMI.2004.836872
B. Poczos, L. Xiong, J. Schneider, Nonparametric divergence estimation with applications to machine learning on distributions. ArXiv:1202.3758, February 2012.
Principe, 2008
Qiao, 2010, A study on invariance of f-divergence and its application to speech recognition, IEEE Transactions on Signal Processing, 58, 3884, 10.1109/TSP.2010.2047340
Ramponi, 2009, A globally convergent matricial algorithm for multivariate spectral estimation, IEEE Transactions on Automatic Control, 54, 2376, 10.1109/TAC.2009.2028977
Rao, 1945, Information and accuracy attainable in the estimation of statistical parameters, Bulletin of the Calcutta Mathematical Society, 37, 81
Rao, 1982, Diversity and dissimilarity coefficients, Theoretical Population Biology, 21, 24, 10.1016/0040-5809(82)90004-1
Rao, 1982, Diversity, Sankhyā, 44, 1
Rao, 1986, Rao's axiomatization of diversity measures, vol. 7, 614
Rao, 1987, Differential metrics in probability spaces, vol. 10, 217
Rao, 1985, Cross entropy, dissimilarity measures, and characterizations of quadratic entropy, IEEE Transactions on Information Theory, 31, 589, 10.1109/TIT.1985.1057082
Rauh, 2011, Finding the maximizers of the information divergence from an exponential family, IEEE Transactions on Information Theory, 57, 3236, 10.1109/TIT.2011.2136230
Ravikumar, 2010, Message-passing for graph-structured linear programs, Journal of Machine Learning Research, 11, 1043
Read, 1988
Reid, 2010, Composite binary losses, Journal of Machine Learning Research, 11, 2387
Reid, 2011, Information, divergence and risk for binary experiments, Journal of Machine Learning Research, 12, 731
Rényi, 1961, On measures of information and entropy, vol. 1, 547
Rényi, 1967, On some basic problems of statistics from the point of view of information theory, vol. 1, 531
A. Roman, S. Jolad, M.C. Shastry, Bounded divergence measures based on Bhattacharyya coefficient. ArXiv:1201.0418, January 2012.
Sander, 2002, Measures of information, vol. 2, 1523
R. Santos-Rodriguez, D. Garcia-Garcia, J. Cid-Sueiro, Cost-sensitive classification based on Bregman divergences for medical diagnosis, in: M.A. Wani (Ed.), Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA'09), Miami Beach, FL, USA, December 13–15, 2009, pp. 551–556.
M.P. Schützenberger, Contribution aux applications statistiques de la théorie de l'information. Thèse d'État, Inst. Stat. Univ. Paris, 1953 (in French).
Schweppe, 1967, On the Bhattacharyya distance and the divergence between Gaussian processes, Information and Control, 11, 373, 10.1016/S0019-9958(67)90610-9
Schweppe, 1967, State space evaluation of the Bhattacharyya distance between two Gaussian processes, Information and Control, 11, 352, 10.1016/S0019-9958(67)90609-2
Shore, 1981, Properties of cross-entropy minimization, IEEE Transactions on Information Theory, 27, 472, 10.1109/TIT.1981.1056373
Shore, 1982, Minimum cross-entropy pattern classification and cluster analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 4, 11, 10.1109/TPAMI.1982.4767189
Si, 2010, Bregman divergence-based regularization for transfer subspace learning, IEEE Transactions on Knowledge and Data Engineering, 22, 929, 10.1109/TKDE.2009.126
Sibson, 1969, Information radius, Probability Theory and Related Fields, 14, 149
B.K. Sriperumbudur, A. Gretton, K. Fukumizu, G.R.G. Lanckriet, B. Schölkopf, On integral probability metrics, ϕ-divergences and binary classification. ArXiv:0901.2698, January 2009.
Srivastava, 2007, Bayesian quadratic discriminant analysis, Journal of Machine Learning Research, 8, 1277
Österreicher, 2003, A new class of metric divergences on probability spaces and its applicability in statistics, Annals of the Institute of Statistical Mathematics, 55, 639, 10.1007/BF02517812
Stoorvogel, 1998, Approximation problems with the divergence criterion for Gaussian variables and Gaussian processes, Systems and Control Letters, 35, 207, 10.1016/S0167-6911(98)00053-X
Stummer, 2010, On divergences of finite measures and their applicability in statistics and information theory, Statistics—A Journal of Theoretical and Applied Statistics, 44, 169
Stummer, 2012, On Bregman distances and divergences of probability measures, IEEE Transactions on Information Theory, 58, 1277, 10.1109/TIT.2011.2178139
M. Sugiyama, T. Suzuki, T. Kanamori, Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation, Annals of the Institute of Statistical Mathematics 64 (2) (2012) 1009–1044.
Sung, 2006, Neyman–Pearson detection of Gauss–Markov signals in noise, IEEE Transactions on Information Theory, 52, 1354, 10.1109/TIT.2006.871599
I. Sutskever, T. Tieleman, On the convergence properties of contrastive divergence, in: Y.W. Teh, M. Titterington (Eds.), Proceedings of the 13th International Workshop on Artificial Intelligence and Statistics (AISTATS'10), Chia Laguna, Sardinia, Italy, May 13–15, 2010, pp. 789–795.
Taneja, 1989, On generalized information measures and their applications, Advances in Electronics and Electron Physics, 76, 327, 10.1016/S0065-2539(08)60580-6
I.J. Taneja, Generalized Information Measures and Their Applications. 〈www.mtm.ufsc.br/taneja/book/book.html〉, 2001.
Taskar, 2006, Structured prediction, dual extragradient and Bregman projections, Journal of Machine Learning Research, 7, 1627
Teboulle, 2007, A unified continuous optimization framework for center-based clustering methods, Journal of Machine Learning Research, 8, 65
Teboulle, 2006, Clustering with entropy-like k-means algorithms, 127
Toma, 2011, Dual divergence estimators and tests, Journal of Multivariate Analysis, 102, 20, 10.1016/j.jmva.2010.07.010
Topsoe, 2000, Some inequalities for information divergence and related measures of discrimination, IEEE Transactions on Information Theory, 46, 1602, 10.1109/18.850703
Torgersen, 1991, vol. 36
Touboul, 2010, Projection pursuit through ϕ-divergence minimisation, Entropy, 12, 1581, 10.3390/e12061581
Tsuda, 2005, Matrix exponentiated gradient updates for on-line learning and Bregman projection, Journal of Machine Learning Research, 6, 995
M. Tsukada, H. Suyari, Tsallis differential entropy and divergences derived from the generalized Shannon–Khinchin axioms, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'09), Seoul, Korea, June 28–July 3, 2009, pp. 149–153.
J. Vachery, A. Dukkipati, On Shore and Johnson properties for a special case of Csiszár f-divergences. ArXiv:1201.4285, January 2012.
Vajda, 1973, χα-divergence and generalized Fisher's information, 873
Vajda, 1989, vol. 11
I. Vajda, Modifications of Divergence Criteria for Applications in Continuous Families, Research Report 2230, Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation, November 2008.
Vajda, 2009, On metric divergences of probability measures, Kybernetika, 45, 885
Vemuri, 2011, Total Bregman divergence and its applications to DTI analysis, IEEE Transactions on Medical Imaging, 30, 475, 10.1109/TMI.2010.2086464
C. Vignat, A.O. Hero, J.A. Costa, A geometric characterization of maximum Rényi entropy distributions, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT'06), Seattle, Washington, USA, July 2006, pp. 1822–1826.
F. Vrins, D.-T. Pham, M. Verleysen, Is the general form of Renyi's entropy a contrast for source separation?, in: M.E. Davies, C.J. James, S.A. Abdallah, M.D. Plumbley (Eds.), Proceedings of the 7th International Conference on Independent Component Analysis and Blind Source Separation (ICA'07), London, UK, September 9–12, 2007, Lecture Notes in Computer Science, vol. 4666, Springer-Verlag, Berlin, Heidelberg, FRG, 2007, pp. 129–136.
Wang, 2009, Divergence estimation for multidimensional densities via k-nearest-neighbor distances, IEEE Transactions on Information Theory, 55, 2392, 10.1109/TIT.2009.2016060
S. Wang, D. Schuurmans, Learning continuous latent variable models with Bregman divergences, in: R. Gavaldà, K.P. Jantke, E. Takimoto (Eds.), Proceedings of the 14th International Conference on Algorithmic Learning Theory (ALT'03), Sapporo, Japan, Lecture Notes in Artificial Intelligence, vol. 2842, Springer-Verlag, Berlin Heidelberg, October 17–19, 2003, pp. 190–204.
L. Wu, R. Jin, S.C.-H. Hoi, J. Zhu, N. Yu, Learning Bregman distance functions and its application for semi-supervised clustering, in: Y. Bengio, D. Schuurmans, J. Lafferty, C.K.I. Williams, A. Culotta (Eds.), Advances in Neural Information Processing Systems 22, Vancouver, British Columbia, Canada, NIPS Foundation, December 7–10, 2009, pp. 2089–2097.
Wu, 2009, Model selection in loglinear models using ϕ-divergence measures and MϕEs, Sankhyā, 71, 260
Yeung, 2002
Yeung, 2008
Yin, 2008, Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences, 1, 143, 10.1137/070703983
Yu, 2010, The Kullback–Leibler rate pseudo-metric for comparing dynamical systems, IEEE Transactions on Automatic Control, 55, 1585
R.G. Zaripov, New Measures and Methods in Information Theory. A. N. Tupolev State Technical University Press, Kazan, Tatarstan, 〈www.imm.knc.ru/zaripov-measures.html〉, 2005 (in Russian).
Zhang, 2004, Divergence function, duality, and convex analysis, Neural Computation, 16, 159, 10.1162/08997660460734047
Ziv, 1973, On functionals satisfying a data-processing theorem, IEEE Transactions on Information Theory, 19, 275, 10.1109/TIT.1973.1055015