Bayesian Measures of Model Complexity and Fit

David J. Spiegelhalter¹, Nicola Best², Bradley P. Carlin³, Angelika van der Linde⁴
¹ Medical Research Council Biostatistics Unit, Cambridge, UK
² Imperial College School of Medicine, London, UK
³ University of Minnesota, Minneapolis, USA
⁴ University of Bremen, Germany

Summary

We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information-theoretic argument, we derive a measure p_D of the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general, p_D approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the 'hat' matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding p_D to the posterior mean deviance gives a deviance information criterion (DIC) for comparing models, which is related to other information criteria and has an approximate decision-theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout, it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
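As a minimal sketch of the computation the summary describes, the Python/NumPy code below computes the posterior mean deviance, the deviance at the posterior mean, p_D (their difference) and DIC (posterior mean deviance plus p_D) from a matrix of posterior draws. It assumes, purely for illustration, a normal linear model y ~ N(X beta, sigma^2 I) with sigma known and a conjugate prior beta ~ N(0, tau^2 I); the helper names `deviance` and `dic_from_samples` and the simulated data are hypothetical, and exact conjugate draws stand in for MCMC output. It also checks the normal-model statement that p_D is close to the trace of the hat matrix, which here equals the trace of Fisher's information X'X / sigma^2 times the posterior covariance.

```python
import numpy as np


def deviance(y, X, beta, sigma):
    """Deviance D(beta) = -2 log p(y | beta) for y ~ N(X beta, sigma^2 I) with sigma known."""
    resid = y - X @ beta
    n = y.shape[0]
    return resid @ resid / sigma**2 + n * np.log(2 * np.pi * sigma**2)


def dic_from_samples(y, X, beta_samples, sigma):
    """Return (D_bar, D(theta_bar), p_D, DIC) from posterior draws of beta, one draw per row."""
    d_samples = np.array([deviance(y, X, b, sigma) for b in beta_samples])
    d_bar = d_samples.mean()                 # posterior mean deviance (Bayesian measure of fit)
    beta_bar = beta_samples.mean(axis=0)     # posterior mean of the parameters
    d_hat = deviance(y, X, beta_bar, sigma)  # deviance evaluated at the posterior mean
    p_d = d_bar - d_hat                      # effective number of parameters
    return d_bar, d_hat, p_d, d_bar + p_d    # DIC = D_bar + p_D


# Hypothetical simulated data; sigma and the prior scale tau are assumed known.
rng = np.random.default_rng(0)
n, p, sigma, tau = 50, 5, 1.0, 0.5
X = rng.normal(size=(n, p))
y = X @ rng.normal(scale=tau, size=p) + rng.normal(scale=sigma, size=n)

# Conjugate posterior beta | y ~ N(m, V) under the assumed prior beta ~ N(0, tau^2 I).
V = np.linalg.inv(X.T @ X / sigma**2 + np.eye(p) / tau**2)
V = (V + V.T) / 2                            # symmetrise against rounding error
m = V @ X.T @ y / sigma**2

# Exact posterior draws stand in for MCMC output in this sketch.
beta_samples = rng.multivariate_normal(m, V, size=20000)
d_bar, d_hat, p_d, dic = dic_from_samples(y, X, beta_samples, sigma)

# Hat matrix mapping y to the fitted values X m; its trace equals tr(I V),
# where I = X'X / sigma^2 is the Fisher information for beta.
H = X @ V @ X.T / sigma**2
print(f"p_D = {p_d:.3f}, tr(H) = {np.trace(H):.3f}, DIC = {dic:.1f}")
```

In this setting p_D should agree with tr(H) up to Monte Carlo error, and both fall below the nominal parameter count p = 5 because the prior shrinks the estimates. With a very diffuse prior (large tau), V approaches sigma^2 (X'X)^(-1), H approaches the least-squares hat matrix, and tr(H) approaches p.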
