Prediction in Multilevel Generalized Linear Models

Anders Skrondal1, Sophia Rabe‐Hesketh2,3
1Norwegian Institute of Public Health, Oslo, Norway
2Institute of Education, London, UK
3University of California, Berkeley, USA, and Institute of Education, London, UK

Summary

We discuss prediction of random effects and of expected responses in multilevel generalized linear models. Prediction of random effects is useful, for instance, in small area estimation and disease mapping, effectiveness studies and model diagnostics. Prediction of expected responses is useful for planning, model interpretation and diagnostics. For prediction of random effects, we concentrate on empirical Bayes prediction and discuss three different kinds of standard errors: the posterior standard deviation and the marginal prediction error standard deviation (comparative standard errors), and the marginal sampling standard deviation (diagnostic standard error). Analytical expressions are available only for linear models and are provided in an appendix. For other multilevel generalized linear models, we present approximations and suggest using parametric bootstrapping to obtain standard errors. We also discuss prediction of expectations of responses or probabilities for a new unit in a hypothetical cluster, in a new (randomly sampled) cluster or in an existing cluster. The methods are implemented in gllamm and illustrated by applying them to survey data on reading proficiency of children nested in schools. Simulations are used to assess the performance of various predictions and associated standard errors for logistic random-intercept models under a range of conditions.
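To make the three kinds of standard error concrete, the following is a minimal sketch for the simplest linear case only: a random-intercept model y_ij = beta + zeta_j + eps_ij with zeta_j ~ N(0, psi) and eps_ij ~ N(0, theta). It uses the standard textbook expressions with the parameters treated as known, so it ignores the extra uncertainty from estimating beta, psi and theta; it is not taken from the paper's appendix and is not the gllamm implementation. The function name, argument names and example values are hypothetical.

```python
import math


def eb_random_intercept(ybar_j, n_j, beta, psi, theta):
    """Empirical Bayes prediction of a random intercept in a linear model.

    Assumes y_ij = beta + zeta_j + eps_ij with zeta_j ~ N(0, psi) and
    eps_ij ~ N(0, theta), and treats beta, psi and theta as known
    (an assumption of this sketch, not a statement about the paper).
    """
    # Shrinkage factor: reliability of the cluster mean as a measure of zeta_j.
    shrinkage = psi / (psi + theta / n_j)

    # Prediction: the raw cluster-mean residual shrunk towards zero.
    prediction = shrinkage * (ybar_j - beta)

    # Posterior standard deviation of zeta_j given the cluster's responses.
    posterior_sd = math.sqrt(psi * (1.0 - shrinkage))

    # Marginal prediction error SD, sd(prediction - zeta_j). With known
    # parameters in the linear model it equals the posterior SD.
    comparative_se = posterior_sd

    # Marginal sampling SD of the prediction itself, sd(prediction).
    diagnostic_se = math.sqrt(shrinkage * psi)

    return {
        "shrinkage": shrinkage,
        "prediction": prediction,
        "posterior_sd": posterior_sd,
        "comparative_se": comparative_se,
        "diagnostic_se": diagnostic_se,
    }


# Hypothetical cluster: 4 units with mean response 12, overall mean 10.
result = eb_random_intercept(ybar_j=12.0, n_j=4, beta=10.0, psi=1.0, theta=4.0)
print(result)
```

Under these assumptions the squared comparative and diagnostic standard errors sum to psi, which shows why the two answer different questions: the first measures how far the prediction is from the true effect, the second how variable the prediction is across clusters.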
