Penalized regression with individual deviance effects

Computational Statistics - Volume 25 - Pages 341-361 - 2009
Aris Perperoglou (1, 2), Paul H. C. Eilers (2)
(1) Department of Statistics and Actuarial Financial Mathematics, University of the Aegean, Samos, Greece
(2) Department of Biostatistics, Erasmus Medical Center, Rotterdam, The Netherlands

Abstract

The present work addresses the problem of model estimation and computations for discrete data when some covariates are modeled smoothly using splines. We propose to introduce and explicitly estimate individual deviance effects (one for each observation), constrained by a ridge penalty. This turns out to be an effective way to absorb model excess variation and detect systematic patterns. Large but very sparse systems of penalized likelihood equations have to be solved. We present fast and compact algorithms for fitting, estimation and computation of the effective dimension. Applications to counts, binomial, and survival data illustrate practical use of this model.
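
The idea of one ridge-penalized effect per observation can be illustrated with a minimal sketch. It assumes a Poisson response with log link and replaces the spline terms with a plain linear predictor; the function name `fit_poisson_ide`, the penalty weight `kappa`, and the simulated data are hypothetical and not taken from the paper. It also uses dense linear algebra for brevity, whereas the abstract refers to algorithms that exploit the sparsity of the system.

```python
import numpy as np

def fit_poisson_ide(X, y, kappa, n_iter=100, tol=1e-8):
    """Penalized IRLS for log(mu_i) = x_i' beta + gamma_i with a ridge
    penalty (kappa / 2) * sum(gamma_i ** 2) on the individual effects."""
    n, p = X.shape
    C = np.hstack([X, np.eye(n)])                       # design [X | I]
    P = np.diag(np.r_[np.zeros(p), np.full(n, kappa)])  # penalize gamma only
    eta = np.log(y + 0.5)                               # common IRLS start
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                         # working response
        CtW = C.T * mu                                  # C' W with W = diag(mu)
        theta = np.linalg.solve(CtW @ C + P, CtW @ z)
        eta_new = C @ theta
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        if converged:
            break
    mu = np.exp(eta)
    G = (C.T * mu) @ C
    # Effective dimension: trace of (C'WC + P)^{-1} C'WC.
    ed = np.trace(np.linalg.solve(G + P, G))
    return {"beta": theta[:p], "gamma": theta[p:], "mu": mu, "ed": ed}

# Simulated overdispersed counts (illustrative data only).
rng = np.random.default_rng(0)
n = 40
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x + rng.normal(0, 0.5, n)))

loose = fit_poisson_ide(X, y, kappa=1.0)   # effects absorb excess variation
tight = fit_poisson_ide(X, y, kappa=1e6)   # effects shrunk close to zero
print(loose["ed"], tight["ed"])
```

In this sketch a large `kappa` pushes every individual effect toward zero, so the fit approaches an ordinary Poisson regression and the effective dimension approaches the number of unpenalized coefficients; a smaller `kappa` lets the effects take up excess variation at the cost of a larger effective dimension.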
