Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models

Simon N. Wood¹
¹University of Bath, Bath, UK

Abstract

Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge and fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable smoothing parameter selection methods based on prediction error criteria are available, which directly optimize GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML criterion for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton–Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the required derivatives obtained by a mixture of implicit differentiation and direct methods. The method will cope with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness on the earlier method of Wood for smoothness selection based on prediction error criteria. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike’s information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike’s information criterion and GCV are prone. This is achieved at the same computational cost as GCV or Akaike’s information criterion. The new approach also eliminates the convergence failures of previous REML- or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar on function regression and generalized additive model selection.

References

Anderson, 1999, LAPACK Users’ Guide, 10.1137/1.9780898719604

Anderssen, 1974, A time series approach to numerical differentiation, Technometrics, 16, 69, 10.1080/00401706.1974.10489151

Breslow, 1993, Approximate inference in generalized linear mixed models, J. Am. Statist. Ass., 88, 9

Brezger, 2007, BayesX 1.5.0

Brezger, 2006, Generalized structured additive regression based on Bayesian P-splines, Computnl Statist. Data Anal., 50, 967, 10.1016/j.csda.2004.10.011

Cline, 1979, An estimate for the condition number of a matrix, SIAM J. Numer. Anal., 13, 293, 10.1137/0713027

Craven, 1979, Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross validation, Numer. Math., 31, 377, 10.1007/BF01404567

Davison, 2003, Statistical Models, 10.1017/CBO9780511815850

Demidenko, 2004, Mixed Models: Theory and Applications, 10.1002/0471728438

Dunn, 2005, Series evaluation of Tweedie exponential dispersion model densities, Statist. Comput., 15, 267, 10.1007/s11222-005-4070-y

Efron, 1978, Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information, Biometrika, 65, 457, 10.1093/biomet/65.3.457

Eilers, 1996, Flexible smoothing with B-splines and penalties, Statist. Sci., 11, 89, 10.1214/ss/1038425655

Eilers, 2002, Generalized linear additive smooth structures, J. Computnl Graph. Statist., 11, 758, 10.1198/106186002844

Escabias, 2004, Principal component estimation of functional logistic regression: discussion of two different approaches, Nonparam. Statist., 16, 365, 10.1080/10485250310001624738

Fahrmeir, 2004, Penalized structured additive regression for space time data: a Bayesian perspective, Statist. Sin., 14, 731

Fahrmeir, 2001, Bayesian inference for generalized additive mixed models based on Markov random field priors, Appl. Statist., 50, 201

Golub, 1996, Matrix Computations

Green, 1994, Nonparametric Regression and Generalized Linear Models, 10.1007/978-1-4899-4473-3

Gu, 1992, Cross validating non-Gaussian data, J. Computnl Graph. Statist., 1, 169

Gu, 2002, Smoothing Spline ANOVA Models, 10.1007/978-1-4757-3683-0

Gu, 2002, Penalized likelihood regression: general formulation and efficient approximation, Can. J. Statist., 30, 619, 10.2307/3316100

Hall, 2005, Theory for penalised spline regression, Biometrika, 92, 105, 10.1093/biomet/92.1.105

Härdle, 1988, How far are automatically chosen regression smoothing parameters from their optimum?, J. Am. Statist. Ass., 83, 86

Harville, 1977, Maximum likelihood approaches to variance component estimation and to related problems, J. Am. Statist. Ass., 72, 320, 10.1080/01621459.1977.10480998

Harville, 1997, Matrix Algebra from a Statistician’s Perspective, 10.1007/b98818

Hastie, 1986, Generalized additive models (with discussion), Statist. Sci., 1, 297

Hastie, 1990, Generalized Additive Models

Hastie, 1993, Varying-coefficient models (with discussion), J. R. Statist. Soc. B, 55, 757

Hurvich, 1998, Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion, J. R. Statist. Soc. B, 60, 271, 10.1111/1467-9868.00125

Kalivas, 1997, Two data sets of near infrared spectra, Chemometr. Intell. Lab. Syst., 37, 255, 10.1016/S0169-7439(97)00038-5

Kauermann, 2005, A note on smoothing parameter selection for penalized spline smoothing, J. Statist. Planng Inf., 127, 53, 10.1016/j.jspi.2003.09.023

Kauermann, 2009, Some asymptotic results on generalized penalized spline smoothing, J. R. Statist. Soc. B, 71, 487, 10.1111/j.1467-9868.2008.00691.x

Kimeldorf, 1970, A correspondence between Bayesian estimation of stochastic processes and smoothing by splines, Ann. Math. Statist., 41, 495, 10.1214/aoms/1177697089

Kohn, 1991, The performance of cross-validation and maximum likelihood estimators of spline smoothing parameters, J. Am. Statist. Ass., 86, 1042, 10.1080/01621459.1991.10475150

Krivobokova, 2008, Fast adaptive penalized splines, J. Computnl Graph. Statist., 17, 1, 10.1198/106186008X287328

Laird, 1982, Random-effects models for longitudinal data, Biometrics, 38, 963, 10.2307/2529876

Lang, 2004, Bayesian P-splines, J. Computnl Graph. Statist., 13, 183, 10.1198/1061860043010

Marx, 1998, Direct generalized additive modeling with penalized likelihood, Computnl Statist. Data Anal., 28, 193, 10.1016/S0167-9473(98)00033-4

Marx, 1999, Generalized linear regression on sampled signals and curves: a P-spline approach, Technometrics, 41, 1, 10.1080/00401706.1999.10485591

Monahan, 2001, Numerical Methods of Statistics, 10.1017/CBO9780511812231

Nelder, 1972, Generalized linear models, J. R. Statist. Soc. A, 135, 370, 10.2307/2344614

Nocedal, 2006, Numerical Optimization

Parker, 1985, Discussion on ‘Some aspects of the spline smoothing approach to non-parametric regression curve fitting’ (by B. W. Silverman), J. R. Statist. Soc. B, 47, 40

Patterson, 1971, Recovery of interblock information when block sizes are unequal, Biometrika, 58, 545, 10.1093/biomet/58.3.545

Ramsay, 2005, Functional Data Analysis, 10.1007/b98888

R Development Core Team, 2008, R 2.8.1: a Language and Environment for Statistical Computing

Reiss, 2007, Functional principal component regression and functional partial least squares, J. Am. Statist. Ass., 102, 984, 10.1198/016214507000000527

Reiss, 2009, Smoothing parameter selection for a class of semiparametric linear models, J. R. Statist. Soc. B, 71, 505, 10.1111/j.1467-9868.2008.00695.x

Rue, 2009, Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion), J. R. Statist. Soc. B, 71, 319, 10.1111/j.1467-9868.2008.00700.x

Ruppert, 2003, Semiparametric Regression, 10.1017/CBO9780511755453

Silverman, 1985, Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion), J. R. Statist. Soc. B, 47, 1

Tutz, 2006, Generalized additive modeling with implicit variable selection by likelihood-based boosting, Biometrics, 62, 961, 10.1111/j.1541-0420.2006.00578.x

Tweedie, 1984, Statistics: Applications and New Directions: Proc. Indian Statistical Institute Golden Jubilee Int. Conf, 579

Venables, 2002, Modern Applied Statistics with S, 10.1007/978-0-387-21706-2

Wahba, 1980, Approximation Theory III

Wahba, 1983, Bayesian “confidence intervals” for the cross-validated smoothing spline, J. R. Statist. Soc. B, 45, 133

Wahba, 1985, A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem, Ann. Statist., 13, 1378, 10.1214/aos/1176349743

Wahba, 1990, Spline Models for Observational Data, 10.1137/1.9781611970128

Wahba, 1975, A completely automatic French curve: fitting spline functions by cross-validation, Communs Statist. Theor. Meth., 4, 125

Watkins, 1991, Fundamentals of Matrix Computations

Wehrens, 2007, R Package Version 2.1-0

Wood, 2003, Thin plate regression splines, J. R. Statist. Soc. B, 65, 95, 10.1111/1467-9868.00374

Wood, 2004, Stable and efficient multiple smoothing parameter estimation for generalized additive models, J. Am. Statist. Ass., 99, 673, 10.1198/016214504000000980

Wood, 2006, Generalized Additive Models: an Introduction with R, 10.1201/9781420010404

Wood, 2008, Fast stable direct fitting and smoothness selection for generalized additive models, J. R. Statist. Soc. B, 70, 495, 10.1111/j.1467-9868.2007.00646.x