A generalized Hosmer–Lemeshow goodness-of-fit test for a family of generalized linear models

TEST - Pages 1-20 - 2023
Nikola Surjanovic (1), Richard A. Lockhart (2), Thomas M. Loughin (2)
(1) Department of Statistics, University of British Columbia, Vancouver, Canada
(2) Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada

Abstract

Generalized linear models (GLMs) are very widely used, but formal goodness-of-fit (GOF) tests for the overall fit of the model seem to be in wide use only for certain classes of GLMs. We develop and apply a new goodness-of-fit test, similar to the well-known and commonly used Hosmer–Lemeshow (HL) test, that can be used with a wide variety of GLMs. The test statistic is a variant of the HL statistic, but we rigorously derive an asymptotically correct sampling distribution using methods of Stute and Zhu (Scand J Stat 29(3):535–545, 2002) and demonstrate its consistency. We compare the performance of our new test with other GOF tests for GLMs, including a naive direct application of the HL test to the Poisson problem. Our test provides competitive or comparable power in various simulation settings and we identify a situation where a naive version of the test fails to hold its size. Our generalized HL test is straightforward to implement and interpret and an R package is publicly available.
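For orientation, the classic Hosmer–Lemeshow statistic for binary outcomes, which the test described above generalizes, can be sketched as follows. This is not the authors' generalized statistic or their R package: the function name `hosmer_lemeshow_statistic`, the near-equal-size grouping rule, and the conventional reference degrees of freedom (number of groups minus 2) are assumptions chosen for illustration.

```python
def hosmer_lemeshow_statistic(y, p_hat, groups=10):
    """Classic HL statistic for binary outcomes (illustrative sketch only).

    y      : list of 0/1 responses
    p_hat  : fitted probabilities from a binary regression model
    groups : number of groups formed from the sorted fitted probabilities
    Returns (statistic, conventional reference degrees of freedom).
    """
    if len(y) != len(p_hat):
        raise ValueError("y and p_hat must have the same length")
    n = len(y)
    if n < groups:
        raise ValueError("need at least as many observations as groups")

    # Order the observations by fitted probability (ties keep input order).
    order = sorted(range(n), key=lambda i: p_hat[i])

    stat = 0.0
    for g in range(groups):
        # Slice the sorted indices into groups of near-equal size.
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)      # observed successes in group
        expected = sum(p_hat[i] for i in idx)  # expected successes in group
        p_bar = expected / n_g                 # mean fitted probability
        variance = n_g * p_bar * (1.0 - p_bar)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
        # Groups with zero variance are skipped in this sketch (assumption).

    return stat, groups - 2


# Hypothetical toy data: constant fitted probability 0.5, alternating outcomes.
toy_y = [0, 1] * 10
toy_p = [0.5] * 20
toy_stat, toy_df = hosmer_lemeshow_statistic(toy_y, toy_p, groups=10)
```

The statistic is usually compared with a chi-squared distribution on the returned degrees of freedom. Per the abstract, carrying this recipe over naively to other GLMs such as Poisson regression can fail to hold its size, which is the gap the generalized test is meant to close.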

References

Agresti A (1996) An introduction to categorical data analysis. Wiley, New York
Bilder CR, Loughin TM (2014) Analysis of categorical data with R. Chapman and Hall/CRC, Boston
Blizzard L, Hosmer DW (2006) Parameter estimation and goodness-of-fit in log binomial regression. Biom J 48(1):5–22
Canary JD (2013) Grouped goodness-of-fit tests for binary regression models. PhD thesis, University of Tasmania
Canary JD, Blizzard L, Barry RP, Hosmer DW, Quinn SJ (2016) Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions. Biom J 58(3):674–690
Cheng KF, Wu JW (1994) Testing goodness of fit for a parametric family of link functions. J Am Stat Assoc 89(426):657–664
Christensen R, Lin Y (2015) Lack-of-fit tests based on partial sums of residuals. Commun Stat Theory Methods 44(13):2862–2880
Fagerland MW, Hosmer DW (2013) A goodness-of-fit test for the proportional odds regression model. Stat Med 32(13):2235–2249
Fagerland MW, Hosmer DW (2016) Tests for goodness of fit in ordinal logistic regression models. J Stat Comput Simul 86(17):3398–3418
Fagerland MW, Hosmer DW, Bofin AM (2008) Multinomial goodness-of-fit tests for logistic regression models. Stat Med 27(21):4238–4253
González-Manteiga W, Crujeiras RM (2013) An updated review of goodness-of-fit tests for regression models. TEST 22(3):361–411
Halteman WA (1980) A goodness of fit test for binary logistic regression. Unpublished doctoral dissertation, Department of Biostatistics, University of Washington, Seattle, WA
Hosmer DW, Hjort NL (2002) Goodness-of-fit processes for logistic regression: simulation results. Stat Med 21(18):2723–2738
Hosmer DW, Lemeshow S (1980) Goodness of fit tests for the multiple logistic regression model. Commun Stat Theory Methods 9(10):1043–1069
Lin DY, Wei LJ, Ying Z (2002) Model-checking techniques based on cumulative residuals. Biometrics 58(1):1–12
Liu A, Meiring W, Wang Y (2004) Testing generalized linear models using smoothing spline methods. Stat Sin 15:235–256
Moore DS, Spruill MC (1975) Unified large-sample theory of general chi-squared statistics for tests of fit. Ann Stat 3:599–616
Pulkstenis E, Robinson TJ (2002) Two goodness-of-fit tests for logistic regression models with continuous covariates. Stat Med 21(1):79–93
Quinn SJ, Hosmer DW, Blizzard CL (2015) Goodness-of-fit statistics for log-link regression models. J Stat Comput Simul 85(12):2533–2545
Rodríguez-Campos MC, González-Manteiga W, Cao R (1998) Testing the hypothesis of a generalized linear regression model using nonparametric regression estimation. J Stat Plan Inference 67(1):99–122
Stute W, Zhu L-X (2002) Model checks for generalized linear models. Scand J Stat 29(3):535–545
Su JQ, Wei LJ (1991) A lack-of-fit test for the mean function in a generalized linear model. J Am Stat Assoc 86(414):420–426
Surjanovic N, Loughin TM (2021) Improving the Hosmer–Lemeshow goodness-of-fit test in large models with replicated trials. arXiv preprint arXiv:2102.12698
Tsiatis AA (1980) A note on a goodness-of-fit test for the logistic regression model. Biometrika 67(1):250–251
White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–25
Xiang D, Wahba G (1995) Testing the generalized linear model null hypothesis versus ‘smooth’ alternatives. Technical Report 953, Department of Statistics, University of Wisconsin