Inference on finite population categorical response: nonparametric regression-based predictive approach
Abstract
Suppose that a finite population consists of N distinct units. Associated with the ith unit are a polychotomous response vector, d_i, and a vector of auxiliary variables, x_i. The values x_i are known for the entire population, but the d_i are known only for the units selected in the sample. The problem is to estimate the finite population proportion vector P. One of the fundamental questions in finite population sampling is how to make effective use of the complete auxiliary information at the estimation stage. In this article a predictive estimator is proposed that incorporates the auxiliary information at the estimation stage by invoking a superpopulation model. The use of such estimators is often criticized, however, because the working superpopulation model may not be correct. To protect the predictive estimator from possible model failure, a nonparametric regression model is considered for the superpopulation. The asymptotic properties of the proposed estimator are derived, and a bootstrap-based hybrid re-sampling method for estimating its variance is developed. Results of a simulation study are reported on the performance of the predictive estimator and its re-sampling-based variance estimator from the model-based viewpoint. Finally, survey data on the opinions of 686 individuals about the cause of addiction are used in an empirical study to investigate the performance of the nonparametric predictive estimator from the design-based viewpoint.
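The predictive idea described in the abstract can be illustrated with a minimal sketch: the observed responses of sampled units are kept as they are, and the responses of non-sampled units are replaced by category probabilities predicted from their auxiliary values. The sketch below is not the estimator of the article. It assumes a single scalar auxiliary variable and swaps in a simple Nadaraya-Watson kernel smoother as the nonparametric regression; the function names, the bandwidth h, and the synthetic three-category population are all hypothetical choices made for illustration only.

```python
import math
import random


def nw_probs(x0, xs, ds, h):
    """Nadaraya-Watson estimate of the category probabilities at x0.

    xs: auxiliary values of the sampled units
    ds: one-hot response vectors of the sampled units
    h:  Gaussian kernel bandwidth (assumed tuning parameter)
    """
    k = len(ds[0])
    w = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(w)
    if total == 0.0:
        # All kernel weights underflowed: fall back to sample proportions.
        n = len(ds)
        return [sum(d[c] for d in ds) / n for c in range(k)]
    return [sum(wi * d[c] for wi, d in zip(w, ds)) / total for c in range(k)]


def predictive_estimate(x_pop, sample_idx, d_sample, h=0.1):
    """Predictive estimate of the population proportion vector P.

    Observed one-hot responses are summed over the sample, predicted
    probabilities are summed over the non-sampled units, and the total
    is divided by the population size N.
    """
    N = len(x_pop)
    xs = [x_pop[i] for i in sample_idx]
    ds = [d_sample[i] for i in sample_idx]
    k = len(ds[0])
    totals = [float(sum(d[c] for d in ds)) for c in range(k)]
    sampled = set(sample_idx)
    for i in range(N):
        if i not in sampled:
            p = nw_probs(x_pop[i], xs, ds, h)
            for c in range(k):
                totals[c] += p[c]
    return [t / N for t in totals]


# Synthetic illustration (hypothetical population, not the survey data).
random.seed(0)
N, n = 500, 50
x_pop = [random.random() for _ in range(N)]


def draw(x):
    # Assumed category probabilities that vary smoothly with x.
    p = [0.6 - 0.4 * x, 0.2, 0.2 + 0.4 * x]
    u = random.random()
    c = 0 if u < p[0] else (1 if u < p[0] + p[1] else 2)
    return [1 if j == c else 0 for j in range(3)]


d_pop = [draw(x) for x in x_pop]
sample_idx = random.sample(range(N), n)
d_sample = {i: d_pop[i] for i in sample_idx}

est = predictive_estimate(x_pop, sample_idx, d_sample, h=0.1)
true_p = [sum(d[c] for d in d_pop) / N for c in range(3)]
print("predictive estimate:", est)
print("population proportions:", true_p)
```

Because every response vector is one-hot and the kernel weights are normalized, the estimated proportions sum to one, and when the sample covers the whole population the estimate reduces to the true proportion vector. A bootstrap variance estimate, as mentioned in the abstract, would require re-sampling on top of this and is not attempted here.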