Variance estimation procedures in the presence of singly imputed survey data: a critical review
Abstract
Variance estimation in the presence of singly imputed data has attracted considerable attention over the last three decades. Treating the imputed values as if they were observed may lead to serious underestimation of the true variance of point estimates, and hence to invalid inferences. In this paper, we review the methods proposed in the literature for obtaining variance estimates that account for sampling, nonresponse, and imputation. The advantages and drawbacks of each method are highlighted.
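The underestimation described above can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: values drawn independently from a normal distribution (so no finite population correction), uniform nonresponse at a hypothetical 60% response rate, and respondent-mean imputation. It compares the Monte Carlo variance of the imputed sample mean with the average of the naive variance estimator that treats imputed values as observed.

```python
import random
import statistics


def simulate(n=200, p_resp=0.6, reps=2000, seed=1):
    """Compare the Monte Carlo variance of the imputed mean with the naive estimator.

    All settings (normal data, uniform response, mean imputation) are
    illustrative assumptions, not the setup of any reviewed method.
    """
    rng = random.Random(seed)
    estimates = []   # imputed sample mean from each replicate
    naive_vars = []  # naive variance estimate from each replicate
    for _ in range(reps):
        y = [rng.gauss(50.0, 10.0) for _ in range(n)]
        responded = [rng.random() < p_resp for _ in range(n)]
        resp_values = [v for v, r in zip(y, responded) if r]
        if len(resp_values) < 2:
            continue  # skip degenerate replicates with too few respondents
        fill = statistics.fmean(resp_values)  # respondent-mean imputation
        completed = [v if r else fill for v, r in zip(y, responded)]
        estimates.append(statistics.fmean(completed))
        # Naive estimator: s^2 / n computed on the completed data set,
        # as if every value had been observed.
        naive_vars.append(statistics.variance(completed) / n)
    return statistics.variance(estimates), statistics.fmean(naive_vars)


mc_var, naive_var = simulate()
ratio = naive_var / mc_var
print(f"Monte Carlo variance of imputed mean: {mc_var:.3f}")
print(f"Average naive variance estimate:      {naive_var:.3f}")
print(f"Ratio (naive / Monte Carlo):          {ratio:.2f}")
```

In this setting the ratio falls well below 1. Mean imputation shrinks the spread of the completed data, and the naive formula divides by the full sample size rather than the number of respondents, so the naive estimator misses both the nonresponse and the imputation components of the variance.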