Symmetric and asymmetric rounding: a review and some new results
Abstract
Using rounded data to estimate moments and regression coefficients typically biases the estimates. We explore the bias-inducing effects of rounding, and in doing so review widely dispersed and often half-forgotten results in the literature. Under appropriate conditions, these effects can be approximately rectified by versions of Sheppard’s correction formula. We discuss the conditions under which these approximations are valid and also investigate the efficiency loss caused by rounding. The rounding error, which corresponds to the measurement error in a measurement error model, has a marginal distribution that can be approximated by the uniform distribution, but it is not independent of the true value. To take account of rounding preferences (heaping), we generalize the concept of simple rounding to that of asymmetric rounding and consider its effect on the mean and variance of a distribution.
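As a rough illustration of the variance correction mentioned above, the following Python sketch simulates simple (symmetric) rounding to a grid of width h and compares the variance of the rounded data with and without Sheppard’s correction, which subtracts h²/12. The normal distribution, the grid width h = 1, the sample size, the seed, and all function names are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def round_to_grid(x, h):
    """Simple (symmetric) rounding of x to the nearest multiple of h."""
    return h * math.floor(x / h + 0.5)


def variance(values):
    """Population variance (divisor n) of a list of numbers."""
    n = len(values)
    mean = math.fsum(values) / n
    return math.fsum((v - mean) ** 2 for v in values) / n


def sheppard_corrected_variance(rounded, h):
    """Variance of rounded data minus Sheppard's correction h^2 / 12."""
    return variance(rounded) - h ** 2 / 12


def simulate(n=200_000, sigma=1.0, h=1.0, seed=0):
    """Draw n normal values (an assumed example distribution), round them,
    and return the true, rounded, and corrected variances."""
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, sigma) for _ in range(n)]
    rounded = [round_to_grid(x, h) for x in true_values]
    return {
        "true": variance(true_values),
        "rounded": variance(rounded),
        "corrected": sheppard_corrected_variance(rounded, h),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>9}: {value:.4f}")
```

With these settings the variance of the rounded data should come out near 1 + 1/12 and the corrected value near 1, up to sampling noise. The approximation is expected to deteriorate when h is large relative to the spread of the distribution, which is one of the validity conditions the paper examines.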