Causal Inference without Balance Checking: Coarsened Exact Matching
Abstract
We discuss a method for improving causal inferences called "Coarsened Exact Matching" (CEM), and the new "Monotonic Imbalance Bounding" (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for
Keywords
References
Iacus, Stefano M., Gary King, and Giuseppe Porro. 2011. Multivariate matching methods that are Monotonic Imbalance Bounding. Journal of the American Statistical Association. http://gking.harvard.edu/files/abs/cem-math-abs.shtml.
King, 2001, Analyzing incomplete political science data: An alternative algorithm for multiple imputation, American Political Science Review, 95, 49, 10.1017/S0003055401000235
Although this initial choice poses all the usual issues and potential problems when choosing bins in drawing histograms, we use it only as a fixed reference to evaluate pre- and postmatching imbalance. Moreover, in practice, we use Iacus, King, and Porro's (2011) suggestion of a fixed bin width, computed by the median of all possible bin widths computed from the raw data.
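As a rough illustration of using one fixed binning as a reference before and after matching, the sketch below compares the binned relative frequencies of treated and control units for a single covariate. It assumes the imbalance measure is half the summed absolute difference between the two groups' relative frequencies; the function names, bin origin, bin width, and data are all hypothetical, and the median-of-bin-widths rule mentioned above is not implemented here.

```python
from collections import Counter

def bin_index(value, lo, width):
    """Index of the fixed-width bin containing value (bins start at lo)."""
    return int((value - lo) // width)

def l1_imbalance(treated, control, lo, width):
    """Half the summed absolute difference between the two groups'
    relative frequencies over a shared, fixed set of bins (1-D case)."""
    f = Counter(bin_index(x, lo, width) for x in treated)
    g = Counter(bin_index(x, lo, width) for x in control)
    cells = set(f) | set(g)
    return 0.5 * sum(abs(f[c] / len(treated) - g[c] / len(control)) for c in cells)

# Hypothetical ages; the same bins (origin 20, width 10) are used throughout.
treated = [22, 25, 31, 38, 44]
control = [21, 27, 33, 52, 58, 61, 65]
before = l1_imbalance(treated, control, lo=20, width=10)

# Hypothetical matched subset: controls far from any treated unit dropped.
matched_control = [21, 27, 33]
after = l1_imbalance(treated, matched_control, lo=20, width=10)
```

Because the bins are held fixed, `before` and `after` are directly comparable; the value is 0 for identical binned distributions and 1 when the two groups share no bin.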
Freedman, 1981, On the histogram as a density estimator: L2 theory, Probability Theory and Related Fields, 57
As Rubin (2006) writes, “First, since it is generally not wise to obtain a very precise estimate of a drastically wrong quantity, the investigator should be more concerned about having an estimate with small bias than one with small variance. Second, since in many observational studies the sample sizes are sufficiently large that sampling variances of estimators will be small, the sensitivity of estimators to biases is the dominant source of uncertainty.”
Galdo, Jose, Jeffrey Smith, and Dan Black. 2008. Bandwidth selection and the estimation of treatment effects with unbalanced data. Working paper, University of Michigan.
Combined with shifted coarsenings, an exhaustive procedure with greater than triplets is feasible only via parallel processing, which happens to be easy to implement with CEM. In practice, however, there is no need to explore all these combinations of different coarsenings because even the basic application of CEM clearly reveals which data are well matched overall and also with respect to how the treated and control units differ in the multidimensional distribution. When we use this algorithm, we usually relax only one or two variables at a time.
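Relaxing one variable at a time can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: each covariate is coarsened into fixed-width bins, units are matched exactly on the tuple of bin indices, and "relaxing" a variable is taken to mean doubling its bin width. The helper names, the widths, and the data are hypothetical.

```python
from collections import defaultdict

def coarsen(unit, widths):
    """Stratum signature: one bin index per coarsened covariate."""
    return tuple(int(unit[name] // w) for name, w in widths.items())

def cem_match(units, widths):
    """Keep units whose stratum holds at least one treated and one control unit."""
    strata = defaultdict(list)
    for u in units:
        strata[coarsen(u, widths)].append(u)
    matched = []
    for members in strata.values():
        if {m["treated"] for m in members} == {True, False}:
            matched.extend(members)
    return matched

def count_matched(units, widths):
    """Return (matched treated, matched control) for a given coarsening."""
    m = cem_match(units, widths)
    n_treated = sum(1 for u in m if u["treated"])
    return n_treated, len(m) - n_treated

# Hypothetical data set with two covariates.
units = [
    {"treated": True, "age": 23, "educ": 12},
    {"treated": True, "age": 37, "educ": 9},
    {"treated": True, "age": 41, "educ": 15},
    {"treated": False, "age": 24, "educ": 12},
    {"treated": False, "age": 33, "educ": 9},
    {"treated": False, "age": 43, "educ": 13},
    {"treated": False, "age": 59, "educ": 8},
]

base = {"age": 5, "educ": 2}  # assumed starting bin widths
results = {"base": count_matched(units, base)}
for name in base:
    # Relax a single variable (double its bin width), leaving the others fixed.
    relaxed = dict(base, **{name: base[name] * 2})
    results[name] = count_matched(units, relaxed)
```

Comparing the entries of `results` shows how many additional treated and control units each single relaxation recovers, which is the information used to decide which variable, if any, is worth coarsening further.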
Lalonde, 1986, Evaluating the econometric evaluations of training programs, American Economic Review, 76
Battistin, Erich, and Andrew Chesher. 2004. The impact of measurement error on evaluation methods based on strong ignorability. Working paper, Institute for Fiscal Studies, London.
Cochran, 1973, Controlling bias in observational studies: A review, Sankhya: The Indian Journal of Statistics, Series A, 35
Iacus, Stefano M., Gary King, and Giuseppe Porro. 2011b. Replication data for: Causal inference without balance checking: Coarsened Exact Matching. Murray Research Archive [distributor] V1 [version]. http://hdl.handle.net/1902.1/15601.
Iacus, 2009, Random recursive partitioning: A matching method for the estimation of the average treatment effect, Journal of Applied Econometrics, 24
King, Gary, Richard Nielsen, Carter Coberley, James Pope, and Aaron Wells. 2011. Comparative effectiveness of matching methods for causal inference.
Manski, 1995, Identification problems in the social sciences
Diamond, Alexis, and Jasjeet Sekhon. 2005. Genetic matching for estimating causal effects: A new method of achieving balance in observational studies. Working paper, http://jsekhon.fas.harvard.edu/ (accessed 2005).
Imbens, 2003, Sensitivity to exogeneity assumptions in program evaluation, American Economic Review, 93
To illustrate, suppose we run optimal or nearest neighbor matching on the Mahalanobis or propensity score distance with a fixed number of matched control units, m_C. The result would be some level of average imbalance for each variable. If we use this imbalance to define ε_j and apply CEM, we would usually obtain a similar number for m_C as set ex ante. Similarly, consider a method in the equal percent bias reducing class of methods and its associated data requirements, and run it given some fixed number of control units m_C. Assume the maximum imbalance can be computed explicitly (Rubin 1976, Equation 2.2), and define γ as one minus this maximum imbalance. In most situations, we would expect that running CEM would produce a similar number of control units as fixed ex ante by this existing method.
Abadie, Alberto, and Guido W. Imbens. 2007. Bias-corrected matching estimators for average treatment effects. Unpublished manuscript. http://ksghome.harvard.edu/aabadie/research.html.