Causal Inference without Balance Checking: Coarsened Exact Matching

Political Analysis - Volume 20, Issue 1 - Pages 1-24 - 2012
Stefano M. Iacus (1), Gary King (2), Giuseppe Porro (3)
(1) Department of Economics, Business and Statistics, University of Milan, Via Conservatorio 7, I-20124 Milan, Italy
(2) Institute for Quantitative Social Science, Harvard University, 1737 Cambridge Street, Cambridge, MA 02138
(3) Department of Economics and Statistics, University of Trieste, P.le Europa 1, I-34127 Trieste, Italy

Abstract

We discuss a method for improving causal inferences called "Coarsened Exact Matching" (CEM), and the new "Monotonic Imbalance Bounding" (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods, yet is at the same time exceptionally easy to understand and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open-source software for R, Stata, and SPSS that implements all of our suggestions.
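The abstract names CEM without spelling out its steps. As a rough illustration only, here is a minimal Python sketch of CEM as it is commonly described: coarsen each covariate into bins, exact-match on the coarsened values, drop strata that lack either treated or control units, and reweight. Matched treated units get weight 1; a matched control in stratum s gets (m_C / m_T) * (m_T^s / m_C^s), where m_T^s and m_C^s are the stratum's treated and control counts and m_T, m_C the matched totals. The equal-width bins, the widths, and the names `coarsen` and `cem` are our own assumptions; this is not the interface of the authors' R, Stata, or SPSS software.

```python
import math
from collections import defaultdict


def coarsen(value, width, origin=0.0):
    """Map a numeric value to the index of its half-open bin of the given width."""
    return math.floor((value - origin) / width)


def cem(X, treated, widths):
    """Minimal coarsened exact matching sketch (equal-width bins assumed).

    X: list of covariate rows (lists of floats)
    treated: list of 0/1 treatment indicators
    widths: one bin width per covariate (a hypothetical user choice)
    Returns (stratum signature per unit, CEM weight per unit).
    """
    # Step 1: coarsen every covariate; the tuple of bin indices is the stratum.
    sigs = [tuple(coarsen(x, w) for x, w in zip(row, widths)) for row in X]

    # Step 2: count treated and control units in each stratum.
    n_t = defaultdict(int)
    n_c = defaultdict(int)
    for s, t in zip(sigs, treated):
        if t:
            n_t[s] += 1
        else:
            n_c[s] += 1

    # Step 3: keep only strata that contain both treated and control units.
    matched = {s for s in n_t if n_c[s] > 0}
    m_t = sum(n_t[s] for s in matched)
    m_c = sum(n_c[s] for s in matched)

    # Step 4: pruned units get weight 0; matched treated get 1;
    # matched controls get (m_c / m_t) * (n_t[s] / n_c[s]).
    weights = []
    for s, t in zip(sigs, treated):
        if s not in matched:
            weights.append(0.0)
        elif t:
            weights.append(1.0)
        else:
            weights.append((m_c / m_t) * (n_t[s] / n_c[s]))
    return sigs, weights


if __name__ == "__main__":
    X = [[1.0], [1.5], [1.2], [3.1], [3.4], [3.6], [7.0]]
    T = [1, 0, 0, 1, 1, 0, 1]
    _, w = cem(X, T, [1.0])
    print(w)  # [1.0, 0.5, 0.5, 1.0, 1.0, 2.0, 0.0]
```

In this toy run, bin 1 holds one treated and two controls (weight 0.5 each), bin 3 holds two treated and one control (weight 2.0), and the lone treated unit at 7.0 has no control counterpart and is pruned. The control weights sum to m_C = 3.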

References

10.2307/2971733

Iacus, Stefano M., Gary King, and Giuseppe Porro. 2011. Multivariate matching methods that are Monotonic Imbalance Bounding. Journal of the American Statistical Association. http://gking.harvard.edu/files/abs/cem-math-abs.shtml.

King, 2001, Analyzing incomplete political science data: An alternative algorithm for multiple imputation, American Political Science Review, 95, 49, 10.1017/S0003055401000235

10.1257/aer.98.1.311

10.1093/biomet/87.3.706

10.1162/neco.2007.19.6.1503

Although this initial choice raises all the usual issues and potential problems of choosing bins when drawing histograms, we use it only as a fixed reference for evaluating pre- and postmatching imbalance. Moreover, in practice, we follow Iacus, King, and Porro's (2011) suggestion of a fixed bin width, computed as the median of all possible bin widths computed from the raw data.
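The note above uses a fixed histogram grid as the reference for measuring imbalance before and after matching. Assuming the multivariate L1 measure of Iacus, King, and Porro (2011), L1 = (1/2) * sum over cells l of |f_l - g_l|, where f_l and g_l are the (weighted) relative frequencies of treated and control units in cell l, a short Python sketch follows. The grid widths are passed in by hand (the median-bin-width rule described in the note is not implemented), and `l1_imbalance` is a hypothetical name.

```python
import math
from collections import Counter


def l1_imbalance(X, treated, widths, weights=None):
    """Multivariate L1 imbalance on a fixed grid of equal-width bins.

    L1 = 0.5 * sum over cells of |f_cell - g_cell|, where f and g are the
    (weighted) relative frequencies of treated and control units per cell.
    0 means identical histograms; 1 means the histograms do not overlap.
    """
    if weights is None:
        weights = [1.0] * len(X)
    f = Counter()  # treated mass per cell
    g = Counter()  # control mass per cell
    for row, t, w in zip(X, treated, weights):
        cell = tuple(math.floor(x / h) for x, h in zip(row, widths))
        (f if t else g)[cell] += w
    tot_f = sum(f.values())
    tot_g = sum(g.values())
    cells = set(f) | set(g)
    # Missing keys in a Counter read as 0, so empty cells contribute correctly.
    return 0.5 * sum(abs(f[c] / tot_f - g[c] / tot_g) for c in cells)


if __name__ == "__main__":
    X = [[1.0], [1.5], [1.2], [3.1], [3.4], [3.6], [7.0]]
    T = [1, 0, 0, 1, 1, 0, 1]
    grid = [2.0]  # fixed evaluation grid, deliberately separate from any matching bins
    print(round(l1_imbalance(X, T, grid), 4))  # 0.4167
    # Hypothetical matching weights (0 = pruned unit).
    w = [1.0, 0.5, 0.5, 1.0, 1.0, 2.0, 0.0]
    print(l1_imbalance(X, T, grid, weights=w))  # 0.0
```

Keeping the evaluation grid fixed is the point of the note: the same cells are used before and after matching, so a drop in L1 reflects the matching rather than a change of bins.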

10.18637/jss.v030.i09

10.18637/jss.v025.i11

Freedman, 1981, On the histogram as a density estimator: L2 theory, Probability Theory and Related Fields, 57

10.1093/biomet/63.3.581

As Rubin (2006) writes, “First, since it is generally not wise to obtain a very precise estimate of a drastically wrong quantity, the investigator should be more concerned about having an estimate with small bias than one with small variance. Second, since in many observational studies the sample sizes are sufficiently large that sampling variances of estimators will be small, the sensitivity of estimators to biases is the dominant source of uncertainty.”

Galdo, Jose, Jeffrey Smith, and Dan Black. 2008. Bandwidth selection and the estimation of treatment effects with unbalanced data. Working paper, University of Michigan.

Combined with shifted coarsenings, an exhaustive procedure over combinations larger than triplets is feasible only via parallel processing, which happens to be easy to implement with CEM. In practice, however, there is no need to explore all these combinations of different coarsenings, because even the basic application of CEM clearly reveals which data are well matched overall and how the treated and control units differ in the multidimensional distribution. When we use this algorithm, we usually relax only one or two variables at a time.
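The relaxation strategy in this note (changing the coarsening of one variable at a time, possibly with shifted bin origins, and watching how many units match) can be sketched as follows. This is an illustration in Python under stated assumptions: equal-width bins, widening by multiplicative factors, and per-variable origin offsets as the "shift". The names `matched_counts` and `relax_one_at_a_time`, the `shift` parameter, and the factors are hypothetical, not part of the authors' software.

```python
import math
from collections import defaultdict


def matched_counts(X, treated, widths, shift=None):
    """Return (matched treated, matched control) under equal-width coarsening.

    shift: optional per-variable offsets of the bin origin (a shifted coarsening).
    """
    shift = shift or [0.0] * len(widths)
    strata = defaultdict(lambda: [0, 0])  # signature -> [n_treated, n_control]
    for row, t in zip(X, treated):
        sig = tuple(math.floor((x - s) / w) for x, w, s in zip(row, widths, shift))
        strata[sig][0 if t else 1] += 1
    # A stratum contributes only if it holds both treated and control units.
    both = [v for v in strata.values() if v[0] > 0 and v[1] > 0]
    return sum(v[0] for v in both), sum(v[1] for v in both)


def relax_one_at_a_time(X, treated, widths, factors=(1.5, 2.0)):
    """Hypothetical helper: widen one variable's bins at a time and report matches."""
    report = {}
    for j in range(len(widths)):
        for k in factors:
            trial = list(widths)
            trial[j] = widths[j] * k  # relax only variable j
            report[(j, k)] = matched_counts(X, treated, trial)
    return report


if __name__ == "__main__":
    X = [[1.0, 10.0], [1.9, 14.0], [3.2, 20.0], [4.1, 24.0], [6.0, 30.0], [6.4, 36.0]]
    T = [1, 0, 1, 0, 1, 0]
    base = [1.0, 5.0]
    print(matched_counts(X, T, base))  # (1, 1)
    print(relax_one_at_a_time(X, T, base, factors=(2.0,)))
    # {(0, 2.0): (1, 1), (1, 2.0): (2, 2)}
    print(matched_counts(X, T, base, shift=[0.0, 2.5]))  # (0, 0)
```

In the toy data, doubling the second variable's bin width adds a matched pair while doubling the first does not, and shifting the second variable's bin origin by 2.5 loses the only match. Moving cutpoints alone can change which units match, which is why shifted coarsenings are worth exploring alongside coarser bins.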

10.1093/biomet/asn004

10.1093/pan/mpj004

Lalonde, 1986, Evaluating the econometric evaluations of training programs, American Economic Review, 76

10.1016/j.jeconom.2004.04.011

Battistin, Erich, and Andrew Chesher. 2004. The impact of measurement error on evaluation methods based on strong ignorability. Working paper, Institute for Fiscal Studies, London.

Cochran, 1973, Controlling bias in observational studies: A review, Sankhya: The Indian Journal of Statistics, Series A, 35

10.1016/j.csda.2006.12.036

10.1093/biomet/asn055

10.1111/j.1468-2478.2007.00445.x

10.2307/3088394

10.1198/016214506000001059

10.1257/000282803321455188

10.1093/pan/mpl013

Iacus, Stefano M., Gary King, and Giuseppe Porro. 2011b. Replication data for: Causal inference without balance checking: Coarsened Exact Matching. Murray Research Archive [distributor] V1 [version]. http://hdl.handle.net/1902.1/15601.

Iacus, 2009, Random recursive partitioning: A matching method for the estimation of the average treatment effect, Journal of Applied Econometrics, 24

10.1198/016214504000001187

10.1111/j.1467-985X.2007.00527.x

King, Gary, Richard Nielsen, Carter Coberley, James Pope, and Aaron Wells. 2011. Comparative effectiveness of matching methods for causal inference.

Manski, 1995, Identification problems in the social sciences

10.1023/A:1020363010465

Diamond, Alexis, and Jasjeet Sekhon. 2005. Genetic matching for estimating causal effects: A new method of achieving balance in observational studies. Working paper, http://jsekhon.fas.harvard.edu/ (accessed 2005).

10.2307/2951620

Imbens, 2003, Sensitivity to exogeneity assumptions in program evaluation, American Economic Review, 93

10.1017/CBO9780511804564

10.1002/9780470316849

Mielke, 2007, Permutation methods: A distance function approach, 10.1007/978-0-387-69813-7

10.1080/01621459.1999.10473858

10.1002/9780470316696

Girosi, 2008, Demographic forecasting, 10.1515/9780691186788

10.1198/016214501753381896

To illustrate, suppose we run optimal or nearest-neighbor matching on the Mahalanobis or propensity score distance with a fixed number of matched control units, m_C. The result would be some level of average imbalance for each variable. If we use this imbalance to define ε_j and apply CEM, we would usually obtain a number of matched controls similar to the m_C set ex ante. Similarly, consider a method in the equal percent bias reducing class of methods, with its associated data requirements, and run it with some fixed number of control units m_C. Assume the maximum imbalance can be computed explicitly (Rubin 1976, Equation 2.2), and define γ as one minus this maximum imbalance. In most situations, we would expect running CEM to produce a number of control units similar to the one fixed ex ante by this existing method.
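The MIB logic in this note runs opposite to fixed-ratio matching: the analyst fixes the imbalance bound ε_j ex ante and the number of matched controls m_C follows. One simple instance, assuming equal-width bins: if CEM coarsens variable j into bins of width ε_j, every matched stratum holds treated and control values of that variable within one bin, so the CEM-weighted difference in means is below ε_j. The Python sketch below checks this numerically on simulated data. The helper names `cem_weights` and `weighted_mean_diff` are hypothetical, and only the difference-in-means bound is illustrated.

```python
import math
import random
from collections import defaultdict


def cem_weights(X, treated, eps):
    """CEM with bins of width eps[j] on variable j; returns one weight per unit."""
    sigs = [tuple(math.floor(x / e) for x, e in zip(row, eps)) for row in X]
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_control]
    for s, t in zip(sigs, treated):
        counts[s][0 if t else 1] += 1
    keep = {s for s, (nt, nc) in counts.items() if nt and nc}
    m_t = sum(counts[s][0] for s in keep)
    m_c = sum(counts[s][1] for s in keep)
    w = []
    for s, t in zip(sigs, treated):
        if s not in keep:
            w.append(0.0)  # pruned: stratum lacks treated or control units
        elif t:
            w.append(1.0)
        else:
            nt, nc = counts[s]
            w.append((m_c / m_t) * (nt / nc))
    return w


def weighted_mean_diff(X, treated, w, j):
    """|treated mean - weighted control mean| of variable j among matched units."""
    wt = [(row[j], wi) for row, t, wi in zip(X, treated, w) if t and wi > 0]
    wc = [(row[j], wi) for row, t, wi in zip(X, treated, w) if not t and wi > 0]
    mean_t = sum(x * wi for x, wi in wt) / sum(wi for _, wi in wt)
    mean_c = sum(x * wi for x, wi in wc) / sum(wi for _, wi in wc)
    return abs(mean_t - mean_c)


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.uniform(0, 5), rng.uniform(0, 20)] for _ in range(200)]
    T = [1 if rng.random() < 0.4 else 0 for _ in range(200)]
    eps = [0.5, 2.0]  # imbalance bounds chosen ex ante
    w = cem_weights(X, T, eps)
    for j, e in enumerate(eps):
        print(f"variable {j}: |mean diff| = {weighted_mean_diff(X, T, w, j):.4f} < {e}")
    # m_C is an output of the procedure, not an input.
    print("matched controls:", sum(1 for wi, t in zip(w, T) if not t and wi > 0))
```

The bound follows because the CEM weights give each matched stratum s the same share m_T^s / m_T in both the treated and the weighted control means, so their difference is a weighted average of within-stratum differences, each smaller than ε_j.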

10.1214/08-STS274

10.1162/003465302317331982

10.1162/003465304323023651

10.1017/CBO9780511810725

Abadie, Alberto, and Guido W. Imbens. 2007. Bias-corrected matching estimators for average treatment effects. Unpublished manuscript. http://ksghome.harvard.edu/aabadie/research.html.

10.1002/sim.2328