Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies

Political Analysis - Volume 20, Issue 1 - Pages 25-46 - 2012
Jens Hainmueller1
1Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139. e-mail:

Abstract

This paper proposes entropy balancing, a data preprocessing method to achieve covariate balance in observational studies with binary treatments. Entropy balancing relies on a maximum entropy reweighting scheme that calibrates unit weights so that the reweighted treatment and control groups satisfy a potentially large set of prespecified balance conditions that incorporate information about known sample moments. Entropy balancing thereby exactly adjusts inequalities in representation with respect to the first, second, and possibly higher moments of the covariate distributions. These balance improvements can reduce model dependence for the subsequent estimation of treatment effects. The method assures that balance improves on all covariate moments included in the reweighting. It also obviates the need for continual balance checking and iterative searching over propensity score models that may stochastically balance the covariate moments. We demonstrate the use of entropy balancing with Monte Carlo simulations and empirical applications.
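The reweighting scheme described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's implementation:

- It uses uniform base weights q_i by default and normalizes the control weights to sum to one.
- It solves the convex dual problem with damped Newton steps.
- The names `entropy_balance` and `solve_linear` are hypothetical.
- The sign convention for the Lagrange multipliers may differ from the paper's.
- The moment conditions must be feasible and not collinear; otherwise the Hessian is singular.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x

def entropy_balance(c, target, q=None, tol=1e-10, max_iter=200):
    """Control weights w (summing to 1) that minimise sum_i w_i*log(w_i/q_i)
    subject to sum_i w_i*c[i][r] == target[r] for every moment function r.
    c holds one row per control unit: [c_1(X_i), ..., c_R(X_i)]."""
    n, k = len(c), len(target)
    q = q or [1.0 / n] * n
    d = [[c[i][r] - target[r] for r in range(k)] for i in range(n)]  # centred moments

    def dual(lam):
        # Convex dual log(sum_i q_i*exp(lam . d_i)) and the implied tilted weights,
        # computed with a log-sum-exp shift for numerical stability.
        u = [math.log(q[i]) + sum(lam[r] * d[i][r] for r in range(k)) for i in range(n)]
        top = max(u)
        e = [math.exp(v - top) for v in u]
        tot = sum(e)
        return top + math.log(tot), [v / tot for v in e]

    lam = [0.0] * k
    for _ in range(max_iter):
        f0, w = dual(lam)
        # Gradient of the dual = weighted moment imbalance; zero means exact balance.
        grad = [sum(w[i] * d[i][r] for i in range(n)) for r in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        # Hessian of the dual = weighted covariance of the moment functions.
        hess = [[sum(w[i] * d[i][r] * d[i][s] for i in range(n)) - grad[r] * grad[s]
                 for s in range(k)] for r in range(k)]
        step = solve_linear(hess, grad)
        t = 1.0  # backtracking keeps the Newton iterations from overshooting
        while True:
            cand = [lam[r] - t * step[r] for r in range(k)]
            if dual(cand)[0] <= f0 + 1e-12 or t < 1e-8:
                break
            t /= 2
        lam = cand
    return dual(lam)[1]
```

As a usage example, pass one row `[x, x * x]` per control unit and set `target` to the treated group's mean of x and mean of x squared. The function then returns control weights that match the first two moments of x exactly, provided the targets lie inside the range the control data can reach.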

Keywords


References

10.1037/1082-989X.9.4.403

10.1111/j.1467-985X.2007.00527.x

10.1093/pan/mpl013

Formally, propensity score reweighting exploits the following equalities: which use the ignorability assumption in the second-to-last equality (Hirano and Imbens 2001; Hirano, Imbens, and Ridder 2003).
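For the effect on the treated, a standard textbook version of these equalities can be written as follows. Here p(x) = P(D=1 | X=x), and Bayes' rule links the two covariate densities through f_{X|D=1}(x) = p(x) f_X(x)/P(D=1) and f_{X|D=0}(x) = (1 - p(x)) f_X(x)/P(D=0). This sketch assumes ignorability of Y(0) given X and overlap (p(x) < 1). It is not necessarily the exact display from the original footnote: in this version the ignorability step is the second equality rather than the second-to-last.

```latex
\begin{aligned}
E[Y(0)\mid D=1]
 &= \int E[Y(0)\mid X=x, D=1]\, f_{X\mid D=1}(x)\,dx \\
 &= \int E[Y(0)\mid X=x, D=0]\, f_{X\mid D=1}(x)\,dx
    \qquad\text{(ignorability)} \\
 &= \int E[Y\mid X=x, D=0]\, f_{X\mid D=0}(x)\,
    \frac{p(x)}{1-p(x)}\,\frac{P(D=0)}{P(D=1)}\,dx \\
 &= \frac{P(D=0)}{P(D=1)}\;
    E\!\left[\,Y\,\frac{p(X)}{1-p(X)} \,\middle|\, D=0\right]
\end{aligned}
```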

Graham B. S. , Pinto C. , and Egel D. 2010. Inverse probability tilting for moment condition models with missing data. Working paper. New York University.

Erlander S. 1977. Entropy in linear programs—an approach to planning. Report No. LiTH-MAT-R-77-3. Department of Mathematics, Linköping University, Sweden.

Diamond A. J. , and Sekhon J. 2006. Genetic matching for causal effects: A general multivariate matching method for achieving balance in observational studies. Unpublished manuscript, Department of Political Science, UC Berkeley.

10.1201/9781420036152

10.1162/qjec.122.3.1187

Frölich, 2004, Finite sample properties of propensity-score matching and weighting estimators, Review of Economics and Statistics, 86, 77, 10.1162/003465304323023697

10.2307/2971718

Abadie A. , and Imbens G. 2007. Simple and bias-corrected matching estimators for average treatment effects. Working paper. Harvard University.

Notice that we exclude nonsensical interactions, such as the interaction between high school degree and years of schooling. We also omit the squared terms for pretreatment earnings and their interaction because, owing to their collinearity with the lower order terms, they are balanced automatically once the lower order terms are adjusted. For example, their t-test p values in the reweighted data are .76, .83, and .99, respectively.
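Expanding covariates into balance terms (levels, selected squares, and pairwise interactions) while skipping nonsensical pairs can be sketched as below. The variable names (`educ`, `hsdegree`, `re74`) and the helper `balance_terms` are hypothetical illustrations, not the paper's code. Which squares and interactions to include remains the analyst's choice.

```python
from itertools import combinations

def balance_terms(row, squared, exclude_pairs):
    """Expand one unit's covariates (dict name -> value) into moment functions
    for the balance constraints: levels, the listed squares, and all pairwise
    interactions except the pairs in exclude_pairs (a set of frozensets)."""
    terms = dict(row)  # first moments
    for name in squared:
        terms[name + "^2"] = row[name] ** 2
    for a, b in combinations(sorted(row), 2):
        if frozenset((a, b)) in exclude_pairs:
            continue  # e.g. degree x years of schooling carries no extra information
        terms[a + "*" + b] = row[a] * row[b]
    return terms
```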

Notice that we focus on the Dehejia and Wahba subset of the LaLonde data.

Mattos R. , and Veiga A. 2004. Entropy optimization: Computer implementation of the maxent and minexent principles. Working paper. Universidade Federal de Juiz de Fora, Brazil.

Evidently, the inequality bounds w_i ≥ 0 are inactive and can be safely ignored.
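Why the bounds are inactive follows from the first-order conditions of the entropy problem. The sketch below uses assumed notation:

- base weights q_i > 0;
- moment functions c_{ri}(X_i) with targets m_r;
- weights normalized to sum to one (any positive normalizing constant gives the same argument).

Setting the derivative of the Lagrangian with respect to w_i to zero yields an exponential form, so every weight is strictly positive.

```latex
\begin{aligned}
\mathcal{L} &= \sum_i w_i \log\frac{w_i}{q_i}
  + \sum_{r=1}^{R} \lambda_r \Big(\sum_i w_i\, c_{ri}(X_i) - m_r\Big)
  + (\lambda_0 - 1)\Big(\sum_i w_i - 1\Big) \\
\frac{\partial \mathcal{L}}{\partial w_i} &= \log\frac{w_i}{q_i}
  + \sum_{r=1}^{R} \lambda_r\, c_{ri}(X_i) + \lambda_0 = 0
\;\;\Longrightarrow\;\;
w_i = \frac{q_i \exp\!\big(-\sum_{r} \lambda_r c_{ri}(X_i)\big)}
           {\sum_j q_j \exp\!\big(-\sum_{r} \lambda_r c_{rj}(X_j)\big)} > 0
\end{aligned}
```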

Also see Kapur and Kesavan (1992) or Mattos and Veiga (2004) for detailed treatments and similar algorithms for entropy optimization.

10.1017/S0003055409990190

For example, the treatment group can be reweighted to match the control group. An important caveat is that it may be more difficult to estimate the PATE or SATE due to limited overlap in the covariate distributions.

Note that some other causal quantities of interest are not defined in this way (e.g., causal mediation or necessary causation).

10.1162/003465399557860

This web appendix is available on the author's webpage at http://www.mit.edu/jhainm/research.htm.

In practice, the weights may sometimes differ from zero or one in the case of ties or for control units that are matched several times when matching with replacement.
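The point that matching is a special case of weighting can be illustrated with one-to-one nearest-neighbour matching with replacement on a single covariate. In this hypothetical sketch, each control unit's weight is the number of treated units it is matched to. Ties are split evenly among the tied controls, which is one possible convention and is assumed here. Weights other than zero or one arise exactly in the two cases the note mentions.

```python
def matching_weights(x_treated, x_control):
    """1-nearest-neighbour matching with replacement on a scalar covariate.
    Returns one weight per control unit: how many treated units it serves as a
    match for, with exact ties split evenly among the tied controls."""
    w = [0.0] * len(x_control)
    for xt in x_treated:
        dist = [abs(xt - xc) for xc in x_control]
        best = min(dist)
        tied = [j for j, dj in enumerate(dist) if dj == best]
        for j in tied:
            w[j] += 1.0 / len(tied)
    return w
```

For example, `matching_weights([1.0, 1.1, 5.0], [1.05, 3.0, 4.0, 6.0])` gives the first control a weight of 2 (reuse under replacement) and splits the third treated unit between two equidistant controls.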

10.1162/003465304323023705

10.1111/j.1368-423X.2007.00212.x

10.1111/j.1540-5907.2009.00377.x

10.2307/2532266

10.1162/003465304323023651

10.2307/2171942

10.1023/A:1020371312283

Iacus, 2009, Causal inference without balance checking: Coarsened exact matching

Notice that variables that are labeled as “prior” are measured in the 1992–1996 survey waves. See the authors' web appendix for a detailed explanation of the variable definitions.

In the online appendix, we show another example where weighting on a logistic propensity score that is estimated without any squared terms makes balance considerably worse than in the raw data for many covariates (see Figs. 5 and 6 in the online appendix).
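Weighting on a logistic propensity score for the effect on the treated gives each control unit a weight proportional to the estimated odds p(x)/(1 - p(x)). The sketch below fits a one-covariate logit by Newton-Raphson and forms those weights. The function names are hypothetical, and the fit deliberately has no squared terms. Nothing in the fit forces the weighted control moments to equal the treated moments, so balance still has to be checked afterwards.

```python
import math

def fit_logit(x, d, iters=25):
    """Newton-Raphson for P(D=1 | x) = 1 / (1 + exp(-(a + b*x))), one covariate."""
    a = b = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector of the log-likelihood.
        ga = sum(di - pi for di, pi in zip(d, p))
        gb = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Information matrix [[haa, hab], [hab, hbb]].
        v = [pi * (1.0 - pi) for pi in p]
        haa = sum(v)
        hab = sum(vi * xi for vi, xi in zip(v, x))
        hbb = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

def att_odds_weights(x, d):
    """Control weights proportional to p/(1-p) = exp(a + b*x), summing to one;
    treated units get weight zero."""
    a, b = fit_logit(x, d)
    raw = [math.exp(a + b * xi) if di == 0 else 0.0 for xi, di in zip(x, d)]
    tot = sum(raw)
    return [r / tot for r in raw]
```

To inspect balance, compare `sum(w * x)` over the controls with the treated mean of x (and likewise for x squared). Any gap that remains is the kind of imbalance the note describes.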

10.1214/009053606000001208

Notice that there is some debate about how to assess covariate balance in practice. Theoretically, we would like the two empirical distributions to be equal so that the density in the preprocessed control group mirrors the density in the treatment group, f_{X|D=1}. Comparing the joint empirical distributions of all covariates X is difficult when X is high dimensional, and therefore lower dimensional balance metrics are commonly used (but see Iacus, King, and Porro 2009, who propose a multidimensional metric). Opinions differ on which metric is most appropriate. The most commonly used metrics are the standardized difference in means (Rosenbaum and Rubin 1983) and t-tests for differences in means. Diamond and Sekhon (2006) argue that paired t-tests and bootstrapped Kolmogorov-Smirnov (KS) tests should be used instead, and that commonly used p value cutoffs such as .1 or .05 are too lenient to obtain reliable causal inferences. Rubin (2006) also considers variance ratios and tests for residuals that are orthogonalized to the propensity score. Imai, King, and Stuart (2008) criticize the use of t-tests and stopping rules and argue that all balance measures should be maximized without limit. They advocate QQ plot summary statistics as better alternatives to t-tests or KS tests. Sekhon (2006) comes to the opposite conclusion. Hansen and Bowers (2008) advocate the use of Fisher's randomization inference for balance checking.
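Three of the balance metrics mentioned in this note can be computed for a reweighted control group as sketched here: the standardized difference in means, the variance ratio, and the KS distance. The helper names are hypothetical. The standardized difference divides by the treated-group standard deviation, which is only one of several conventions in use; a pooled standard deviation is also common.

```python
import math

def wmean(x, w):
    """Weighted mean of x with weights w."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def wvar(x, w):
    """Weighted variance of x (divides by the sum of the weights)."""
    m = wmean(x, w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)

def std_mean_diff(x_t, x_c, w_c=None):
    """Treated mean minus (re)weighted control mean, in treated-SD units."""
    w_c = w_c or [1.0] * len(x_c)
    ones = [1.0] * len(x_t)
    return (wmean(x_t, ones) - wmean(x_c, w_c)) / math.sqrt(wvar(x_t, ones))

def variance_ratio(x_t, x_c, w_c=None):
    """Treated variance divided by (re)weighted control variance."""
    w_c = w_c or [1.0] * len(x_c)
    return wvar(x_t, [1.0] * len(x_t)) / wvar(x_c, w_c)

def ks_statistic(x_t, x_c, w_c=None):
    """Largest gap between the treated empirical CDF and the (re)weighted
    control empirical CDF, evaluated at every observed value."""
    w_c = w_c or [1.0] * len(x_c)
    tot = sum(w_c)
    def cdf_t(v):
        return sum(1 for xi in x_t if xi <= v) / len(x_t)
    def cdf_c(v):
        return sum(wi for xi, wi in zip(x_c, w_c) if xi <= v) / tot
    return max(abs(cdf_t(v) - cdf_c(v)) for v in sorted(set(x_t) | set(x_c)))
```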

10.2307/2998560

There is a growing literature that uses simulation to assess the properties of matching procedures (partially reviewed in Imbens 2004). Frölich (2004) presents an extensive simulation study that considers various matching methods across a wide variety of sample designs, but his study is limited to a single covariate and true propensity scores. Zhao (2004) investigates the finite sample properties of pair matching and propensity score matching and finds no clear winner among these techniques. Although his study includes different sample sizes, it does not vary the ratio of controls to treated units and is also limited to true propensity scores. Brookhart et al. (2006) simulate the effect of including or excluding irrelevant variables in propensity score matching. Abadie and Imbens (2007) present a matching simulation using data from the Panel Study of Income Dynamics and find that their bias-corrected matching estimator outperforms linear regression adjustment. Diamond and Sekhon (2006) provide two Monte Carlo experiments, one with multivariate normal data and three covariates and a second using the LaLonde data set. They find that their genetic matching outperforms other matching techniques. Further simulations using multivariate normal data are presented in Gu and Rosenbaum (1993) and in several of the papers collected in Rubin (2006). Drake (1993) finds that misspecified propensity scores often result in substantial bias in simulations with two normally distributed covariates.

10.1257/aer.91.2.112

Notice that there are over 4.5 quadrillion possible subsets of the 52 covariates, so we cannot run all possible regressions.
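The count follows because each covariate is either in or out of a regression specification. That gives 2^52 subsets when the empty specification is included. The throughput figure below is a hypothetical number used only to show the scale.

```python
n_covariates = 52
# Each covariate is either included or excluded, so there are 2**52 subsets
# (this count includes the empty specification).
n_subsets = 2 ** n_covariates
print(f"{n_subsets:,}")  # 4,503,599,627,370,496

# Hypothetical throughput, chosen only for illustration.
regressions_per_second = 1_000_000
years = n_subsets / regressions_per_second / (365.25 * 24 * 3600)
print(f"about {years:.0f} years at that rate")
```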

10.1146/annurev.polisci.11.060606.135444

10.1093/biomet/55.1.179

There are exceptions to this rule (e.g., when calipers are used).

I am grateful to the authors for sharing their data.

Ellipsoidal symmetry fails if X includes binary, categorical, and/or skewed continuous variables.

10.1080/01621459.1952.10483446

Hirano, Imbens, and Ridder (2003) derive their result for a case where the propensity score is estimated using a nonparametric sieve estimator that approximates the true propensity score by a power series in all variables. Asymptotically, this series will converge to the true propensity score function if the powers increase with the sample size, but no results exist about the finite sample properties of this estimator. By the authors' own admission, this approach is computationally not very attractive.

For example, the sum of the control weights could be normalized to equal the number of treated units, which is equivalent to setting the normalization constraint to n_1.
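Weighted means divide by the sum of the weights. Normalizing the control weights to sum to n_1 (the number of treated units) instead of one therefore changes only their scale, not the reweighted moments. A minimal check, with hypothetical helper names:

```python
def rescale(weights, total):
    """Rescale non-negative weights so that they sum to `total`."""
    s = sum(weights)
    return [w * total / s for w in weights]

def weighted_mean(x, w):
    """Weighted mean; dividing by sum(w) makes it invariant to the weight scale."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)
```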

The CR divergence family is described by where γ indexes the family and limits are defined by continuity so that lim and lim where the last equalities follow from l'Hôpital's rule, respectively. Notice that h(w) = w log(w) represents the Shannon entropy metric, which is (up to a constant) equivalent to the Kullback entropy divergence when uniform weights q_i are used for the null distribution. Another choice with good properties is γ = −1, which results in an empirical likelihood (EL) scheme. We prefer the entropy loss because it is more robust under misspecification (Imbens, Spady, and Johnson 1998; Schennach 2007) and constrains the weights to be non-negative.

10.1007/978-94-011-2430-0_1

Kullback, 1959, Information theory and statistics

10.2307/2998561

LaLonde, 1986, Evaluating the econometric evaluations of training programs with experimental data, American Economic Review, 76

10.1007/978-1-4612-4578-0

Oh H. L. , and Scheuren F. J. 1978. Multivariate ratio raking estimation in the 1973 exact match study. Proceedings of the Section on Survey Research Methods XXV: 716–22.

10.1214/aos/1176325370

10.1198/jasa.2009.tm08163

Robins, 1995, Analysis of semiparametric regression models for repeated outcomes in the presence of missing data, Journal of the American Statistical Association, 90

10.1093/biomet/70.1.41

10.1017/CBO9780511810725

Särndal, 2006, Estimation in surveys with nonresponse

10.1214/08-STS254

Sekhon J. 2006. Alternative balance metrics for bias reduction in matching methods for causal inference. Unpublished manuscript, Department of Political Science, UC Berkeley.

Zaslavsky, 1988, Representing local area adjustments by reweighting of households, Survey Methodology, 14

10.2307/1912775

Notice that we use p values as a measure of balance, and not to conduct hypothesis tests in the conventional sense (see Imai, King, and Stuart 2008).

10.1214/aoms/1177731829

Gu, 1993, Comparison of multivariate matching methods: Structures, distances, and algorithms, Journal of Computational and Graphical Statistics, 2

10.1111/1468-0262.00442

10.1093/aje/kwj149