Matching Methods for Causal Inference: A Review and a Look Forward

Statistical Science - Vol. 25, No. 1 - 2010
Elizabeth A. Stuart
Departments of Mental Health and Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA

References

Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. <i>J. Amer. Statist. Assoc.</i> <b>99</b> 609–618.

Rosenbaum, P. R. (1984). The consequences of adjustment for a concomitant variable that has been affected by the treatment. <i>J. Roy. Statist. Soc. Ser. A</i> <b>147</b> 656–666.

Rosenbaum, P. R. (1991). A characterization of optimal designs for observational studies. <i>J. Roy. Statist. Soc. Ser. B</i> <b>53</b> 597–610.

Rubin, D. B. (2007). The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. <i>Stat. Med.</i> <b>26</b> 20–36.

Lu, B., Zanutto, E., Hornik, R. and Rosenbaum, P. R. (2001). Matching with doses in an observational study of a media campaign against drug abuse. <i>J. Amer. Statist. Assoc.</i> <b>96</b> 1245–1253.

Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate?: Causal inference in the face of interference. <i>J. Amer. Statist. Assoc.</i> <b>101</b> 1398–1407.

Ho, D. E., Imai, K., King, G. and Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. <i>Political Analysis</i> <b>15</b> 199–236.

Hong, G. and Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. <i>J. Amer. Statist. Assoc.</i> <b>101</b> 901–910.

Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. <i>J. Amer. Statist. Assoc.</i> <b>103</b> 832–842.

Imai, K. and van Dyk, D. A. (2004). Causal inference with general treatment regimes: Generalizing the propensity score. <i>J. Amer. Statist. Assoc.</i> <b>99</b> 854–866.

Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference. <i>Biometrics</i> <b>58</b> 21–29.

Rubin, D. B. and Thomas, N. (2000). Combining propensity score matching with additional adjustments for prognostic covariates. <i>J. Amer. Statist. Assoc.</i> <b>95</b> 573–585.

Rosenbaum, P. R., Ross, R. N. and Silber, J. H. (2007). Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. <i>J. Amer. Statist. Assoc.</i> <b>102</b> 75–83.

Rubin, D. B. (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. <i>J. Amer. Statist. Assoc.</i> <b>74</b> 318–328.

Rosenbaum, P. R. and Rubin, D. B. (1983b). The central role of the propensity score in observational studies for causal effects. <i>Biometrika</i> <b>70</b> 41–55.

Robins, J. M., Hernan, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. <i>Epidemiology</i> <b>11</b> 550–560.

Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. <i>Econometrica</i> <b>74</b> 235–267.

Hirano, K., Imbens, G. W. and Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. <i>Econometrica</i> <b>71</b> 1161–1189.

Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. <i>J. Amer. Statist. Assoc.</i> <b>79</b> 516–524.

Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. <i>Biometrics</i> <b>61</b> 962–972.

Imbens, G. W. (2000). The role of the propensity score in estimating dose–response functions. <i>Biometrika</i> <b>87</b> 706–710.

Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. <i>Stat. Med.</i> <b>23</b> 2937–2960.

Qu, Y. and Lipkovich, I. (2009). Propensity score estimation with missing values using a multiple imputation missingness pattern (MIMP) approach. <i>Stat. Med.</i> <b>28</b> 1402–1414.

Rosenbaum, P. R. and Rubin, D. B. (1983a). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. <i>J. Roy. Statist. Soc. Ser. B</i> <b>45</b> 212–218.

Cochran, W. G. (1968). The effectiveness of adjustment by subclassification in removing bias in observational studies. <i>Biometrics</i> <b>24</b> 295–313.

Heckman, J. J., Ichimura, H. and Todd, P. (1998). Matching as an econometric evaluation estimator. <i>Rev. Econom. Stud.</i> <b>65</b> 261–294.

Holland, P. W. (1986). Statistics and causal inference. <i>J. Amer. Statist. Assoc.</i> <b>81</b> 945–960.

Rubin, D. B. (1980). Bias reduction using Mahalanobis metric matching. <i>Biometrics</i> <b>36</b> 293–298.

Hansen, B. B. (2008). The prognostic analogue of the propensity score. <i>Biometrika</i> <b>95</b> 481–488.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. <i>Journal of Educational Psychology</i> <b>66</b> 688–701.

Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. <i>Review of Economics and Statistics</i> <b>86</b> 4–29.

Cochran, W. G. and Rubin, D. B. (1973). Controlling bias in observational studies: A review. <i>Sankhyā Ser. A</i> <b>35</b> 417–446.

Greevy, R., Lu, B., Silber, J. H. and Rosenbaum, P. (2004). Optimal multivariate matching before randomization. <i>Biostatistics</i> <b>5</b> 263–275.

Greenland, S., Robins, J. M. and Pearl, J. (1999). Confounding and collapsibility in causal inference. <i>Statist. Sci.</i> <b>14</b> 29–46.

Rubin, D. B. (1987). <i>Multiple Imputation for Nonresponse in Surveys</i>. Wiley, New York.

Rosenbaum, P. R. (2002). <i>Observational Studies</i>, 2nd ed. Springer, New York.

Rosenbaum, P. R. (2010). <i>Design of Observational Studies</i>. Springer, New York.

Abadie, A. and Imbens, G. W. (2009b). Matching on the estimated propensity score. Working Paper 15301, National Bureau of Economic Research, Cambridge, MA.

Augurzky, B. and Schmidt, C. (2001). The propensity score: A means to an end. Discussion Paper 271, Institute for the Study of Labor (IZA).

Chapin, F. (1947). <i>Experimental Designs in Sociological Research</i>. Harper, New York.

Cohen, J. (1988). <i>Statistical Power Analysis for the Behavioral Sciences</i>, 2nd ed. Erlbaum, Hillsdale, NJ.

Greenwood, E. (1945). <i>Experimental Sociology: A Study in Method</i>. King’s Crown Press, New York.

Harder, V. S., Stuart, E. A. and Anthony, J. (2010). Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. <i>Psychological Methods</i>. To appear.

Hill, J., Reiter, J. and Zanutto, E. (2004). A comparison of experimental and observational data analyses. In <i>Applied Bayesian Modeling and Causal Inference From an Incomplete-Data Perspective</i> (A. Gelman and X.-L. Meng, eds.). Wiley, Hoboken, NJ.

Hill, J., Rubin, D. B. and Thomas, N. (1999). The design of the New York School Choice Scholarship Program evaluation. In <i>Research Designs: Inspired by the Work of Donald Campbell</i> (L. Bickman, ed.) 155–180. Sage, Thousand Oaks, CA.

Potter, F. J. (1993). The effect of weight trimming on nonlinear survey estimates. In <i>Proceedings of the Section on Survey Research Methods of American Statistical Association</i>. Amer. Statist. Assoc., San Francisco, CA.

Rubin, D. B. (2006). <i>Matched Sampling for Causal Inference</i>. Cambridge Univ. Press, Cambridge.

Snedecor, G. W. and Cochran, W. G. (1980). <i>Statistical Methods</i>, 7th ed. Iowa State Univ. Press, Ames, IA.

Stuart, E. A. and Ialongo, N. S. (2009). Matching methods for selection of subjects for follow-up. <i>Multivariate Behavioral Research</i>. To appear.

Rosenbaum, P. R. (1999). Choice as an alternative to control in observational studies (with discussion). <i>Statist. Sci.</i> <b>14</b> 259–304.

Abadie, A. and Imbens, G. W. (2009a). Bias corrected matching estimators for average treatment effects. <i>Journal of Educational and Behavioral Statistics</i>. To appear. Available at <a href="http://www.hks.harvard.edu/fs/aabadie/bcm.pdf">http://www.hks.harvard.edu/fs/aabadie/bcm.pdf</a>.

Agodini, R. and Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs. <i>Review of Economics and Statistics</i> <b>86</b> 180–194.

Althauser, R. and Rubin, D. (1970). The computerized construction of a matched sample. <i>American Journal of Sociology</i> <b>76</b> 325–346.

Austin, P. C. (2007). The performance of different propensity score methods for estimating marginal odds ratios. <i>Stat. Med.</i> <b>26</b> 3078–3094.

Austin, P. (2009). Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. <i>Comm. Statist. Simulation Comput.</i> <b>38</b> 1228–1234.

Austin, P. C. and Mamdani, M. M. (2006). A comparison of propensity score methods: A case-study illustrating the effectiveness of post-AMI statin use. <i>Stat. Med.</i> <b>25</b> 2084–2106.

Brookhart, M. A., Schneeweiss, S., Rothman, K. J., Glynn, R. J., Avorn, J. and Sturmer, T. (2006). Variable selection for propensity score models. <i>American Journal of Epidemiology</i> <b>163</b> 1149–1156.

Carpenter, R. (1977). Matching when covariables are normally distributed. <i>Biometrika</i> <b>64</b> 299–307.

Cornfield, J. (1959). Smoking and lung cancer: Recent evidence and a discussion of some questions. <i>Journal of the National Cancer Institute</i> <b>22</b> 173–200.

Crump, R., Hotz, V. J., Imbens, G. W. and Mitnik, O. (2009). Dealing with limited overlap in estimation of average treatment effects. <i>Biometrika</i> <b>96</b> 187–199.

Czajka, J. C., Hirabayashi, S., Little, R. and Rubin, D. B. (1992). Projecting from advance data using propensity modeling. <i>J. Bus. Econom. Statist.</i> <b>10</b> 117–131.

D’Agostino, Jr., R. B. and Rubin, D. B. (2000). Estimating and using propensity scores with partially missing data. <i>J. Amer. Statist. Assoc.</i> <b>95</b> 749–759.

Dehejia, R. H. and Wahba, S. (1999). Causal effects in nonexperimental studies: Re-evaluating the evaluation of training programs. <i>J. Amer. Statist. Assoc.</i> <b>94</b> 1053–1062.

Dehejia, R. H. and Wahba, S. (2002). Propensity score matching methods for non-experimental causal studies. <i>Review of Economics and Statistics</i> <b>84</b> 151–161.

Diamond, A. and Sekhon, J. S. (2006). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Working paper. Univ. California, Berkeley. Available at <a href="http://sekhon.berkeley.edu/papers/GenMatch.pdf">http://sekhon.berkeley.edu/papers/GenMatch.pdf</a>.

Drake, C. (1993). Effects of misspecification of the propensity score on estimators of treatment effects. <i>Biometrics</i> <b>49</b> 1231–1236.

Glazerman, S., Levy, D. M. and Myers, D. (2003). Nonexperimental versus experimental estimates of earnings impacts. <i>Annals of the American Academy of Political and Social Science</i> <b>589</b> 63–93.

Greenland, S. (2003). Quantifying biases in causal models: Classical confounding vs collider-stratification bias. <i>Epidemiology</i> <b>14</b> 300–306.

Greenland, S. and Finkle, W. D. (1995). A critical look at methods for handling missing covariates in epidemiologic regression analyses. <i>American Journal of Epidemiology</i> <b>142</b> 1255–1264.

Gu, X. and Rosenbaum, P. R. (1993). Comparison of multivariate matching methods: Structures, distances, and algorithms. <i>J. Comput. Graph. Statist.</i> <b>2</b> 405–420.

Hansen, B. B. (2008). The essential role of balance tests in propensity-matched observational studies: Comments on ‘A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003’ by Peter Austin, Statistics in Medicine. <i>Stat. Med.</i> <b>27</b> 2050–2054.

Heckman, J. J., Ichimura, H. and Todd, P. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. <i>Rev. Econom. Stud.</i> <b>64</b> 605–654.

Heckman, J. J., Ichimura, H., Smith, J. and Todd, P. (1998). Characterizing selection bias using experimental data. <i>Econometrica</i> <b>66</b> 1017–1098.

Heller, R., Rosenbaum, P. and Small, D. (2009). Split samples and design sensitivity in observational studies. <i>J. Amer. Statist. Assoc.</i> <b>104</b> 1090–1101.

Hill, J. L. and Reiter, J. P. (2006). Interval estimation for treatment effects using propensity score matching. <i>Stat. Med.</i> <b>25</b> 2230–2256.

Horvitz, D. and Thompson, D. (1952). A generalization of sampling without replacement from a finite universe. <i>J. Amer. Statist. Assoc.</i> <b>47</b> 663–685.

Iacus, S. M., King, G. and Porro, G. (2009). CEM: Software for coarsened exact matching. <i>J. Statist. Software</i> <b>30</b> 9. Available at <a href="http://gking.harvard.edu/files/abs/cemR-abs.shtml">http://gking.harvard.edu/files/abs/cemR-abs.shtml</a>.

Imai, K., King, G. and Stuart, E. A. (2008). Misunderstandings among experimentalists and observationalists in causal inference. <i>J. Roy. Statist. Soc. Ser. A</i> <b>171</b> 481–502.

Joffe, M. M. and Rosenbaum, P. R. (1999). Propensity scores. <i>American Journal of Epidemiology</i> <b>150</b> 327–333.

Joffe, M. M., Ten Have, T. R., Feldman, H. I. and Kimmel, S. E. (2004). Model selection, confounder control, and marginal structural models. <i>Amer. Statist.</i> <b>58</b> 272–279.

Kang, J. D. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. <i>Statist. Sci.</i> <b>22</b> 523–539.

Keele, L. (2009). rbounds: An R package for sensitivity analysis with matched data. R package. Available at <a href="http://www.polisci.ohio-state.edu/faculty/lkeele/rbounds.html">http://www.polisci.ohio-state.edu/faculty/lkeele/rbounds.html</a>.

King, G. and Zeng, L. (2006). The dangers of extreme counterfactuals. <i>Political Analysis</i> <b>14</b> 131–159.

Kurth, T., Walker, A. M., Glynn, R. J., Chan, K. A., Gaziano, J. M., Berger, K. and Robins, J. M. (2006). Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. <i>American Journal of Epidemiology</i> <b>163</b> 262–270.

Lechner, M. (2002). Some practical issues in the evaluation of heterogeneous labour market programmes by matching methods. <i>J. Roy. Statist. Soc. Ser. A</i> <b>165</b> 59–82.

Lee, B., Lessler, J. and Stuart, E. A. (2009). Improving propensity score weighting using machine learning. <i>Stat. Med.</i> <b>29</b> 337–346.

Li, Y. P., Propert, K. J. and Rosenbaum, P. R. (2001). Balanced risk set matching. <i>J. Amer. Statist. Assoc.</i> <b>96</b> 870–882.

Lunt, M., Solomon, D., Rothman, K., Glynn, R., Hyrich, K., Symmons, D. P., Sturmer, T., the British Society for Rheumatology Biologics Register and the British Society for Rheumatology Biologics Register Control Centre Consortium (2009). Different methods of balancing covariates leading to different effect estimates in the presence of effect modification. <i>American Journal of Epidemiology</i> <b>169</b> 909–917.

McCaffrey, D. F., Ridgeway, G. and Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. <i>Psychological Methods</i> <b>9</b> 403–425.

Ming, K. and Rosenbaum, P. R. (2001). A note on optimal matching with variable controls using the assignment algorithm. <i>J. Comput. Graph. Statist.</i> <b>10</b> 455–463.

Morgan, S. L. and Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. <i>Sociological Methods &amp; Research</i> <b>35</b> 3–60.

Reinisch, J., Sanders, S., Mortensen, E. and Rubin, D. B. (1995). In utero exposure to phenobarbital and intelligence deficits in adult men. <i>Journal of the American Medical Association</i> <b>274</b> 1518–1525.

Ridgeway, G., McCaffrey, D. and Morral, A. (2006). twang: Toolkit for weighting and analysis of nonequivalent groups. Software for using matching methods in R. Available at <a href="http://cran.r-project.org/web/packages/twang/index.html">http://cran.r-project.org/web/packages/twang/index.html</a>.

Robins, J. and Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. <i>J. Amer. Statist. Assoc.</i> <b>90</b> 122–129.

Robins, J. M., Mark, S. and Newey, W. (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders. <i>Biometrics</i> <b>48</b> 479–495.

Rosenbaum, P. R. (1987a). Model-based direct adjustment. <i>J. Amer. Statist. Assoc.</i> <b>82</b> 387–394.

Rosenbaum, P. R. (1987b). The role of a second control group in an observational study (with discussion). <i>Statist. Sci.</i> <b>2</b> 292–316.

Rosenbaum, P. R. and Rubin, D. B. (1985a). The bias due to incomplete matching. <i>Biometrics</i> <b>41</b> 103–116.

Rosenbaum, P. R. and Rubin, D. B. (1985b). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. <i>Amer. Statist.</i> <b>39</b> 33–38.

Rubin, D. B. (1973a). Matching to remove bias in observational studies. <i>Biometrics</i> <b>29</b> 159–184.

Rubin, D. B. (1973b). The use of matched sampling and regression adjustment to remove bias in observational studies. <i>Biometrics</i> <b>29</b> 185–203.

Rubin, D. B. (1976a). Inference and missing data (with discussion). <i>Biometrika</i> <b>63</b> 581–592.

Rubin, D. B. (1976b). Multivariate matching methods that are equal percent bias reducing, I: Some examples. <i>Biometrics</i> <b>32</b> 109–120.

Rubin, D. B. (2001). Using propensity scores to help design observational studies: Application to the tobacco litigation. <i>Health Services &amp; Outcomes Research Methodology</i> <b>2</b> 169–188.

Rubin, D. B. (2004). On principles for modeling propensity scores in medical research. <i>Pharmacoepidemiology and Drug Safety</i> <b>13</b> 855–857.

Rubin, D. B. and Stuart, E. A. (2006). Affinely invariant matching methods with discriminant mixtures of proportional ellipsoidally symmetric distributions. <i>Ann. Statist.</i> <b>34</b> 1814–1826.

Rubin, D. B. and Thomas, N. (1992a). Affinely invariant matching methods with ellipsoidal distributions. <i>Ann. Statist.</i> <b>20</b> 1079–1093.

Rubin, D. B. and Thomas, N. (1992b). Characterizing the effect of matching using linear propensity score methods with normal distributions. <i>Biometrika</i> <b>79</b> 797–809.

Rubin, D. B. and Thomas, N. (1996). Matching using estimated propensity scores, relating theory to practice. <i>Biometrics</i> <b>52</b> 249–264.

Schafer, J. L. and Kang, J. D. (2008). Average causal effects from nonrandomized studies: A practical guide and simulated case study. <i>Psychological Methods</i> <b>13</b> 279–313.

Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999). Adjusting for non-ignorable drop-out using semiparametric non-response models. <i>J. Amer. Statist. Assoc.</i> <b>94</b> 1096–1120.

Schneider, E. C., Zaslavsky, A. M. and Epstein, A. M. (2004). Use of high-cost operative procedures by Medicare beneficiaries enrolled in for-profit and not-for-profit health plans. <i>The New England Journal of Medicine</i> <b>350</b> 143–150.

Setoguchi, S., Schneeweiss, S., Brookhart, M. A., Glynn, R. J. and Cook, E. F. (2008). Evaluating uses of data mining techniques in propensity score estimation: A simulation study. <i>Pharmacoepidemiology and Drug Safety</i> <b>17</b> 546–555.

Shadish, W. R., Clark, M. and Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. <i>J. Amer. Statist. Assoc.</i> <b>103</b> 1334–1344.

Smith, H. (1997). Matching with multiple controls to estimate treatment effects in observational studies. <i>Sociological Methodology</i> <b>27</b> 325–353.

Song, J., Belin, T. R., Lee, M. B., Gao, X. and Rotheram-Borus, M. J. (2001). Handling baseline differences and missing items in a longitudinal study of HIV risk among runaway youths. <i>Health Services &amp; Outcomes Research Methodology</i> <b>2</b> 317–329.

Stuart, E. A. (2008). Developing practical recommendations for the use of propensity scores: Discussion of “A critical appraisal of propensity score matching in the medical literature between 1996 and 2003” by P. Austin. <i>Stat. Med.</i> <b>27</b> 2062–2065.

Stuart, E. A. and Green, K. M. (2008). Using full matching to estimate causal effects in non-experimental studies: Examining the relationship between adolescent marijuana use and adult outcomes. <i>Developmental Psychology</i> <b>44</b> 395–406.

Wacholder, S. and Weinberg, C. R. (1982). Paired versus two-sample design for a clinical trial of treatments with dichotomous outcome: Power considerations. <i>Biometrics</i> <b>38</b> 801–812.

Weitzen, S., Lapane, K. L., Toledano, A. Y., Hume, A. L. and Mor, V. (2004). Principles for modeling propensity scores in medical research: A systematic literature review. <i>Pharmacoepidemiology and Drug Safety</i> <b>13</b> 841–853.

Zhao, Z. (2004). Using matching to estimate treatment effects: Data requirements, matching metrics, and Monte Carlo evidence. <i>Review of Economics and Statistics</i> <b>86</b> 91–107.