Causal Inference by using Invariant Prediction: Identification and Confidence Intervals

Jonas Peters (1, 2), Peter Bühlmann (3), Nicolai Meinshausen (3)
(1) Eidgenössische Technische Hochschule Zürich, Switzerland
(2) Max Planck Institute for Intelligent Systems, Tübingen, Germany
(3) Eidgenössische Technische Hochschule Zürich, Switzerland

Summary

What is the difference between a prediction that is made with a causal model and that with a non-causal model? Suppose that we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference: given different experimental settings (e.g. various interventions) we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will be a member of this set of models with high probability. This approach yields valid confidence intervals for the causal relationships in quite general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate robustness properties of our approach under model misspecification and discuss possible extensions. The empirical properties are studied for various data sets, including large-scale gene perturbation experiments.
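
The idea of collecting every model whose predictive accuracy is invariant across settings can be illustrated with a small simulation. The sketch below is only a rough illustration under stated assumptions, not the authors' procedure: it assumes a linear model, uses a simple permutation test on pooled regression residuals as a stand-in for whatever invariance test one prefers, and all names (`residual_invariance_pvalue`, `invariant_sets`, `simulate`) and the simulated data are hypothetical. Predictors that appear in every accepted set are reported as plausible causal predictors.

```python
from itertools import combinations

import numpy as np


def residual_invariance_pvalue(residuals, env, n_perm=499, rng=None):
    """Permutation p-value for 'residuals look alike in every environment'.

    Assumption: this simple test is a stand-in chosen for illustration.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    labels = np.unique(env)

    def statistic(e):
        # Largest gap in residual mean or variance between any single
        # environment and the pooled sample.
        gaps = []
        for lab in labels:
            r = residuals[e == lab]
            gaps.append(abs(r.mean() - residuals.mean()) / residuals.std())
            gaps.append(abs(np.log(r.var() / residuals.var())))
        return max(gaps)

    observed = statistic(env)
    exceed = sum(statistic(rng.permutation(env)) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)


def invariant_sets(X, y, env, alpha=0.05, n_perm=499, seed=0):
    """Test every predictor subset; intersect the subsets that are not rejected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pvalues = {}
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            cols = X[:, np.array(subset, dtype=int)]
            design = np.column_stack([np.ones(n), cols])
            # Pooled least-squares fit over all environments.
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            pvalues[subset] = residual_invariance_pvalue(
                y - design @ coef, env, n_perm, rng)
    accepted = [set(s) for s, pv in pvalues.items() if pv > alpha]
    # If no subset is accepted, nothing is claimed to be causal.
    estimate = set.intersection(*accepted) if accepted else set()
    return estimate, pvalues


def simulate(n_per_env=500, seed=1):
    """Hypothetical data: x1 -> y -> x2, with shifts on x1 and x2 in env 1."""
    rng = np.random.default_rng(seed)
    env = np.repeat([0, 1], n_per_env)
    x1 = rng.normal(size=env.size) + 2.0 * env       # intervened on in env 1
    y = 1.5 * x1 + rng.normal(size=env.size)         # same mechanism everywhere
    x2 = y + rng.normal(size=env.size) + 3.0 * env   # child of y, also shifted
    return np.column_stack([x1, x2]), y, env


X, y, env = simulate()
estimate, pvalues = invariant_sets(X, y, env)
print("estimated causal predictors (column indices):", estimate)
for subset, pv in pvalues.items():
    print(subset, round(pv, 3))
```

In this toy setting only column 0 (`x1`) is a direct cause of `y`. Subsets that omit it or include the child `x2` should produce residuals that shift between environments and so tend to be rejected. The exhaustive loop over subsets grows exponentially with the number of predictors, so it is practical only for a handful of variables.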
