Paper Title
Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects
Paper Authors
Paper Abstract
We consider the task of identifying and estimating the causal effect of a treatment variable on a long-term outcome variable using data from an observational domain and an experimental domain. The observational domain is subject to unobserved confounding. Furthermore, subjects in the experiment are followed for only a short period of time; hence, the long-term effects of treatment are unobserved, while the short-term effects are observed. Data from neither domain alone therefore suffices for causal inference about the effect of the treatment on the long-term outcome; the two sources must instead be combined in a principled way. Athey et al. (2020) proposed a method for systematically combining such data to identify the downstream causal effect of interest. Their approach is based on the assumptions of internal and external validity of the experimental data, and an additional novel assumption called latent unconfoundedness. In this paper, we first review their proposed approach, and then propose three alternative approaches to data fusion for identifying and estimating the average treatment effect as well as the effect of treatment on the treated. Our first approach is based on assuming equi-confounding bias for the short-term and long-term outcomes. Our second approach is based on a relaxed version of the equi-confounding bias assumption, in which we assume the existence of an observed confounder such that the short-term and long-term potential outcome variables have the same partial additive association with that confounder. Our third approach is based on the proximal causal inference framework, in which we assume the existence of an extra variable in the system that is a proxy of the latent confounder of the treatment-outcome relation. We propose influence function-based estimation strategies for each of our data fusion frameworks and study the robustness properties of the proposed estimators.
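To make the equi-confounding idea concrete, the following toy simulation may help. It is only a sketch under assumptions that are not taken from the paper: a purely additive model with constant treatment effects, a latent confounder `u` that shifts the short-term outcome `m` and the long-term outcome `y` by the same amount, and a plain difference-in-means correction. The function names (`simulate`, `diff_in_means`, `equi_confounding_estimate`) and all parameter values are hypothetical, and the paper's own influence function-based estimators are not reproduced here.

```python
import random

def simulate(n=100_000, tau_short=1.0, tau_long=2.0, seed=0):
    """Toy data (assumed model): observational rows (a, m, y), experimental rows (a, m)."""
    rng = random.Random(seed)
    obs, exp = [], []
    for _ in range(n):
        # Observational domain: latent u drives both treatment uptake and outcomes.
        u = rng.gauss(0, 1)
        a = 1 if u + rng.gauss(0, 1) > 0 else 0
        m = tau_short * a + u + rng.gauss(0, 1)
        y = tau_long * a + u + rng.gauss(0, 1)
        obs.append((a, m, y))
        # Experimental domain: randomized treatment; the long-term outcome is never recorded.
        u = rng.gauss(0, 1)
        a = rng.randint(0, 1)
        m = tau_short * a + u + rng.gauss(0, 1)
        exp.append((a, m))
    return obs, exp

def diff_in_means(rows, col):
    """Mean of column `col` among treated rows minus the mean among control rows."""
    treated = [r[col] for r in rows if r[0] == 1]
    control = [r[col] for r in rows if r[0] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def equi_confounding_estimate(obs, exp):
    """Subtract the short-term confounding bias from the naive long-term contrast."""
    # Bias of the observational short-term contrast, measured against the experiment.
    bias_short = diff_in_means(obs, 1) - diff_in_means(exp, 1)
    # Assumed equi-confounding: the long-term contrast carries the same additive bias.
    return diff_in_means(obs, 2) - bias_short

obs, exp = simulate()
naive = diff_in_means(obs, 2)
corrected = equi_confounding_estimate(obs, exp)
print(f"naive: {naive:.2f}, corrected: {corrected:.2f}, true long-term effect: 2.00")
```

In this toy setup the naive observational contrast overstates the long-term effect because treated units tend to have larger `u`, while the corrected estimate should land near the simulated value of `tau_long`. The correction relies entirely on the assumed equality of the two biases; if `u` affected `m` and `y` with different strengths, the simple subtraction would no longer recover the effect.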