Title
High-Dimensional Feature Selection for Sample Efficient Treatment Effect Estimation
Authors
Abstract
The estimation of causal treatment effects from observational data is a fundamental problem in causal inference. To avoid bias, the effect estimator must control for all confounders. Hence, practitioners often collect data for as many covariates as possible to raise the chances of including the relevant confounders. While this addresses the bias, it has the side effect of significantly increasing the number of data samples required to accurately estimate the effect, because of the increased dimensionality. In this work, we consider the setting where, out of a large number of covariates $X$ that satisfy strong ignorability, including only an unknown sparse subset $S$ suffices to achieve zero bias, i.e., $S$ is $c$-equivalent to $X$. We propose a common objective function involving outcomes across treatment cohorts with nonconvex joint sparsity regularization that is guaranteed to recover $S$ with high probability under a linear outcome model for $Y$ and subgaussian covariates for each of the treatment cohorts. This improves the effect estimation sample complexity so that it scales with the cardinality of the sparse subset $S$ and $\log |X|$, as opposed to the cardinality of the full set $X$. We validate our approach with experiments on treatment effect estimation.
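The abstract does not spell out the objective, so the following is only a hedged illustration of what "a common objective across treatment cohorts with nonconvex joint sparsity regularization" can look like. It assumes a squared-error loss for each cohort's linear outcome model and couples the cohorts through a group minimax concave penalty (MCP) on each covariate's row of coefficients, minimized by proximal gradient descent. This is a stand-in chosen for illustration, not the authors' formulation, and it carries none of their recovery guarantees. The names `group_mcp_prox`, `joint_sparse_fit`, and `selected_features`, the parameters `lam` and `gamma`, and the synthetic data are hypothetical; covariates are assumed centered, with no intercept.

```python
import numpy as np


def group_mcp_prox(v, step, lam, gamma):
    """Proximal operator of step * MCP(||v||_2; lam, gamma) for one covariate's row.

    Hypothetical helper. Requires gamma > step so the prox subproblem stays convex.
    """
    norm = np.linalg.norm(v)
    if norm <= step * lam:
        return np.zeros_like(v)  # small rows are zeroed in every cohort at once
    if norm <= gamma * lam:
        # Firm-thresholding region: shrink the row norm, keep its direction.
        new_norm = (norm - step * lam) / (1.0 - step / gamma)
        return v * (new_norm / norm)
    return v  # rows with norm above gamma * lam are left unshrunk


def joint_sparse_fit(X_list, y_list, lam, gamma=3.0, n_iter=500):
    """Proximal gradient on
        sum_t ||y_t - X_t b_t||^2 / (2 n_t) + sum_j MCP(||B[j, :]||_2).

    X_list[t] is the (n_t, p) covariate matrix of cohort t, y_list[t] its outcomes.
    Returns B of shape (p, T); row j ties covariate j together across cohorts.
    """
    p, T = X_list[0].shape[1], len(X_list)
    # Lipschitz constant of the smooth part: the largest eigenvalue of any X_t^T X_t / n_t.
    L = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in X_list)
    step = 1.0 / L
    if gamma <= step:
        raise ValueError("gamma must exceed the step size 1/L")
    B = np.zeros((p, T))
    for _ in range(n_iter):
        # Column t holds the gradient of cohort t's squared-error loss.
        grad = np.column_stack([
            X.T @ (X @ B[:, t] - y) / X.shape[0]
            for t, (X, y) in enumerate(zip(X_list, y_list))
        ])
        Z = B - step * grad
        B = np.vstack([group_mcp_prox(Z[j], step, lam, gamma) for j in range(p)])
    return B


def selected_features(B, tol=1e-8):
    """Indices of covariates whose coefficient row is nonzero in any cohort."""
    return np.flatnonzero(np.linalg.norm(B, axis=1) > tol)


# Illustrative synthetic data: two cohorts sharing the sparse support {0, 1, 2}.
rng = np.random.default_rng(0)
p, n, T = 50, 500, 2
B_true = np.zeros((p, T))
B_true[:3, 0] = [2.0, -1.5, 1.0]
B_true[:3, 1] = [1.0, 2.0, -1.0]
X_list = [rng.standard_normal((n, p)) for _ in range(T)]
y_list = [X @ B_true[:, t] + 0.1 * rng.standard_normal(n) for t, X in enumerate(X_list)]
B_hat = joint_sparse_fit(X_list, y_list, lam=0.1)
print(selected_features(B_hat))
```

Because the penalty acts on whole rows, a covariate is either kept in every cohort or dropped from all of them, so the fit yields one shared candidate adjustment set that an effect estimator could then use. Rows whose norm exceeds `gamma * lam` are not shrunk at all, which is one common reason to prefer a nonconvex penalty such as MCP over the (convex) group lasso when the goal is support recovery.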