Title
Differentiable Causal Discovery Under Unmeasured Confounding
Authors
Abstract
The data drawn from biological, economic, and social systems are often confounded due to the presence of unmeasured variables. Prior work in causal discovery has focused on discrete search procedures for selecting acyclic directed mixed graphs (ADMGs), specifically ancestral ADMGs, that encode ordinary conditional independence constraints among the observed variables of the system. However, confounded systems also exhibit more general equality restrictions that cannot be represented via these graphs, placing a limit on the kinds of structures that can be learned using ancestral ADMGs. In this work, we derive differentiable algebraic constraints that fully characterize the space of ancestral ADMGs, as well as more general classes of ADMGs, namely arid ADMGs and bow-free ADMGs, that capture all equality restrictions on the observed variables. We use these constraints to cast causal discovery as a continuous optimization problem and design differentiable procedures to find the best-fitting ADMG when the data come from a confounded linear system of equations with correlated errors. We demonstrate the efficacy of our method through simulations and an application to a protein expression dataset. Code implementing our methods is open-source and publicly available at https://gitlab.com/rbhatta8/dcd and will be incorporated into the Ananke package.
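The following NumPy/SciPy sketch makes the idea of a "differentiable algebraic constraint" on a mixed graph concrete. It assumes the graph is encoded by two nonnegative matrices: D, which holds the directed edges (for example D = W * W for real edge weights W), and B, which holds the bidirected edges and is symmetric with a zero diagonal. The sketch then uses the NOTEARS-style trace-exponential term tr(e^D) - d to enforce acyclicity. The function names and penalty forms are illustrative assumptions. They are not taken from the linked repository and are not necessarily the exact constraints derived in the paper. The arid class needs a more involved constraint and is not covered here.

```python
import numpy as np
from scipy.linalg import expm


def acyclicity(D: np.ndarray) -> float:
    """tr(e^D) - d: zero iff the directed part has no cycles (assumes D >= 0)."""
    return float(np.trace(expm(D)) - D.shape[0])


def ancestral_penalty(D: np.ndarray, B: np.ndarray) -> float:
    """Acyclicity plus sum(e^D * B).

    For i != j, (e^D)[i, j] > 0 iff there is a directed path i -> ... -> j,
    so the second term is zero iff no bidirected edge joins a vertex to one
    of its descendants. B must have a zero diagonal because diag(e^D) >= 1.
    """
    return acyclicity(D) + float(np.sum(expm(D) * B))


def bow_free_penalty(D: np.ndarray, B: np.ndarray) -> float:
    """Acyclicity plus sum(D * B): zero iff no pair has both i -> j and i <-> j."""
    return acyclicity(D) + float(np.sum(D * B))


# Hypothetical example: chain 0 -> 1 -> 2 plus a bidirected edge 0 <-> 2.
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
B = np.zeros((3, 3))
B[0, 2] = B[2, 0] = 1.0  # symmetric, zero diagonal

print("acyclicity:", acyclicity(D))
print("ancestral penalty:", ancestral_penalty(D, B))
print("bow-free penalty:", bow_free_penalty(D, B))
```

In the example graph, 0 is an ancestor of 2 and is also joined to it by 0 <-> 2. The graph is therefore acyclic and bow-free but not ancestral. The ancestral penalty is about 0.5, while the other two values are numerically zero. Each penalty is a smooth function of the edge weights, so a continuous-optimization procedure can drive it to zero while fitting the data, for example with an augmented Lagrangian.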