Paper Title

MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

Authors

Erdun Gao, Ignavier Ng, Mingming Gong, Li Shen, Wei Huang, Tongliang Liu, Kun Zhang, Howard Bondell

Abstract

State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the missing data problem is first to impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may suffer from suboptimality, as the imputation algorithm may introduce bias for modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and the identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework. In the E-step, in cases where computing the posterior distributions of parameters in closed-form is not feasible, Monte Carlo EM is leveraged to approximate the likelihood. In the M-step, MissDAG leverages the density transformation to model the noise distributions with simpler and specific formulations by virtue of the ANMs and uses a likelihood-based causal discovery algorithm with directed acyclic graph constraint. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
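To make the EM idea in the abstract concrete, here is a minimal sketch for the simplest setting: jointly Gaussian variables with entries missing completely at random. It is not the authors' implementation. The function name `em_gaussian_missing`, the toy two-variable model, and all parameter values are assumptions chosen for illustration. The E-step fills in the expected sufficient statistics of the missing entries in closed form. The M-step here is a plain maximum-likelihood update of the mean and covariance, whereas the method described above would instead run a likelihood-based causal discovery algorithm with a DAG constraint at that point.

```python
import numpy as np

def em_gaussian_missing(X, n_iter=30):
    """EM estimate of a Gaussian mean and covariance; NaN marks a missing entry."""
    n, d = X.shape
    mask = np.isnan(X)
    # Initialise from the observed entries of each column.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + 1e-6)
    for _ in range(n_iter):
        sum_x = np.zeros(d)
        sum_xx = np.zeros((d, d))
        for i in range(n):
            m = mask[i]          # missing coordinates of row i
            o = ~m               # observed coordinates of row i
            x_hat = X[i].copy()
            cov_fill = np.zeros((d, d))
            if m.any():
                if o.any():
                    # E-step: conditional Gaussian of missing given observed.
                    s_oo = sigma[np.ix_(o, o)]
                    s_om = sigma[np.ix_(o, m)]
                    gain = np.linalg.solve(s_oo, s_om).T
                    x_hat[m] = mu[m] + gain @ (X[i, o] - mu[o])
                    cov_fill[np.ix_(m, m)] = sigma[np.ix_(m, m)] - gain @ s_om
                else:
                    # Nothing observed in this row: fall back to the current model.
                    x_hat[m] = mu[m]
                    cov_fill = sigma.copy()
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cov_fill
        # M-step (unconstrained MLE; a DAG-constrained learner would go here).
        mu = sum_x / n
        sigma = sum_xx / n - np.outer(mu, mu)
    return mu, sigma

# Toy linear Gaussian model (assumed for illustration): x1 -> x2 with weight 2.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)
X_full = np.column_stack([x1, x2])

# Remove about 30% of the entries completely at random.
X_miss = X_full.copy()
X_miss[rng.random(X_full.shape) < 0.3] = np.nan

mu_hat, sigma_hat = em_gaussian_missing(X_miss)
```

In this toy model the true covariance is [[1, 2], [2, 5]], so the ratio `sigma_hat[0, 1] / sigma_hat[0, 0]` should land near the edge weight 2. The covariance-correction term `cov_fill` is what separates this from naive mean imputation, which would understate the variance of the partially observed variables.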
