Title
Towards Causal Inference for Spatio-Temporal Data: Conflict and Forest Loss in Colombia
Authors
Abstract
In many data science problems, we are interested in inferring causal relationships in the data-generating mechanism. Here, we consider the following real-world question: how has the Colombian conflict influenced tropical forest loss? There is evidence for both enhancing and reducing impacts. Answering such questions requires causal models. In this work, we propose a class of causal models for spatio-temporal stochastic processes. It allows us to formally define and quantify the causal effect of a vector of covariates $X$ on a real-valued response $Y$, even if the causal background knowledge is incomplete. We introduce a procedure for estimating causal effects and a non-parametric test of the hypothesis that these effects are zero. The proposed methods do not make strong distributional assumptions and allow for arbitrarily many latent confounders, provided that these confounders do not vary across time (or, alternatively, do not vary across space). When applying our causal methodology to the problem of conflict and forest loss, using data from 2000 to 2018, we find a reducing but statistically insignificant causal effect of conflict on forest loss. Regionally, both enhancing and reducing effects can be identified. Our theoretical findings are supported by simulations, and code is available online.
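Why time-invariant latent confounders can still be handled is easiest to see in a toy linear panel. The sketch below is NOT the authors' estimator or test; it is a simplified stand-in assuming a linear effect of $X$ on $Y$, noise that is i.i.d. over time, and a per-location confounder H that enters additively. All names (simulate, pooled_slope, within_location_slope, permutation_pvalue) and parameter values are hypothetical. Subtracting each location's time average removes any additive time-invariant confounding, and permuting $X$ over time within each location gives a simple test of a zero effect.

```python
import numpy as np


def simulate(n_locations=200, n_times=19, beta=-0.5, rng=None):
    """Hypothetical toy panel: arrays of shape (n_locations, n_times).

    H is a latent confounder that differs across locations but is constant
    over time; it drives both X and Y. beta is an arbitrary true effect.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    h = rng.normal(size=(n_locations, 1))  # latent, time-invariant confounder
    x = h + rng.normal(size=(n_locations, n_times))
    y = beta * x + 2.0 * h + rng.normal(size=(n_locations, n_times))
    return x, y


def pooled_slope(x, y):
    """OLS slope of y on x ignoring location (biased by H)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def within_location_slope(x, y):
    """OLS slope after subtracting each location's time average.

    Demeaning per location removes any additive effect of H.
    """
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return float((xc * yc).sum() / (xc ** 2).sum())


def permutation_pvalue(x, y, n_perm=999, rng=None):
    """Test H0: effect = 0 by shuffling x over time within each location.

    Assumption: observations are exchangeable over time given the location,
    which holds in this toy model; real spatio-temporal data with temporal
    dependence would need a resampling scheme that respects it.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = abs(within_location_slope(x, y))
    count = 0
    for _ in range(n_perm):
        x_perm = rng.permuted(x, axis=1)  # shuffle each row independently
        if abs(within_location_slope(x_perm, y)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


x, y = simulate(rng=np.random.default_rng(1))
print("pooled:", pooled_slope(x, y))
print("within-location:", within_location_slope(x, y))
print("p-value:", permutation_pvalue(x, y, n_perm=199, rng=np.random.default_rng(2)))
```

In this toy setup the pooled slope concentrates near beta + 1 (the confounder pushes it upward and can even flip its sign), while the within-location slope concentrates near beta. Swapping the roles of time and location gives the analogous adjustment for confounders that do not vary across space.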