Paper Title

To Impute or not to Impute? Missing Data in Treatment Effect Estimation

Authors

Jeroen Berrevoets, Fergus Imrie, Trent Kyono, James Jordon, Mihaela van der Schaar

Abstract

Missing data is a systemic problem in practical scenarios that causes noise and bias when estimating treatment effects. This makes treatment effect estimation from data with missingness a particularly tricky endeavour. A key reason for this is that standard assumptions on missingness are rendered insufficient due to the presence of an additional variable, treatment, besides the input (e.g. an individual) and the label (e.g. an outcome). The treatment variable introduces additional complexity with respect to why some variables are missing that is not fully explored by previous work. In our work we introduce mixed confounded missingness (MCM), a new missingness mechanism where some missingness determines treatment selection and other missingness is determined by treatment selection. Given MCM, we show that naively imputing all data leads to poor performing treatment effects models, as the act of imputation effectively removes information necessary to provide unbiased estimates. However, no imputation at all also leads to biased estimates, as missingness determined by treatment introduces bias in covariates. Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not. We empirically demonstrate how various learners benefit from selective imputation compared to other solutions for missing data. We highlight that our experiments encompass both average treatment effects and conditional average treatment effects.
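
The selective imputation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain mean imputation stands in for whatever imputer one would actually use, the column names (`biomarker`, `followup_lab`) are hypothetical, and the split between "impute" and "leave missing" columns is assumed to come from domain knowledge about which missingness is determined by treatment and which missingness determines treatment.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def selective_impute(rows: List[Row], impute_cols: List[str]) -> List[Row]:
    """Mean-impute only the columns listed in impute_cols.

    Columns not listed keep their missing values (None), so that the
    missingness pattern itself stays visible to a downstream model.
    Mean imputation is an illustrative stand-in, not the paper's method.
    """
    # Column means are computed from observed (non-missing) values only.
    col_means: Dict[str, float] = {}
    for col in impute_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        if observed:  # skip columns with no observed values at all
            col_means[col] = mean(observed)

    imputed_rows: List[Row] = []
    for r in rows:
        new_row = dict(r)  # copy, so the input rows are left untouched
        for col, col_mean in col_means.items():
            if new_row[col] is None:
                new_row[col] = col_mean
        imputed_rows.append(new_row)
    return imputed_rows


# Hypothetical toy data: None marks a missing value.
rows = [
    {"biomarker": 1.0, "followup_lab": 4.0, "treatment": 1.0},
    {"biomarker": None, "followup_lab": None, "treatment": 0.0},
    {"biomarker": 3.0, "followup_lab": 6.0, "treatment": 1.0},
]

# Assumption: followup_lab is missing *because of* treatment (impute it),
# while the missingness of biomarker *influenced* treatment (leave it).
imputed = selective_impute(rows, impute_cols=["followup_lab"])
```

In this toy setup, `followup_lab` is filled in while `biomarker` stays missing in the second row. Deciding which covariates fall into which group is the substantive step, and it is supplied by hand here.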
