Paper Title

Estimating Structural Target Functions using Machine Learning and Influence Functions

Authors

Curth, Alicia, Alaa, Ahmed M., van der Schaar, Mihaela

Abstract

We aim to construct a class of learning algorithms that are of practical value to applied researchers in fields such as biostatistics, epidemiology and econometrics, where the need to learn from incompletely observed information is ubiquitous. We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models, which we call 'IF-learning' due to its reliance on influence functions (IFs). This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics: we can consider any target function for which an IF of a population-averaged version exists in analytic form. Throughout, we put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information. This includes problems such as treatment effect estimation and inference in the presence of missing outcome data. Within this framework, we propose two general learning algorithms that build on the idea of nonparametric plug-in bias removal via IFs: the 'IF-learner', which uses pseudo-outcomes motivated by uncentered IFs for regression in large samples and outputs entire target functions without confidence bands, and the 'Group-IF-learner', which outputs only approximations to a function but can give confidence estimates if sufficient information on coarsening mechanisms is available. We apply both in a simulation study on inferring treatment effects.
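To make the pseudo-outcome idea concrete, here is a minimal sketch for the treatment-effect case, not the authors' implementation. It assumes a binary treatment, a known propensity score (as in a randomized experiment), a single covariate, and simple linear fits in both stages; sample splitting is omitted for brevity. The names `fit_line`, `if_learner_cate`, and `simulate` are hypothetical, as is the data-generating process. Stage 1 fits plug-in outcome regressions per arm, the pseudo-outcome adds the standard doubly robust (uncentered IF) correction for the average treatment effect, and stage 2 regresses the pseudo-outcome on the covariate.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def if_learner_cate(x, w, y, propensity):
    """Two-stage pseudo-outcome regression (illustrative sketch only).

    x: covariates, w: 0/1 treatment indicators, y: observed outcomes,
    propensity: function x -> P(W = 1 | X = x), assumed known here.
    """
    # Stage 1: plug-in outcome regressions, one per treatment arm.
    a1, b1 = fit_line([xi for xi, wi in zip(x, w) if wi == 1],
                      [yi for yi, wi in zip(y, w) if wi == 1])
    a0, b0 = fit_line([xi for xi, wi in zip(x, w) if wi == 0],
                      [yi for yi, wi in zip(y, w) if wi == 0])

    # Pseudo-outcome: plug-in contrast plus inverse-probability-weighted
    # residual corrections (uncentered IF of the average treatment effect).
    pseudo = []
    for xi, wi, yi in zip(x, w, y):
        mu1 = a1 + b1 * xi
        mu0 = a0 + b0 * xi
        pi = propensity(xi)
        phi = (mu1 - mu0
               + wi / pi * (yi - mu1)
               - (1 - wi) / (1 - pi) * (yi - mu0))
        pseudo.append(phi)

    # Stage 2: regress the pseudo-outcome on the covariate.
    return fit_line(x, pseudo)


def simulate(n, seed=0):
    """Hypothetical data: true effect tau(x) = 1 + 2x, randomized treatment."""
    rng = random.Random(seed)
    x, w, y = [], [], []
    for _ in range(n):
        xi = rng.uniform(-1.0, 1.0)
        wi = 1 if rng.random() < 0.5 else 0
        yi = xi + wi * (1.0 + 2.0 * xi) + rng.gauss(0.0, 0.1)
        x.append(xi)
        w.append(wi)
        y.append(yi)
    return x, w, y


x, w, y = simulate(2000)
intercept, slope = if_learner_cate(x, w, y, lambda xi: 0.5)
print(f"estimated effect function: {intercept:.3f} + {slope:.3f} * x")
```

In this simulated setting the fitted intercept and slope should land close to the true values of 1 and 2. With flexible machine learning regressors in place of `fit_line`, the same two-stage structure applies, though sample splitting between the stages would then typically be advisable.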
