Paper Title

Missing Value Knockoffs

Authors

Deniz Koyuncu, Bülent Yener

Abstract

One limitation of most statistical and machine learning-based variable selection approaches is their inability to control false selections. A recently introduced framework, model-X knockoffs, provides this control for a wide range of models but lacks support for datasets with missing values. In this work, we discuss ways of preserving the theoretical guarantees of the model-X framework in the missing data setting. First, we prove that posterior sampled imputation allows existing knockoff samplers to be reused in the presence of missing values. Second, we show that sampling knockoffs only for the observed variables and then applying univariate imputation also preserves the false selection guarantees. Third, for the special case of latent variable models, we demonstrate how jointly imputing and sampling knockoffs can reduce the computational complexity. We verified the theoretical findings with two different explanatory variable distributions and investigated how the missing data pattern, the amount of correlation, the number of observations, and the number of missing values affect the statistical power.
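
The first strategy in the abstract (impute by posterior sampling, then reuse an existing knockoff sampler) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the explanatory variables are multivariate Gaussian with known mean `mu` and covariance `Sigma`, and that values are missing completely at random. The function names `impute_posterior` and `gaussian_knockoffs` are hypothetical, and the knockoff step uses a simple equal-`s` Gaussian construction chosen only for illustration.

```python
import numpy as np


def impute_posterior(X, mu, Sigma, rng):
    """Fill NaNs in each row with a draw from the Gaussian conditional
    distribution of the missing entries given the observed ones."""
    X = X.copy()
    for i in range(X.shape[0]):
        m = np.isnan(X[i])          # missing coordinates of row i
        if not m.any():
            continue
        o = ~m                      # observed coordinates of row i
        if o.any():
            # K = Sigma_mo Sigma_oo^{-1}
            K = Sigma[np.ix_(m, o)] @ np.linalg.inv(Sigma[np.ix_(o, o)])
            mean = mu[m] + K @ (X[i, o] - mu[o])
            cov = Sigma[np.ix_(m, m)] - K @ Sigma[np.ix_(o, m)]
        else:
            # Nothing observed: fall back to the marginal distribution.
            mean, cov = mu[m], Sigma[np.ix_(m, m)]
        cov = (cov + cov.T) / 2     # symmetrize against round-off
        X[i, m] = rng.multivariate_normal(mean, cov)
    return X


def gaussian_knockoffs(X, mu, Sigma, rng):
    """Sample Gaussian knockoffs with a constant diagonal matrix D = s * I,
    where s is kept slightly below min(2 * lambda_min(Sigma), min_j Sigma_jj)."""
    p = Sigma.shape[0]
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    s = 0.99 * min(2 * lam_min, Sigma.diagonal().min())
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)      # Sigma^{-1} D
    mean = X - (X - mu) @ Sigma_inv_D            # conditional mean, row-wise
    cov = 2 * D - D @ Sigma_inv_D                # conditional covariance
    cov = (cov + cov.T) / 2
    noise = rng.multivariate_normal(np.zeros(p), cov, size=X.shape[0])
    return mean + noise


# Toy usage: AR(1)-style covariance, 20% of entries removed at random.
rng = np.random.default_rng(0)
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
X_full = rng.multivariate_normal(mu, Sigma, size=2000)
mask = rng.random(X_full.shape) < 0.2
X_obs = np.where(mask, np.nan, X_full)

X_imp = impute_posterior(X_obs, mu, Sigma, rng)   # step 1: posterior-sampled imputation
X_ko = gaussian_knockoffs(X_imp, mu, Sigma, rng)  # step 2: unchanged knockoff sampler
```

In this sketch the knockoff sampler never sees the missingness mask: it is applied to the imputed matrix exactly as it would be to complete data, which is the reuse the abstract describes. In practice `mu` and `Sigma` would have to be estimated, and that estimation is outside the scope of this example.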
