Paper Title

Bayesian MI-LASSO for Variable Selection on Multiply-Imputed Data

Authors

Zou, Jungang; Wang, Sijian; Chen, Qixuan

Abstract

Multiple imputation is widely used for handling missing data in real-world applications. For variable selection on multiply-imputed datasets, however, if selection is performed on each imputed dataset separately, it can result in different sets of selected variables across datasets. MI-LASSO, one of the most commonly used approaches to this problem, regards the same variable across all separate imputed datasets as a group variable and exploits the group LASSO to yield a consistent variable selection across all the multiply-imputed datasets. In this paper, we extend MI-LASSO to a Bayesian framework and propose four Bayesian MI-LASSO models for variable selection on multiply-imputed data, including three shrinkage prior-based and one Spike-Slab prior-based methods. To further support robust variable selection, we develop a four-step projection predictive variable selection procedure that avoids ad hoc thresholding and facilitates valid post-selection inference. Simulation studies showed that the Bayesian MI-LASSO outperformed MI-LASSO and other alternative approaches, achieving higher specificity and lower mean squared error across a range of settings. We further demonstrated these methods via a case study using a multiply-imputed dataset from the University of Michigan Dioxin Exposure Study. The R package BMIselect is available on CRAN.
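
The grouping idea behind MI-LASSO can be illustrated with a minimal sketch. It covers only the frequentist group-LASSO step described in the abstract, not the Bayesian models or the BMIselect package. The function name `mi_group_lasso`, the proximal-gradient solver, the penalty value, and the simulated data (where small random perturbations stand in for real imputations) are all assumptions made for illustration.

```python
import numpy as np

def mi_group_lasso(X_list, y_list, lam, n_iter=500):
    """Proximal-gradient sketch of a group LASSO across D imputed datasets.

    Coefficient j from every imputed dataset forms one group, so a variable
    is either kept in all datasets or dropped from all of them.
    """
    D, p = len(X_list), X_list[0].shape[1]
    B = np.zeros((D, p))  # row d holds the coefficients for imputed dataset d
    # Step size 1/L, with L the largest Lipschitz constant of the per-dataset gradients
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in X_list)
    for _ in range(n_iter):
        # Gradient of (1 / 2n) * ||y - X b||^2 for each dataset
        G = np.stack([X.T @ (X @ b - y) / X.shape[0]
                      for X, y, b in zip(X_list, y_list, B)])
        Z = B - step * G
        # Group soft-thresholding: shrink each column (one variable) by its L2 norm
        norms = np.linalg.norm(Z, axis=0)
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = Z * scale
    return B

# Hypothetical data: three perturbed copies of X stand in for imputed datasets.
rng = np.random.default_rng(0)
n, p = 200, 5
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
X_base = rng.normal(size=(n, p))
y = X_base @ beta_true + 0.5 * rng.normal(size=n)
X_list = [X_base + 0.05 * rng.normal(size=(n, p)) for _ in range(3)]
y_list = [y, y, y]

B = mi_group_lasso(X_list, y_list, lam=0.5)
selected = np.linalg.norm(B, axis=0) > 0  # one keep/drop decision per variable
print(selected)
```

Because the threshold acts on the whole column of coefficients at once, the selected set is identical across imputed datasets by construction, which is the consistency property the abstract attributes to MI-LASSO.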
