A robust variable screening procedure for ultra-high dimensional data

Paper title

A robust variable screening procedure for ultra-high dimensional data

Authors

Abhik Ghosh, Magne Thoresen

Abstract

Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems, and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, sure independence screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and in small samples in particular this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. We demonstrate the claimed robustness through the use of influence functions, and we discuss the appropriate choice of the tuning parameter $\alpha$. Finally, we illustrate the use of the method on a small dataset from a study on the regulation of lipid metabolism.
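
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of classical (non-robust) SIS for a linear model: predictors are ranked by the absolute value of their marginal correlation with the response, and only the top few are kept. It is not the DPD-SIS procedure proposed in the paper. The function name `sis_screen`, the default screening size of n / log(n), and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Classical SIS sketch: keep the d predictors with the largest
    absolute marginal correlation with y (hypothetical helper)."""
    n, p = X.shape
    if d is None:
        # Assumed default: a screening size of order n / log(n).
        d = int(n / np.log(n))
    # Standardize columns and response so X^T y / n gives correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / n
    # Indices of the d largest marginal correlations, strongest first.
    return np.argsort(omega)[::-1][:d]

# Simulated illustration (assumed setup): only the first two of
# 2000 predictors carry signal, with n = 100 observations.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.5 * X[:, 1] + rng.standard_normal(n)

selected = sis_screen(X, y)
print(len(selected), 0 in selected, 1 in selected)
```

Because the ranking relies on ordinary sample correlations, a handful of contaminated observations can distort it, which is the vulnerability to outliers that the abstract says motivates a robust alternative.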
