Paper Title
A Power Analysis for Model-X Knockoffs with $\ell_{p}$-Regularized Statistics
Authors
Abstract
The variable selection properties of procedures that use penalized-likelihood estimates are a central topic in the study of high-dimensional linear regression problems. Existing literature emphasizes the quality of the variable ranking produced by such procedures, as reflected in the receiver operating characteristic curve or in prediction performance. Specifically, recent works have harnessed the modern theory of approximate message passing (AMP) to obtain, in a particular setting, exact asymptotic predictions of the Type I/Type II error tradeoff for selection procedures that rely on $\ell_{p}$-regularized estimators. In practice, effective ranking by itself is often not sufficient because some calibration for Type I error is required. In this work we study theoretically the power of selection procedures that similarly rank the features by the size of an $\ell_{p}$-regularized estimator, but further use Model-X knockoffs to control the false discovery rate in the realistic situation where no prior information about the signal is available. In analyzing the power of the resulting procedure, we extend existing results in AMP theory to handle the pairing between original variables and their knockoffs. This extension is used to derive exact asymptotic predictions for power. We apply the general results to compare the power of the knockoffs versions of Lasso and thresholded-Lasso selection, and demonstrate that in the i.i.d. covariate setting under consideration, tuning by cross-validation on the augmented design matrix is nearly optimal. We further demonstrate how the techniques also allow analysis of the Type S error, and of a corresponding notion of power, when selections are supplemented with a decision on the sign of the coefficient.
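To make the selection step concrete, the following is a minimal sketch of how a knockoff filter can turn paired coefficient estimates into a selected set at a target false discovery rate q. It is an illustration under stated assumptions, not necessarily the exact procedure analyzed in the paper: the statistic W_j = |b_j| - |b~_j| (original minus knockoff coefficient magnitude) and the "knockoff+" threshold are the commonly used choices from the knockoffs literature, and the function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical. The coefficients are assumed to come from an $\ell_{p}$-regularized fit on the augmented design matrix, which is not shown here.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (assumed knockoff+ rule)."""
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Large negative W_j act as an estimate of false selections.
        false_est = 1 + sum(1 for x in w if x <= -t)
        selected = sum(1 for x in w if x >= t)
        if false_est / max(1, selected) <= q:
            return t
    return float("inf")


def knockoff_select(coef_orig, coef_knock, q):
    """Select variables whose original coefficient dominates its knockoff.

    coef_orig[j] and coef_knock[j] are the estimated coefficients of
    variable j and of its knockoff copy (hypothetical inputs)."""
    # Assumed statistic: difference of coefficient magnitudes.
    w = [abs(b) - abs(bk) for b, bk in zip(coef_orig, coef_knock)]
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Made-up coefficients purely for illustration.
    b = [3.0, 2.5, 2.0, 1.5, 1.2, 0.1, 0.0, 0.2]
    b_knock = [0.0, 0.1, 0.0, 0.0, 0.0, 0.4, 0.0, 0.1]
    print(knockoff_select(b, b_knock, q=0.25))
```

The pairing matters here: a variable is selected only when its own coefficient is large relative to that of its knockoff, and the count of pairs where the knockoff wins calibrates the threshold.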