Title


Variable Selection with Second-Generation P-Values

Authors

Yi Zuo, Thomas G. Stewart, Jeffrey D. Blume

Abstract


Many statistical methods have been proposed for variable selection in the past century, but few balance inference and prediction tasks well. Here we report on a novel variable selection approach called Penalized regression with Second-Generation P-Values (ProSGPV). It captures the true model at the best rate achieved by current standards, is easy to implement in practice, and often yields the smallest parameter estimation error. The idea is to use an ℓ0 penalization scheme with second-generation p-values (SGPVs), instead of traditional ones, to determine which variables remain in a model. The approach yields tangible advantages for balancing support recovery, parameter estimation, and prediction tasks. The ProSGPV algorithm can maintain its good performance even when there is strong collinearity among features or when a high-dimensional feature space with p > n is considered. We present extensive simulations and a real-world application comparing the ProSGPV approach with smoothly clipped absolute deviation (SCAD), adaptive lasso (AL), and minimax concave penalty with penalized linear unbiased selection (MC+). While the last three algorithms are among the current standards for variable selection, ProSGPV has superior inference performance and comparable prediction performance in certain scenarios. Supplementary materials are available online.
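To make the abstract's core idea concrete, below is a minimal sketch of the second-generation p-value itself (following Blume et al.'s published definition): given an interval estimate I = [lo, hi] for a coefficient and an interval null H0 = [-delta, delta], the SGPV is the fraction of I that overlaps H0, with a correction factor for very wide intervals. A variable whose coefficient interval has SGPV = 0 (no overlap with the null interval) is retained. This is an illustrative, hypothetical implementation, not the authors' ProSGPV package; the full algorithm also involves a first-stage screening fit and a data-driven choice of delta, which are omitted here.

```python
def sgpv(lo: float, hi: float, delta: float) -> float:
    """Second-generation p-value for interval [lo, hi] against null [-delta, delta].

    SGPV = (|I ∩ H0| / |I|) * max(|I| / (2 |H0|), 1), capped behavior:
    very wide, uninformative intervals are pulled toward 1/2.
    Illustrative sketch only; assumes delta > 0 and lo < hi.
    """
    interval_len = hi - lo
    null_len = 2.0 * delta
    # Length of the overlap between the estimate interval and the null interval.
    overlap = max(0.0, min(hi, delta) - max(lo, -delta))
    correction = max(interval_len / (2.0 * null_len), 1.0)
    return (overlap / interval_len) * correction


def select_variables(intervals: dict[str, tuple[float, float]], delta: float) -> list[str]:
    """Keep variables whose interval estimate is wholly incompatible with the null."""
    return [name for name, (lo, hi) in intervals.items() if sgpv(lo, hi, delta) == 0.0]
```

For example, an interval of (2.0, 3.0) with delta = 1 has SGPV 0 (the variable is kept), while (-0.5, 0.5) lies entirely inside the null and has SGPV 1 (the variable is dropped).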
