Paper Title


On the use of cross-validation for the calibration of the adaptive lasso

Authors

Nadim Ballout, Lola Etievant, Vivian Viallon

Abstract


The adaptive lasso refers to a class of methods that use weighted versions of the $L_1$-norm penalty, with weights derived from an initial estimate of the parameter vector to be estimated. Irrespective of the method chosen to compute this initial estimate, the performance of the adaptive lasso critically depends on the value of a hyperparameter, which controls the magnitude of the weighted $L_1$-norm in the penalized criterion. As for other machine learning methods, cross-validation is very popular for the calibration of the adaptive lasso, that is, for the selection of a data-driven optimal value of this hyperparameter. However, the simplest cross-validation scheme is not valid in this context, and a more elaborate one has to be employed to guarantee an optimal calibration. The discrepancy of the simple cross-validation scheme has been well documented in other contexts, but less so when it comes to the calibration of the adaptive lasso, and, therefore, many statistical analysts still overlook it. In this work, we recall appropriate cross-validation schemes for the calibration of the adaptive lasso, and illustrate the discrepancy of the simple scheme, using both synthetic and real-world examples. Our results clearly establish the suboptimality of the simple scheme, in terms of support recovery and prediction error, for several versions of the adaptive lasso, including the popular one-step lasso.
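The key point recalled in the abstract is that the initial estimate, and hence the penalty weights, must be recomputed within each training fold rather than once on the full data; otherwise the held-out fold has leaked into the weights and the cross-validation error is biased. Below is a minimal sketch of such a fold-wise scheme, assuming scikit-learn and a ridge initial estimate (the choice of initial estimator, the small constant guarding against division by zero, and the function name `adaptive_lasso_cv_error` are illustrative, not the authors' implementation). It uses the standard rescaling trick: the adaptive lasso with weights $w_j$ equals an ordinary lasso fitted on the columns $X_j / w_j$, with coefficients rescaled back by $1/w_j$.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold

def adaptive_lasso_cv_error(X, y, alphas, n_splits=5):
    """Proper CV for the adaptive lasso: the initial (here, ridge)
    estimate, and thus the weights, are recomputed on each training
    fold, so the held-out fold never influences the weights."""
    errs = np.zeros(len(alphas))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for tr, te in kf.split(X):
        # Initial estimate on the training fold only.
        init = Ridge(alpha=1.0, fit_intercept=False).fit(X[tr], y[tr]).coef_
        w = 1.0 / (np.abs(init) + 1e-8)  # adaptive weights
        Xw = X[tr] / w  # rescaling trick: adaptive lasso == lasso on X / w
        for k, a in enumerate(alphas):
            b = Lasso(alpha=a, fit_intercept=False).fit(Xw, y[tr]).coef_ / w
            errs[k] += np.mean((y[te] - X[te] @ b) ** 2)
    return errs / n_splits
```

The invalid "simple" scheme would compute `w` once from the full `(X, y)` before the loop; the sketch above differs from it only in where that line sits, which is exactly the discrepancy the paper illustrates.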
