Paper Title
spred: Solving $L_1$ Penalty with SGD
Paper Authors
Paper Abstract
We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal is a direct generalization of previous ideas that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, \textit{spred}, is an exact differentiable solver of $L_1$ and that the reparametrization trick is completely ``benign'' for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks to perform gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) neural network compression tasks, for which previous attempts at applying the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
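The equivalence between an $L_1$ penalty and a reparametrization with weight decay can be illustrated on a one-dimensional toy problem. The sketch below is not taken from the paper: it assumes the reparametrization is the product $w = uv$ with weight decay $\frac{\lambda}{2}(u^2 + v^2)$, a form the abstract does not spell out. It relies on the identity $\min_{uv = w} \frac{1}{2}(u^2 + v^2) = |w|$. The function names, step size, and initialization are hypothetical choices, and plain gradient descent stands in for SGD because the toy objective is deterministic. Under these assumptions, the product $uv$ should approach the soft-thresholding solution of $\frac{1}{2}(w - a)^2 + \lambda |w|$.

```python
def soft_threshold(a, lam):
    """Closed-form minimizer of 0.5 * (w - a)**2 + lam * abs(w)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def solve_reparametrized(a, lam, lr=0.1, steps=5000, u=1.0, v=0.5):
    """Gradient descent on 0.5 * (u*v - a)**2 + 0.5 * lam * (u**2 + v**2).

    Assumed illustrative form: the weight is w = u * v, and the L1 penalty
    on w is replaced by L2 weight decay on the two factors u and v.
    """
    for _ in range(steps):
        residual = u * v - a
        # Gradients of the smooth surrogate objective with respect to u and v.
        grad_u = residual * v + lam * u
        grad_v = residual * u + lam * v
        u, v = u - lr * grad_u, v - lr * grad_v
    return u * v  # the recovered weight w


if __name__ == "__main__":
    lam = 0.5
    for a in (2.0, 0.3, -2.0):
        print(a, solve_reparametrized(a, lam), soft_threshold(a, lam))
```

In the case $|a| \le \lambda$, the product $uv$ decays toward zero, which shows how the smooth surrogate can still produce sparse solutions.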