Title
Deep Neural Network Training with Frank-Wolfe
Authors
Abstract
This paper studies the empirical efficacy and benefits of using projection-free first-order methods in the form of Conditional Gradients, a.k.a. Frank-Wolfe methods, for training Neural Networks with constrained parameters. We draw comparisons both to current state-of-the-art stochastic Gradient Descent methods as well as across different variants of stochastic Conditional Gradients. In particular, we show the general feasibility of training Neural Networks whose parameters are constrained by a convex feasible region using Frank-Wolfe algorithms and compare different stochastic variants. We then show that, by choosing an appropriate region, one can achieve performance exceeding that of unconstrained stochastic Gradient Descent and matching state-of-the-art results relying on $L^2$-regularization. Lastly, we also demonstrate that, besides impacting performance, the particular choice of constraints can have a drastic impact on the learned representations.
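To make the method concrete, here is a minimal sketch of a stochastic Frank-Wolfe (Conditional Gradient) step. The feasible region, loss, batch size, and step-size schedule are illustrative assumptions, not the paper's exact setup: the weights are constrained to an $L^\infty$ ball of radius `tau`, and the linear minimization oracle (LMO) over that ball is simply a scaled sign of the gradient. Because each update is a convex combination of the current iterate and an LMO vertex, no projection is ever needed.

```python
import numpy as np

# Stochastic Frank-Wolfe sketch: least-squares fit with weights
# constrained to the L-infinity ball of radius `tau` (an assumed,
# illustrative feasible region and objective).
rng = np.random.default_rng(0)
d, n, tau = 5, 200, 1.0
X = rng.normal(size=(n, d))
w_true = np.clip(rng.normal(size=d), -tau, tau)  # ground truth inside the ball
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)  # any feasible starting point
for t in range(500):
    idx = rng.integers(0, n, size=32)                    # mini-batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)   # stochastic gradient
    v = -tau * np.sign(grad)   # LMO: argmin_{||v||_inf <= tau} <grad, v>
    gamma = 2.0 / (t + 2)      # classic Frank-Wolfe step-size schedule
    w = (1 - gamma) * w + gamma * v   # convex combination stays feasible
```

The iterates remain inside the constraint set by construction, which is the "projection-free" property the abstract refers to; only the LMO changes when a different convex feasible region (e.g. an $L^1$ or $L^2$ ball) is chosen.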