Paper Title

Frank-Wolfe optimization for deep networks

Paper Author

Stigenberg, Jakob

Paper Abstract

Deep neural networks are today one of the most popular choices for classification, regression, and function approximation. However, training such deep networks is far from trivial, as there are often millions of parameters to tune. Typically, one uses an optimization method that hopefully converges towards some minimum. The most popular and successful methods are based on gradient descent. In this paper, another optimization method, Frank-Wolfe optimization, is applied to a small deep network and compared to gradient descent. Although the optimization does converge, it does so slowly, nowhere near the speed of gradient descent. Further, in a stochastic setting, the optimization becomes very unstable and does not appear to converge unless a line search approach is used.
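For readers unfamiliar with the method, the generic Frank-Wolfe (conditional gradient) update replaces the gradient step with a linear minimization oracle over a constraint set, then moves toward the oracle's solution. The sketch below is a hypothetical illustration, not code from the paper: it assumes an L-infinity-norm ball as the feasible set and a toy least-squares objective standing in for a network loss; the paper's actual network, constraint set, and hyperparameters are not reproduced here.

```python
import numpy as np

# Toy objective f(w) = 0.5 * ||A w - b||^2, a stand-in for a network loss (assumed).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def grad(w):
    # Gradient of the least-squares objective: A^T (A w - b)
    return A.T @ (A @ w - b)

radius = 1.0          # assumed feasible set: ||w||_inf <= radius
w = np.zeros(10)      # start at a feasible point

for t in range(200):
    g = grad(w)
    # Linear minimization oracle over the L-infinity ball:
    # argmin_{||s||_inf <= radius} <s, g>  =>  s_i = -radius * sign(g_i)
    s = -radius * np.sign(g)
    # Classic open-loop step size; the paper also considers a line search variant.
    gamma = 2.0 / (t + 2.0)
    # Convex combination keeps the iterate inside the feasible set.
    w = (1.0 - gamma) * w + gamma * s
```

With line search, the fixed 2/(t+2) schedule would be replaced by choosing gamma to minimize the loss along the segment from w to s, which is the variant the abstract reports as stabilizing the stochastic setting.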
