Paper Title

Global Convergence of Second-order Dynamics in Two-layer Neural Networks

Paper Authors

Walid Krichene, Kenneth F. Caluya, Abhishek Halder

Paper Abstract

Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite-width limit, by making a connection between the mean-field dynamics and the Wasserstein gradient flow. These results were derived for first-order gradient flow, and a natural question is whether second-order dynamics, i.e., dynamics with momentum, exhibit a similar guarantee. We show that the answer is positive for the heavy ball method. In this case, the resulting integro-PDE is a nonlinear kinetic Fokker-Planck equation, and unlike the first-order case, it has no apparent connection with the Wasserstein gradient flow. Instead, we study the variations of a Lyapunov functional along the solution trajectories to characterize the stationary points and to prove convergence. While our results are asymptotic in the mean-field limit, numerical simulations indicate that global convergence may already occur for reasonably small networks.
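For readers unfamiliar with the heavy ball method the abstract refers to, the following is a minimal sketch of Polyak's heavy-ball update on a toy quadratic objective. It is purely illustrative and is not the paper's mean-field analysis; the objective, step size, and momentum coefficient are arbitrary choices for the example.

```python
import numpy as np

def heavy_ball(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Heavy ball (Polyak) iteration:
        x_{k+1} = x_k - lr * grad(x_k) + momentum * (x_k - x_{k-1})
    The momentum term retains a fraction of the previous displacement,
    giving the discrete analogue of second-order (inertial) dynamics."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(steps):
        x_next = x - lr * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Toy example: minimize f(x) = 0.5 * ||x||^2, whose gradient is x,
# so the unique global minimizer is the origin.
x_star = heavy_ball(lambda x: x, np.array([3.0, -2.0]))
```

In the continuous-time limit this iteration corresponds to a damped second-order ODE, which is the dynamics whose mean-field (kinetic Fokker-Planck) description the paper analyzes.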
