Paper Title

Rigorous dynamical mean field theory for stochastic gradient descent methods

Authors

Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova

Abstract

We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch-size and with constant learning rates.
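To make the setting concrete, here is a minimal, illustrative sketch (not the paper's numerical implementation of the DMFT equations): SGD with an extensive batch size and a constant learning rate on empirical risk minimization with Gaussian data. The ridge-regularized linear estimator, the teacher vector `w_star`, and all hyperparameter values are assumptions chosen for illustration; in the regime studied in the paper, the number of samples, the dimension, and the batch size all grow proportionally.

```python
import numpy as np

# Illustrative sketch of the setting described in the abstract:
# empirical risk minimization on Gaussian data, trained with SGD using
# an extensive batch (a fixed fraction of all samples) and a constant
# learning rate. The estimator here is a ridge-regularized linear model
# (a simple M-estimator with square loss); all values are assumptions.

rng = np.random.default_rng(0)

n, d = 2000, 500                      # samples and dimension (high-dimensional regime: n/d fixed)
w_star = rng.normal(size=d) / np.sqrt(d)   # hypothetical "teacher" vector generating the labels
X = rng.normal(size=(n, d))           # Gaussian data (identity covariance for simplicity)
y = X @ w_star + 0.1 * rng.normal(size=n)

lr = 0.05                             # constant learning rate
reg = 0.01                            # ridge regularization strength
batch_frac = 0.5                      # extensive batch: a fixed fraction of the n samples
batch_size = int(batch_frac * n)

w = np.zeros(d)
for t in range(200):
    # fresh random minibatch at every step
    idx = rng.choice(n, size=batch_size, replace=False)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size + reg * w
    w -= lr * grad                    # plain SGD update

print("final training risk:", np.mean((X @ w - y) ** 2) / 2)
```

The DMFT equations proved in the paper characterize the exact high-dimensional limit of trajectories like the one above through an effective single-coordinate process with memory kernels; the sketch only illustrates the finite-size dynamics they describe.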
