Paper Title

An analytic theory of shallow networks dynamics for hinge loss classification

Authors

Franco Pellegrini, Giulio Biroli

Abstract

Neural networks have been shown to perform incredibly well in classification tasks over structured high-dimensional datasets. However, the learning dynamics of such networks is still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task. We show that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average node population. We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss, for which the dynamics can be explicitly solved. This allows us to address in a simple setting several phenomena appearing in modern networks, such as slowing down of training dynamics, crossover between rich and lazy learning, and overfitting. Finally, we assess the limitations of mean-field theory by studying the case of a large but finite number of nodes and of training samples.
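The setting described in the abstract — a single-hidden-layer network trained by gradient descent with a hinge loss on linearly separable data — can be illustrated with a minimal numerical sketch. The code below is not the paper's method or scaling (the layer width, learning rate, and data model are illustrative assumptions); it only shows subgradient descent on the mean hinge loss for a one-hidden-layer ReLU network on a toy separable dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: the label is the sign of the first coordinate.
d, n = 10, 200
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])

# One-hidden-layer ReLU network (illustrative sizes, not the paper's scaling).
h = 100
W = rng.normal(size=(h, d)) / np.sqrt(d)   # input weights
a = rng.normal(size=h) / np.sqrt(h)        # output weights

lr = 0.1
for step in range(2000):
    pre = X @ W.T                          # pre-activations, shape (n, h)
    hid = np.maximum(pre, 0.0)             # ReLU activations
    f = hid @ a                            # network output, shape (n,)
    active = y * f < 1.0                   # samples still inside the hinge
    if not active.any():                   # every margin satisfied: loss is 0
        break
    # Subgradient of the mean hinge loss  (1/n) * sum_x max(0, 1 - y f(x))
    grad_a = -(hid[active].T @ y[active]) / n
    mask = (pre[active] > 0).astype(float) # ReLU derivative
    grad_W = -((mask * a) * y[active, None]).T @ X[active] / n
    a -= lr * grad_a
    W -= lr * grad_W

train_acc = (np.sign(np.maximum(X @ W.T, 0.0) @ a) == y).mean()
```

Because the hinge loss is exactly zero once every sample has margin at least one, training stops in finite time on separable data, which is one reason this loss admits the explicit dynamical solution discussed in the abstract.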
