Paper Title

On the non-universality of deep learning: quantifying the cost of symmetry

Paper Authors

Emmanuel Abbe, Enric Boix-Adsera

Paper Abstract

We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] to beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD).
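As a rough illustration of the equivariance condition the results hinge on (a schematic restatement, not the paper's formal definition): a randomized training procedure $\mathcal{A}$, mapping a data distribution $\mathcal{D}$ over input-label pairs to a trained predictor, is equivariant under a group $G$ acting on inputs if for every $g \in G$ and every input $x$,

$$ \mathcal{A}(g \cdot \mathcal{D})(x) \;\stackrel{d}{=}\; \mathcal{A}(\mathcal{D})(g^{-1} \cdot x), $$

where $g \cdot \mathcal{D}$ denotes the distribution of $(g \cdot x, y)$ for $(x, y) \sim \mathcal{D}$. In words, training on transformed data is, in distribution, the same as training on the original data and then transforming inputs at test time. For instance, noisy GD on a fully-connected network with an i.i.d. symmetric (e.g. isotropic Gaussian) initialization is expected to satisfy this for permutations and sign flips of the input coordinates, which is the kind of symmetry such lower bounds exploit.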
