Paper Title

Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?

Authors

Mariia Seleznova, Gitta Kutyniok

Abstract

Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent. But do the results for infinitely-wide networks give us hints about the behavior of real finite-width ones? In this paper, we study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs. We find that whether a network is in the NTK regime depends on the hyperparameters of random initialization and the network's depth. In particular, NTK theory does not explain the behavior of sufficiently deep networks initialized so that their gradients explode as they propagate through the network's layers: in this case, the kernel is random at initialization and changes significantly during training, contrary to NTK theory. On the other hand, in the case of vanishing gradients, DNNs are in the NTK regime but rapidly become untrainable with depth. We also describe a framework to study generalization properties of DNNs, in particular the variance of the network's output function, by means of NTK theory, and discuss its limits.
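As a brief illustration of the object the abstract discusses (a sketch, not the paper's code): the empirical NTK of a finite network is the inner product of parameter gradients, Θ(x₁, x₂) = ⟨∇θ f(x₁), ∇θ f(x₂)⟩. The snippet below computes it for a one-hidden-layer ReLU network with manually derived gradients; all names and the NTK-style fan-in scaling are illustrative assumptions.

```python
import numpy as np

def empirical_ntk(x1, x2, W, v):
    """Empirical NTK Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>
    for f(x) = v @ relu(W @ x), a one-hidden-layer ReLU network."""
    def grads(x):
        pre = W @ x                       # pre-activations, shape (width,)
        act = np.maximum(pre, 0.0)        # ReLU activations
        mask = (pre > 0).astype(float)    # ReLU derivative
        dW = np.outer(v * mask, x)        # df/dW_ij = v_i * 1[pre_i > 0] * x_j
        dv = act                          # df/dv_i  = relu(pre_i)
        return np.concatenate([dW.ravel(), dv])
    return grads(x1) @ grads(x2)

rng = np.random.default_rng(0)
width, dim = 512, 3
# Variances scaled by fan-in, as in NTK-style initialization
W = rng.normal(0.0, 1.0, (width, dim)) / np.sqrt(dim)
v = rng.normal(0.0, 1.0, width) / np.sqrt(width)
x = rng.normal(0.0, 1.0, dim)
print(empirical_ntk(x, x, W, v))
```

In the infinite-width NTK regime this kernel is deterministic at initialization and fixed during training; the paper's empirical question is how far finite depth and width break that picture.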
