Paper Title
On Random Kernels of Residual Architectures
Paper Authors
Paper Abstract
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets. Our analysis reveals that finite-size residual architectures are initialized much closer to the "kernel regime" than their vanilla counterparts: in networks without skip connections, convergence to the NTK requires fixing the depth while increasing the layers' width. Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity, provided a proper initialization is used. In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed, at a rate that is independent of both the depth and the scale of the weights. Our experiments validate the theoretical results and demonstrate the advantage of deep ResNets and DenseNets for kernel regression with random gradient features.
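To illustrate the random-gradient-features view of kernel regression mentioned in the abstract, below is a minimal sketch (not the authors' code) in JAX: it builds a toy residual MLP at random initialization, forms the empirical NTK K(x, x') = <∇θ f(x), ∇θ f(x')> from the Jacobian of the output with respect to all weights, and solves a kernel ridge regression with it. The architecture, the residual scale alpha, and all hyperparameters (dim, width, depth, the 1e-6 ridge term) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming a toy residual MLP; not the paper's implementation.
# Kernel regression with "random gradient features": the empirical NTK at
# random initialization, K(x, x') = <grad_theta f(x), grad_theta f(x')>.
import jax
import jax.numpy as jnp

def init_params(key, dim, width, depth):
    # Input layer, `depth` residual blocks, and a linear readout.
    keys = jax.random.split(key, depth + 2)
    params = [jax.random.normal(keys[0], (dim, width)) / jnp.sqrt(dim)]
    for k in keys[1:-1]:  # one weight matrix per residual block
        params.append(jax.random.normal(k, (width, width)) / jnp.sqrt(width))
    params.append(jax.random.normal(keys[-1], (width, 1)) / jnp.sqrt(width))
    return params

def resnet(params, x, alpha=0.1):
    # Residual blocks: h <- h + alpha * relu(h W); alpha is an assumed scale.
    h = x @ params[0]
    for w in params[1:-1]:
        h = h + alpha * jax.nn.relu(h @ w)
    return (h @ params[-1]).squeeze(-1)

def ntk(params, x1, x2):
    # Gradient features: rows of the Jacobian of the output w.r.t. all weights.
    def features(x):
        jac = jax.jacobian(lambda p: resnet(p, x))(params)
        return jnp.concatenate(
            [j.reshape(x.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)],
            axis=1)
    return features(x1) @ features(x2).T

# Kernel (ridge) regression with the empirical NTK of the random network.
key = jax.random.PRNGKey(0)
x_train = jax.random.normal(key, (20, 5))
y_train = jnp.sin(3.0 * x_train[:, 0])
x_test = jax.random.normal(jax.random.PRNGKey(1), (5, 5))
params = init_params(jax.random.PRNGKey(2), dim=5, width=64, depth=10)
K = ntk(params, x_train, x_train)
k_star = ntk(params, x_test, x_train)
y_pred = k_star @ jnp.linalg.solve(K + 1e-6 * jnp.eye(len(x_train)), y_train)
print(y_pred)
```

The small alpha scaling on the residual branch stands in for the "proper initialization" under which the abstract says the ResNet NTK convergence holds; increasing `depth` at fixed `width` in this sketch gives a simple way to probe how close the random kernel stays to its infinite-width limit.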