Paper Title
Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis
Paper Authors
Paper Abstract
A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors. We observe that by instead initializing the weights into independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the convergence analysis. While a similar technique has been studied for random inputs [Daniely, NeurIPS 2020], it has not been analyzed with arbitrary inputs. Using this technique, we show how to significantly reduce the number of neurons required for two-layer ReLU networks, both in the under-parameterized setting with logistic loss, from roughly $\gamma^{-8}$ [Ji and Telgarsky, ICLR 2020] to $\gamma^{-2}$, where $\gamma$ denotes the separation margin with a Neural Tangent Kernel, as well as in the over-parameterized setting with squared loss, from roughly $n^4$ [Song and Yang, 2019] to $n^2$, implicitly also improving the recent running time bound of [Brand, Peng, Song and Weinstein, ITCS 2021]. For the under-parameterized setting, we also prove new lower bounds that improve upon prior work and that, under certain assumptions, are best possible.
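To make the coupling concrete, here is a minimal NumPy sketch of the initialization scheme the abstract describes: hidden-layer weights are drawn as independent pairs, each pair consisting of two identical Gaussian vectors. The opposite-sign output weights (which make the network output zero at initialization) and the 1/sqrt(width) output scaling are assumed conventions not spelled out in the abstract, and all names (`coupled_init`, `two_layer_relu`, `m_pairs`) are illustrative.

```python
import numpy as np

def coupled_init(m_pairs, d, seed=None):
    """Draw hidden-layer weights as independent pairs of identical Gaussian vectors."""
    rng = np.random.default_rng(seed)
    W_half = rng.standard_normal((m_pairs, d))   # m_pairs independent Gaussian vectors
    W = np.repeat(W_half, 2, axis=0)             # duplicate each vector -> 2*m_pairs rows
    a = np.tile([1.0, -1.0], m_pairs)            # assumed: opposite output signs per pair
    return W, a

def two_layer_relu(x, W, a):
    """f(x) = (1/sqrt(width)) * sum_r a_r * ReLU(<w_r, x>)."""
    width = W.shape[0]
    return a @ np.maximum(W @ x, 0.0) / np.sqrt(width)

# Because the two neurons in each pair share weights but have opposite output
# signs, their contributions cancel and the initial network output is exactly 0.
W, a = coupled_init(m_pairs=64, d=10, seed=0)
x = np.random.default_rng(1).standard_normal(10)
print(two_layer_relu(x, W, a))   # prints 0.0 (up to floating-point error)
```

The pairwise cancellation at initialization is the structural property the coupled analysis works with; each weight vector is still an independent standard Gaussian marginally, only the pairing introduces dependence.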