Paper Title

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

Paper Authors

Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Paper Abstract

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth. In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks. Leveraging tools from Markov chain theory, we derive a meaningful lower rank bound in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets. Finally, we conduct an extensive set of experiments on real-world data sets, which confirm that rank stability is indeed a crucial condition for training modern-day deep neural architectures.
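As a minimal illustration of the rank-collapse phenomenon described in the abstract, the NumPy sketch below (not taken from the paper; the width, batch size, depth, relative tolerance, and the plain per-feature batch normalization without learned scale and shift are all illustrative assumptions) compares the numerical rank of a mini-batch after it passes through a deep random linear network, with and without batch normalization applied at every layer.

```python
import numpy as np

def numerical_rank(X, tol=1e-3):
    # Count singular values above a relative threshold of the largest one.
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def batch_norm(X):
    # Standardize each feature across the batch (no learned scale/shift).
    mu = X.mean(axis=0, keepdims=True)
    sigma = X.std(axis=0, keepdims=True) + 1e-8
    return (X - mu) / sigma

rng = np.random.default_rng(0)
width, batch_size, depth = 32, 32, 300

X_plain = rng.standard_normal((batch_size, width))
X_bn = X_plain.copy()

for _ in range(depth):
    # Random Gaussian initialization with standard 1/sqrt(width) scaling.
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    X_plain = X_plain @ W            # unnormalized deep linear network
    X_bn = batch_norm(X_bn @ W)      # batch-normalized deep linear network

# Without BN the numerical rank is expected to collapse toward 1;
# with BN the paper proves an Omega(sqrt(width)) lower bound for linear
# networks, and empirically the rank tends to stay much higher.
print("numerical rank without BN:", numerical_rank(X_plain))
print("numerical rank with BN:   ", numerical_rank(X_bn))
```

Exact rank values vary with the random seed, depth, and tolerance, but the qualitative gap between the two settings mirrors the spectral instability of random matrix products that the paper analyzes.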
