Paper Title

Probabilistic bounds on neuron death in deep rectifier networks

Paper Authors

Blaine Rister, Daniel L. Rubin

Paper Abstract

Neuron death is a complex phenomenon with implications for model trainability: the deeper the network, the lower the probability of finding a valid initialization. In this work, we derive both upper and lower bounds on the probability that a ReLU network is initialized to a trainable point, as a function of model hyperparameters. We show that it is possible to increase the depth of a network indefinitely, so long as the width increases as well. Furthermore, our bounds are asymptotically tight under reasonable assumptions: first, the upper bound coincides with the true probability for a single-layer network with the largest possible input set. Second, the true probability converges to our lower bound as the input set shrinks to a single point, or as the network complexity grows under an assumption about the output variance. We confirm these results by numerical simulation, showing rapid convergence to the lower bound with increasing network depth. Then, motivated by the theory, we propose a practical sign flipping scheme which guarantees that the ratio of living data points in a $k$-layer network is at least $2^{-k}$. Finally, we show how these issues are mitigated by network design features currently seen in practice, such as batch normalization, residual connections, dense networks and skip connections. This suggests that neuron death may provide insight into the efficacy of various model architectures.
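The abstract's central claim, that a randomly initialized ReLU network becomes less likely to be trainable as depth grows unless width grows with it, can be checked empirically with a small Monte Carlo experiment. The sketch below is not the paper's code: the He-style Gaussian initialization, the standard-normal input set, and the definition of a "dead" initialization (every data point maps to the all-zero vector at the output layer) are illustrative assumptions, not details taken from the paper.

import numpy as np

def network_is_alive(x, depth, width, rng):
    """Forward-propagate the input set x through a random ReLU MLP.

    Returns True if at least one data point still has a nonzero
    activation at the output layer (i.e. the initialization is "alive"
    under the assumed definition).
    """
    h = x
    for _ in range(depth):
        # He-style Gaussian initialization, zero biases (assumption).
        w = rng.normal(0.0, np.sqrt(2.0 / h.shape[1]), size=(h.shape[1], width))
        h = np.maximum(h @ w, 0.0)  # ReLU
    return bool(np.any(h > 0.0))

def estimate_alive_probability(depth, width, n_points=64, n_trials=1000, seed=0):
    """Fraction of random initializations for which the network is alive."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_points, width))  # assumed standard-normal inputs
    return np.mean([network_is_alive(x, depth, width, rng)
                    for _ in range(n_trials)])

if __name__ == "__main__":
    for depth in (2, 8, 32):
        for width in (2, 4, 8):
            p = estimate_alive_probability(depth, width)
            print(f"depth={depth:3d} width={width:2d} P(alive) ~ {p:.3f}")

Under these assumptions, the estimated survival probability drops quickly with depth at fixed width but recovers as width increases, which is the qualitative behavior the abstract describes.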
