Title
Information Geometry of Dropout Training
Authors
Abstract
Dropout is one of the most popular regularization techniques in neural network training. Because of its power and simplicity, dropout has been analyzed extensively and many variants have been proposed. In this paper, several properties of dropout are discussed in a unified manner from the viewpoint of information geometry. We show that dropout flattens the model manifold and that its regularization performance depends on the amount of curvature. We then show that dropout essentially corresponds to a regularization that depends on the Fisher information, and support this result with numerical experiments. Such a theoretical analysis of the technique from a different perspective is expected to greatly assist in the understanding of neural networks, whose theory is still in its infancy.
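For readers unfamiliar with the mechanism the abstract analyzes, the following is a minimal sketch of standard (inverted) dropout in NumPy. It is an illustration of the basic technique only, not the paper's information-geometric formulation; the function name and the drop probability `p` are choices made here for the example.

```python
import numpy as np

def dropout_forward(x, p=0.5, training=True, rng=None):
    """Standard (inverted) dropout: a minimal illustrative sketch,
    not the paper's formulation. During training, each unit is
    dropped with probability p; survivors are rescaled by 1/(1-p)
    so the expected activation matches the no-dropout pass."""
    if not training or p == 0.0:
        return x  # at inference time, dropout is a no-op
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1-p
    return x * mask / (1.0 - p)

# The rescaling keeps the expected output equal to the input:
x = np.ones(100_000)
out = dropout_forward(x, p=0.5)
print(abs(out.mean() - 1.0) < 0.05)
```

Randomly zeroing units in this way is what injects the noise whose regularizing effect the paper relates to the Fisher information and to the curvature of the model manifold.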