Paper Title


Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

Paper Authors

Keitaro Sakamoto, Issei Sato

Abstract


The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that a large initial learning rate does not work well with iterative magnitude pruning (IMP) in deep neural networks such as ResNet; IMP is an algorithm for finding sparse subnetworks with high generalization ability, called winning tickets, that can be trained independently from their initial weights. However, since a large initial learning rate generally helps the optimizer converge to flatter minima, we hypothesize that winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that PAC-Bayesian theory can provide an explicit understanding of the relationship between the LTH and generalization behavior. On the basis of our experimental findings that flatness improves accuracy and robustness to label noise, and that the distance from the initial weights is deeply involved in winning tickets, we offer a PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.
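The IMP procedure mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `train_fn` is a hypothetical stand-in for a full training run, and the pruning fraction and number of rounds are illustrative assumptions.

```python
import numpy as np

def imp(init_weights, train_fn, prune_frac=0.2, rounds=3):
    """Iterative magnitude pruning (IMP) sketch: train, prune the
    smallest-magnitude surviving weights, rewind the survivors to
    their initial values, and repeat.

    `train_fn` is a hypothetical stand-in that takes a (masked)
    weight vector and returns the weights after a training run.
    """
    mask = np.ones_like(init_weights)
    weights = init_weights.copy()
    for _ in range(rounds):
        trained = train_fn(weights * mask) * mask
        # Prune the smallest prune_frac of the still-alive weights.
        alive = np.abs(trained[mask == 1])
        threshold = np.quantile(alive, prune_frac)
        mask[np.abs(trained) < threshold] = 0.0
        # "Winning ticket": rewind surviving weights to initialization.
        weights = init_weights * mask
    return weights, mask
```

With `prune_frac=0.2` and three rounds, roughly 0.8^3 ≈ 51% of the weights survive; the returned pair is the rewound sparse initialization and its binary mask.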
