Paper title
On the Power-Law Hessian Spectrums in Deep Learning
Paper authors
Paper abstract
It is well known that the Hessian of the deep loss landscape matters to the optimization, generalization, and even robustness of deep learning. Recent works empirically discovered that the Hessian spectrum in deep learning has a two-component structure consisting of a small number of large eigenvalues and a large number of nearly-zero eigenvalues. However, the theoretical mechanism or mathematics behind the Hessian spectrum is still largely under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectrums of well-trained deep neural networks exhibit simple power-law structures. Inspired by theories from statistical physics and the spectral analysis of natural proteins, we provide a maximum-entropy theoretical interpretation that explains why the power-law structure exists and suggest a spectral parallel between protein evolution and the training of deep neural networks. By conducting extensive experiments, we further use the power-law spectral framework as a useful tool to explore multiple novel behaviors of deep learning.