Paper Title
HesScale: Scalable Computation of Hessian Diagonals
Paper Authors
Abstract
Second-order optimization uses curvature information about the objective function, which can help in faster convergence. However, such methods typically require expensive computation of the Hessian matrix, preventing their usage in a scalable way. The absence of efficient ways of computation drove the most widely used methods to focus on first-order approximations that do not capture the curvature information. In this paper, we develop HesScale, a scalable approach to approximating the diagonal of the Hessian matrix, to incorporate second-order information in a computationally efficient manner. We show that HesScale has the same computational complexity as backpropagation. Our results on supervised classification show that HesScale achieves high approximation accuracy, allowing for scalable and efficient second-order optimization.