Paper Title

Brand New K-FACs: Speeding up K-FAC with Online Decomposition Updates

Paper Authors

Puiu, Constantin Octavian

Paper Abstract

K-FAC (arXiv:1503.05671, arXiv:1602.01407) is a tractable implementation of Natural Gradient (NG) for Deep Learning (DL), whose bottleneck is computing the inverses of the so-called ``Kronecker-Factors'' (K-factors). RS-KFAC (arXiv:2206.15397) is a K-FAC improvement which provides a cheap way of estimating the K-factor inverses. In this paper, we exploit the exponential-average construction paradigm of the K-factors and use online numerical linear algebra techniques to propose an even cheaper (but less accurate) way of estimating the K-factor inverses. In particular, we propose a K-factor inverse update which scales linearly in layer size. We also propose an inverse application procedure which scales linearly as well (K-FAC's scales cubically and RS-KFAC's scales quadratically). Overall, our proposed algorithm gives an approximate K-FAC implementation whose preconditioning part scales linearly in layer size (compared to cubic for K-FAC and quadratic for RS-KFAC). Importantly, however, this update is only applicable in some circumstances (typically for all FC layers), unlike the RS-KFAC approach (arXiv:2206.15397). Numerical results show that RS-KFAC's inversion error can be reduced with minimal CPU overhead by adding our proposed update to it. Based on the proposed procedure, a correction to it, and RS-KFAC, we propose three practical algorithms for optimizing generic Deep Neural Nets. Numerical results show that two of these outperform RS-KFAC for any target test accuracy on CIFAR10 classification with a slightly modified version of VGG16_bn. Our proposed algorithms achieve 91$\%$ test accuracy faster than SENG (the state-of-the-art implementation of empirical NG for DL; arXiv:2006.05924) but underperform it at higher test accuracies.
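The abstract's central trick, exploiting the exponential-average construction of the K-factors so that their inverses can be maintained incrementally rather than recomputed from scratch, can be illustrated with standard online numerical linear algebra. Below is a minimal numpy sketch of that generic idea: a Sherman-Morrison rank-one update of a K-factor inverse under an exponential moving average. Everything here is an illustrative assumption rather than the paper's actual method: the function name `ema_kfactor_inverse_update`, the decay `rho`, and the single-vector update are hypothetical, and this generic version still costs O(d^2) per update, whereas the paper's specialized procedure scales linearly in layer size.

```python
import numpy as np

def ema_kfactor_inverse_update(A_inv, a, rho=0.95):
    """Sherman-Morrison update of A^{-1} after the EMA step
        A_new = rho * A + (1 - rho) * a a^T.
    Costs O(d^2) per update instead of the O(d^3) of re-inverting.
    (Illustrative sketch only; not the paper's linear-cost update.)
    """
    c = (1.0 - rho) / rho
    Ainv_a = A_inv @ a                        # O(d^2) matrix-vector product
    denom = 1.0 + c * (a @ Ainv_a)            # Sherman-Morrison denominator
    return (A_inv - (c / denom) * np.outer(Ainv_a, Ainv_a)) / rho

# Hypothetical usage: track the inverse of an EMA K-factor online.
d, rho = 512, 0.95                            # d mimics a layer's width
rng = np.random.default_rng(0)
A, A_inv = np.eye(d), np.eye(d)               # damped initial K-factor
for _ in range(10):
    a = rng.standard_normal(d)                # stand-in for an activation vector
    A = rho * A + (1.0 - rho) * np.outer(a, a)
    A_inv = ema_kfactor_inverse_update(A_inv, a, rho)

print(np.max(np.abs(A @ A_inv - np.eye(d))))  # small: A_inv tracks inv(A)
```

In practice K-FAC builds each K-factor from a minibatch, making the EMA step a rank-m (Sherman-Morrison-Woodbury) rather than rank-one update; and since the paper also reports a linearly scaling inverse application procedure, its update presumably avoids dense d-by-d arithmetic altogether, unlike this sketch.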
