Paper Title

Brand New K-FACs: Speeding up K-FAC with Online Decomposition Updates

Paper Authors

Puiu, Constantin Octavian

Paper Abstract

K-FAC (arXiv:1503.05671, arXiv:1602.01407) is a tractable implementation of Natural Gradient (NG) for Deep Learning (DL), whose bottleneck is computing the inverses of the so-called ``Kronecker-Factors'' (K-factors). RS-KFAC (arXiv:2206.15397) is a K-FAC improvement which provides a cheap way of estimating the K-factor inverses. In this paper, we exploit the exponential-average construction paradigm of the K-factors and use online numerical linear algebra techniques to propose an even cheaper (but less accurate) way of estimating the K-factor inverses. In particular, we propose a K-factor inverse update which scales linearly in layer size. We also propose an inverse application procedure which scales linearly as well (K-FAC's scales cubically and RS-KFAC's scales quadratically). Overall, our proposed algorithm gives an approximate K-FAC implementation whose preconditioning part scales linearly in layer size (compared to cubic for K-FAC and quadratic for RS-KFAC). Importantly, however, this update is only applicable in some circumstances (typically for all FC layers), unlike the RS-KFAC approach (arXiv:2206.15397). Numerical results show that RS-KFAC's inversion error can be reduced with minimal CPU overhead by adding our proposed update to it. Based on the proposed procedure, a correction to it, and RS-KFAC, we propose three practical algorithms for optimizing generic Deep Neural Nets. Numerical results show that two of these outperform RS-KFAC for any target test accuracy on CIFAR10 classification with a slightly modified version of VGG16_bn. Our proposed algorithms achieve 91$\%$ test accuracy faster than SENG (the state-of-the-art implementation of empirical NG for DL; arXiv:2006.05924) but underperform it at higher test accuracies.
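The abstract's central trick, exploiting the exponential-average construction of the K-factors so that their inverses can be maintained incrementally rather than recomputed from scratch, can be illustrated with standard online numerical linear algebra. Below is a minimal numpy sketch of that generic idea: a Sherman-Morrison rank-one update of a K-factor inverse under an exponential moving average. Everything here is an illustrative assumption rather than the paper's actual method: the function name `ema_kfactor_inverse_update`, the decay `rho`, and the single-vector update are hypothetical, and this generic version still costs O(d^2) per update, whereas the paper's specialized procedure scales linearly in layer size.

```python
import numpy as np

def ema_kfactor_inverse_update(A_inv, a, rho=0.95):
    """Sherman-Morrison update of A^{-1} after the EMA step
        A_new = rho * A + (1 - rho) * a a^T.
    Costs O(d^2) per update instead of the O(d^3) of re-inverting.
    (Illustrative sketch only; not the paper's linear-cost update.)
    """
    c = (1.0 - rho) / rho
    Ainv_a = A_inv @ a                        # O(d^2) matrix-vector product
    denom = 1.0 + c * (a @ Ainv_a)            # Sherman-Morrison denominator
    return (A_inv - (c / denom) * np.outer(Ainv_a, Ainv_a)) / rho

# Hypothetical usage: track the inverse of an EMA K-factor online.
d, rho = 512, 0.95                            # d mimics a layer's width
rng = np.random.default_rng(0)
A, A_inv = np.eye(d), np.eye(d)               # damped initial K-factor
for _ in range(10):
    a = rng.standard_normal(d)                # stand-in for an activation vector
    A = rho * A + (1.0 - rho) * np.outer(a, a)
    A_inv = ema_kfactor_inverse_update(A_inv, a, rho)

print(np.max(np.abs(A @ A_inv - np.eye(d))))  # small: A_inv tracks inv(A)
```

In practice K-FAC builds each K-factor from a minibatch, making the EMA step a rank-m (Sherman-Morrison-Woodbury) rather than rank-one update; and since the paper also reports a linearly scaling inverse application procedure, its update presumably avoids dense d-by-d arithmetic altogether, unlike this sketch.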
