Paper Title

Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization

Authors

Tran Van Sang, Mhd Irvan, Rie Shigetomi Yamaguchi, Toshiyuki Nakata

Abstract

Natural Gradient Descent (NGD) is a second-order neural network training method that preconditions the gradient descent with the inverse of the Fisher Information Matrix (FIM). Although NGD provides an efficient preconditioner, it is impractical because of the expensive computation required to invert the FIM. This paper proposes a new NGD variant algorithm named Component-Wise Natural Gradient Descent (CW-NGD). CW-NGD consists of two steps. Similar to several existing works, the first step treats the FIM as a block-diagonal matrix whose diagonal blocks correspond to the FIM of each layer's weights. In the second step, unique to CW-NGD, we analyze the layer's structure and further decompose the layer's FIM into smaller segments whose derivatives are approximately independent. As a result, each layer's FIM is approximated in a block-diagonal form that can be inverted trivially. The segment decomposition strategy varies by layer structure; specifically, we analyze dense and convolutional layers and design their decomposition strategies accordingly. In an experiment training a network containing these two types of layers, we empirically show that CW-NGD requires fewer iterations to converge than state-of-the-art first-order and second-order methods.
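To make the block-diagonal idea concrete, below is a minimal NumPy sketch of a per-segment natural-gradient update. It is not the paper's implementation: the function name cw_ngd_step_sketch, the rank-1 empirical Fisher per segment, and the damping constant are illustrative assumptions, and the paper's actual decomposition strategies for dense and convolutional layers are more elaborate.

```python
import numpy as np

def cw_ngd_step_sketch(grads, segments, lr=0.1, damping=1e-3):
    """Hedged sketch of a block-diagonal natural-gradient update.

    grads:    list of per-layer gradient vectors (flattened).
    segments: list of lists; segments[l] holds index arrays that split
              layer l's parameters into approximately independent groups.
    Each segment's FIM block is approximated by the outer product of its
    gradient (a single-sample empirical Fisher) plus damping, then that
    small block alone is used to precondition the segment's gradient.
    """
    updates = []
    for g, segs in zip(grads, segments):
        upd = np.zeros_like(g)
        for idx in segs:
            g_seg = g[idx]
            # Small per-segment Fisher block (rank-1 empirical approximation here).
            F_seg = np.outer(g_seg, g_seg) + damping * np.eye(g_seg.size)
            # Solving with the small block avoids inverting the full layer FIM.
            upd[idx] = np.linalg.solve(F_seg, g_seg)
        updates.append(lr * upd)
    return updates

# Toy usage: one "layer" with 4 parameters split into 2 segments of 2.
grads = [np.array([0.5, -0.2, 0.1, 0.3])]
segments = [[np.array([0, 1]), np.array([2, 3])]]
print(cw_ngd_step_sketch(grads, segments))
```

The point of the sketch is that each segment's Fisher block is small, so solving with it is far cheaper than inverting the full per-layer FIM, which is what makes the block-diagonal approximation attractive.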
