Paper Title

P-ADMMiRNN: Training RNN with Stable Convergence via An Efficient and Paralleled ADMM Approach

Paper Authors

Yu Tang, Zhigang Kan, Dequan Sun, Jingjing Xiao, Zhiquan Lai, Linbo Qiao, Dongsheng Li

Abstract

It is hard to train a Recurrent Neural Network (RNN) with stable convergence while avoiding gradient vanishing and exploding problems, as the weights in the recurrent unit are repeated from iteration to iteration. Moreover, RNN is sensitive to the initialization of weights and biases, which brings difficulties to training. The Alternating Direction Method of Multipliers (ADMM) has become a promising alternative to traditional stochastic gradient algorithms for training neural networks, owing to its gradient-free nature and immunity to unsatisfactory conditions. However, ADMM cannot be applied to train RNN directly, since the state in the recurrent unit is repetitively updated over timesteps. Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of RNN to address the above challenges simultaneously. We also provide novel update rules and a theoretical convergence analysis. Instead of using vanilla ADMM, we explicitly specify the essential update rules in the iterations of ADMMiRNN, with constructed approximation techniques and solutions to each sub-problem. Numerical experiments are conducted on MNIST, IMDb, and text classification tasks. ADMMiRNN achieves convergent results and outperforms the compared baselines. Furthermore, ADMMiRNN trains RNN more stably than stochastic gradient algorithms, without gradient vanishing or exploding. We also provide a distributed parallel algorithm based on ADMMiRNN, named P-ADMMiRNN, including Synchronous Parallel ADMMiRNN (SP-ADMMiRNN) and Asynchronous Parallel ADMMiRNN (AP-ADMMiRNN), which is the first work to train RNN with ADMM in an asynchronous parallel manner. The source code is publicly available.
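
To make the ADMM-on-unfolded-RNN idea concrete, below is a minimal sketch of the kind of variable splitting the abstract describes: the unfolded recurrence a_t = W s_{t-1} + U x_t, s_t = tanh(a_t) is rewritten as constraints with auxiliary and dual variables, and each block of variables is updated in turn against the augmented Lagrangian. This is not the paper's actual ADMMiRNN update rules; the variable names, the toy regression task, and the single-gradient-step sub-problem solves are assumptions made purely for illustration.

```python
# Illustrative ADMM-style splitting for an unfolded RNN (NOT the paper's exact rules).
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4           # timesteps, input size, hidden size
rho = 1.0                        # ADMM penalty parameter
lr = 0.05                        # step size for the approximate sub-problem solves

# Toy data: one sequence and a scalar regression target on the last hidden state.
x = rng.normal(size=(T, d_in))
y = 1.0

# Primal variables: shared weights plus per-timestep auxiliary variables.
W = rng.normal(scale=0.1, size=(d_h, d_h))   # recurrent weight (shared over time)
U = rng.normal(scale=0.1, size=(d_h, d_in))  # input weight
v = rng.normal(scale=0.1, size=d_h)          # readout vector
a = np.zeros((T, d_h))                       # pre-activations, constraint: a_t = W s_{t-1} + U x_t
s = np.zeros((T, d_h))                       # states, constraint: s_t = tanh(a_t)
lam_a = np.zeros((T, d_h))                   # scaled dual variables, one per constraint
lam_s = np.zeros((T, d_h))

def prev_state(t):
    return s[t - 1] if t > 0 else np.zeros(d_h)

for it in range(200):
    # --- Primal updates: one (approximate) sub-problem per variable block. ---
    for t in range(T):
        # a_t-subproblem: penalty terms from both constraints that involve a_t.
        r1 = a[t] - (W @ prev_state(t) + U @ x[t]) + lam_a[t]
        r2 = s[t] - np.tanh(a[t]) + lam_s[t]
        grad_a = rho * r1 + rho * r2 * (-(1 - np.tanh(a[t]) ** 2))
        a[t] -= lr * grad_a
        # s_t-subproblem: the last state additionally carries the data-fitting loss.
        grad_s = rho * (s[t] - np.tanh(a[t]) + lam_s[t])
        if t + 1 < T:
            r_next = a[t + 1] - (W @ s[t] + U @ x[t + 1]) + lam_a[t + 1]
            grad_s += rho * (-(W.T @ r_next))
        else:
            grad_s += (v @ s[t] - y) * v
        s[t] -= lr * grad_s
    # W/U/v-subproblems: gradient of the augmented Lagrangian w.r.t. the shared weights.
    grad_W = np.zeros_like(W)
    grad_U = np.zeros_like(U)
    for t in range(T):
        r = a[t] - (W @ prev_state(t) + U @ x[t]) + lam_a[t]
        grad_W += rho * np.outer(-r, prev_state(t))
        grad_U += rho * np.outer(-r, x[t])
    W -= lr * grad_W
    U -= lr * grad_U
    v -= lr * (v @ s[-1] - y) * s[-1]
    # --- Dual ascent on the scaled multipliers. ---
    for t in range(T):
        lam_a[t] += a[t] - (W @ prev_state(t) + U @ x[t])
        lam_s[t] += s[t] - np.tanh(a[t])

print("final fitting residual:", float(abs(v @ s[-1] - y)))
```

Because every timestep gets its own auxiliary block and multiplier, the per-timestep updates are independent given the neighboring states, which is the structural property that makes the synchronous and asynchronous parallel variants (SP-/AP-ADMMiRNN) possible.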
