Paper Title
Fast Saturating Gate for Learning Long Time Scales with Recurrent Neural Networks
Paper Authors
Paper Abstract
Gate functions in recurrent models, such as LSTMs and GRUs, play a central role in learning the various time scales of time series data by using a bounded activation function. However, it is difficult to train gates to capture extremely long time scales because the gradient of the bounded function vanishes for large inputs, which is known as the saturation problem. We closely analyze the relation between saturation of the gate function and the efficiency of training. We prove that the vanishing gradient of the gate function can be mitigated by accelerating the convergence of the saturating function, i.e., making the output of the function converge to 0 or 1 faster. Based on this analysis, we propose a gate function called the fast gate, which achieves a doubly exponential convergence rate with respect to its input through a simple function composition. We empirically show that our method outperforms previous methods in accuracy and computational efficiency on benchmark tasks involving extremely long time scales.
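To make the composition idea concrete, below is a minimal sketch of an accelerated gate. The abstract does not state the exact form of the fast gate, so the choice of sinh as the exponentially growing inner function is an assumption for illustration: composing the sigmoid with any odd, exponentially growing function yields the doubly exponential saturation described above.

```python
import torch

def fast_gate(x: torch.Tensor) -> torch.Tensor:
    """Illustrative accelerated gate: sigmoid composed with an
    exponentially growing odd function (sinh is an assumed choice
    here, not necessarily the paper's exact definition).

    For large x, sigmoid(sinh(x)) ~ 1 - exp(-exp(x) / 2), so the
    output converges to 0 or 1 doubly exponentially in x, versus
    the singly exponential convergence of a plain sigmoid gate.
    """
    return torch.sigmoid(torch.sinh(x))

# Comparison at a moderate preactivation: the composed gate
# saturates much faster than the standard sigmoid gate.
x = torch.tensor([3.0])
print(torch.sigmoid(x))  # ~0.9526
print(fast_gate(x))      # ~0.99996, since sinh(3) ~ 10.02
```

Because sinh(x) ≈ x near the origin, such a composed gate behaves like an ordinary sigmoid gate for small preactivations and only accelerates saturation in the large-input regime where the vanishing gradient would otherwise stall training.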