Paper Title
On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis
Paper Authors
Paper Abstract
We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals, and characterize the approximation rate and its relation to memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process will suffer from slowdowns. In particular, both of these effects become exponentially more pronounced with memory - a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.
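As a rough illustration of the setting described in the abstract, the sketch below discretizes a continuous-time linear RNN with forward Euler and compares its (untrained) output against a target linear functional whose memory kernel decays exponentially. All names, dimensions, and parameter values here (`rho`, `W`, `U`, `c`, the decay rate `lam`) are illustrative assumptions for this sketch, not the paper's construction or experiments.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's code): a continuous-time linear RNN
#   dh/dt = W h + U x(t),   y_hat(t) = c^T h(t)
# discretized with forward Euler, compared against a target linear functional
#   y(t) = \int_0^t rho(s) x(t - s) ds
# with an exponentially decaying memory kernel rho.

rng = np.random.default_rng(0)
dt, T, m = 0.01, 10.0, 32          # step size, time horizon, hidden width
steps = int(T / dt)

# Hidden-to-hidden matrix chosen stable (negative eigenvalues) so the RNN's
# internal memory decays, mirroring the notion of memory in the abstract.
W = -np.diag(rng.uniform(0.5, 5.0, m))
U = rng.standard_normal((m, 1))
c = rng.standard_normal(m) / m

def rho(s, lam=1.0):
    """Illustrative target memory kernel with decay rate lam (an assumption)."""
    return np.exp(-lam * s)

# Random scalar input signal x(t) sampled on the time grid.
x = rng.standard_normal(steps)

# Target output: discretized convolution y(t) = sum_s rho(s) x(t - s) dt.
kernel = rho(np.arange(steps) * dt)
y_target = np.array([np.dot(kernel[:k + 1][::-1], x[:k + 1]) * dt
                     for k in range(steps)])

# Linear RNN forward pass (forward Euler integration of the hidden state).
h = np.zeros(m)
y_hat = np.zeros(steps)
for k in range(steps):
    h = h + dt * (W @ h + U[:, 0] * x[k])
    y_hat[k] = c @ h

print("mean squared gap before training:", np.mean((y_hat - y_target) ** 2))
```

Making the target kernel decay more slowly (a smaller `lam`, i.e. longer memory) is the kind of change the abstract predicts should require more hidden neurons to approximate well and slow down training of the parameters `W`, `U`, `c`.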