Paper Title
On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis
Paper Authors
Paper Abstract
We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals, and characterize the approximation rate and its relation to memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process will suffer from slowdowns. In particular, both of these effects become exponentially more pronounced with memory - a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.
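As a rough illustration of the setting described in the abstract, the sketch below discretizes a continuous-time linear RNN with forward Euler and compares its (untrained) output against a target linear functional whose memory kernel decays exponentially. All names, dimensions, and parameter values here (`rho`, `W`, `U`, `c`, the decay rate `lam`) are illustrative assumptions for this sketch, not the paper's construction or experiments.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's code): a continuous-time linear RNN
#   dh/dt = W h + U x(t),   y_hat(t) = c^T h(t)
# discretized with forward Euler, compared against a target linear functional
#   y(t) = \int_0^t rho(s) x(t - s) ds
# with an exponentially decaying memory kernel rho.

rng = np.random.default_rng(0)
dt, T, m = 0.01, 10.0, 32          # step size, time horizon, hidden width
steps = int(T / dt)

# Hidden-to-hidden matrix chosen stable (negative eigenvalues) so the RNN's
# internal memory decays, mirroring the notion of memory in the abstract.
W = -np.diag(rng.uniform(0.5, 5.0, m))
U = rng.standard_normal((m, 1))
c = rng.standard_normal(m) / m

def rho(s, lam=1.0):
    """Illustrative target memory kernel with decay rate lam (an assumption)."""
    return np.exp(-lam * s)

# Random scalar input signal x(t) sampled on the time grid.
x = rng.standard_normal(steps)

# Target output: discretized convolution y(t) = sum_s rho(s) x(t - s) dt.
kernel = rho(np.arange(steps) * dt)
y_target = np.array([np.dot(kernel[:k + 1][::-1], x[:k + 1]) * dt
                     for k in range(steps)])

# Linear RNN forward pass (forward Euler integration of the hidden state).
h = np.zeros(m)
y_hat = np.zeros(steps)
for k in range(steps):
    h = h + dt * (W @ h + U[:, 0] * x[k])
    y_hat[k] = c @ h

print("mean squared gap before training:", np.mean((y_hat - y_target) ** 2))
```

Making the target kernel decay more slowly (a smaller `lam`, i.e. longer memory) is the kind of change the abstract predicts should require more hidden neurons to approximate well and slow down training of the parameters `W`, `U`, `c`.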