Paper Title
Efficient recurrent architectures through activity sparsity and sparse back-propagation through time
Paper Authors
Paper Abstract
Recurrent neural networks (RNNs) are well suited to sequence tasks in resource-constrained systems due to their expressivity and low computational requirements. However, a gap remains between the efficiency and performance RNNs can deliver and the requirements of real-world applications. The memory and computational costs of propagating the activations of all neurons at every time step to every connected neuron, together with the sequential dependence of activations, make training and using RNNs inefficient. We propose a solution inspired by biological neuron dynamics that makes communication between RNN units sparse and discrete. This also makes the backward pass with back-propagation through time (BPTT) computationally sparse and efficient. We base our model on the gated recurrent unit (GRU), extending it with units that emit discrete events for communication, triggered by a threshold, so that no information is communicated to other units in the absence of events. We show theoretically that the communication between units, and hence the computation required for both the forward and backward passes, scales with the number of events in the network. Our model achieves efficiency without compromising task performance, demonstrating competitive performance against state-of-the-art recurrent network models on real-world tasks, including language modeling. The dynamic activity-sparsity mechanism also makes our model well suited to novel energy-efficient neuromorphic hardware. Code is available at https://github.com/KhaleelKhan/EvNN/.
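The mechanism described in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch rendering, not the authors' released implementation (see the repository linked above for that): a GRU-style cell whose units emit an output only when the internal state crosses a per-unit threshold, with a rectangular surrogate gradient standing in for the derivative of the step function so that BPTT remains well defined. The names `SpikeFunction` and `EventGRUCell`, the learnable threshold with its 0.3 initialization, and the soft reset after an event are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class SpikeFunction(torch.autograd.Function):
    """Heaviside step in the forward pass with a rectangular surrogate
    gradient in the backward pass, so gradients flow only through
    units whose state is near the threshold."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).to(v.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Far from the threshold the unit is silent and contributes
        # nothing to the backward pass.
        return grad_output * (v.abs() < 0.5).to(v.dtype)


class EventGRUCell(nn.Module):
    """GRU-style cell whose units communicate only via thresholded events."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.y2h = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
        # Per-unit firing threshold (assumed learnable; 0.3 init is arbitrary).
        self.threshold = nn.Parameter(torch.full((hidden_size,), 0.3))

    def forward(self, x, y_prev, c):
        # The recurrent input is the sparse event output y_prev, so a unit
        # that emitted no event communicates nothing to the others.
        rx, zx, nx = self.x2h(x).chunk(3, dim=-1)
        ry, zy, ny = self.y2h(y_prev).chunk(3, dim=-1)
        r = torch.sigmoid(rx + ry)            # reset gate
        z = torch.sigmoid(zx + zy)            # update gate
        n = torch.tanh(nx + r * ny)           # candidate state
        c = z * c + (1.0 - z) * n             # internal state persists silently
        event = SpikeFunction.apply(c - self.threshold)
        y = event * c                         # zero output for silent units
        c = c - event * self.threshold        # soft reset after an event
        return y, c


# Usage: the number of nonzero entries in y is the per-step event count.
cell = EventGRUCell(input_size=8, hidden_size=16)
x = torch.randn(4, 8)                         # batch of 4 inputs
y = torch.zeros(4, 16)                        # previous event output
c = torch.zeros(4, 16)                        # internal state
y, c = cell(x, y, c)
print("events per sample:", (y != 0).sum(dim=1).tolist())
```

Because `y` is exactly zero for silent units, the recurrent product `self.y2h(y_prev)` only needs the weight columns of the active units, which is the sense in which both the forward and the backward pass scale with the number of events in the network.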