Software-Level Accuracy Using Stochastic Computing With Charge-Trap-Flash Based Weight Matrix

Paper Title

Software-Level Accuracy Using Stochastic Computing With Charge-Trap-Flash Based Weight Matrix

Authors

Varun Bhatt, Shalini Shrivastava, Tanmay Chavan, Udayan Ganguly

Abstract

The in-memory computing paradigm with emerging memory devices has recently been shown to be a promising way to accelerate deep learning. The resistive processing unit (RPU) has been proposed to enable the vector-vector outer product in a crossbar array using a stochastic train of identical pulses, allowing a one-shot weight update and promising a large speed-up in the matrix multiplication operations that form the bulk of neural network training. However, the performance of the system suffers if the device does not satisfy the condition of linear conductance change over around 1,000 conductance levels. This is a challenge for nanoscale memories. Recently, Charge Trap Flash (CTF) memory was shown to have a large number of levels before saturation, but variable non-linearity. In this paper, we explore the trade-off between the range of conductance change and linearity. We show, through simulations, that at an optimum choice of the range, our system performs nearly as well as models trained using exact floating-point operations, with less than 1% reduction in performance. Our system reaches an accuracy of 97.9% on the MNIST dataset, and 89.1% and 70.5% on the CIFAR-10 and CIFAR-100 datasets, respectively (using pre-extracted features). We also show its use in reinforcement learning, where it is used for value function approximation in Q-Learning and learns to complete an episode of the mountain car control problem in around 146 steps. Benchmarked against the state of the art, the CTF-based RPU shows best-in-class performance and enables software-equivalent performance.
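The stochastic pulse-train update mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' simulator. It assumes an ideal, perfectly linear device, input values already scaled to [0, 1] so they can act as pulse probabilities, and magnitudes only (signs are ignored). The function name `stochastic_outer_product` and the parameter `n_pulses` are hypothetical names chosen for illustration.

```python
import numpy as np


def stochastic_outer_product(x, d, n_pulses=1000, rng=None):
    """Estimate outer(d, x) from coincidences of random pulse trains.

    Assumption: every entry of x and d lies in [0, 1], so each value can be
    used directly as the firing probability of its row or column line.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)

    # In each time slot, every line fires independently with probability
    # equal to the value it encodes.
    x_pulses = rng.random((n_pulses, x.size)) < x  # shape (n_pulses, n_cols)
    d_pulses = rng.random((n_pulses, d.size)) < d  # shape (n_pulses, n_rows)

    # The device at (i, j) takes one update step only in slots where the
    # pulse on row i and the pulse on column j coincide.
    coincidences = d_pulses.astype(float).T @ x_pulses.astype(float)

    # The expected coincidence count is n_pulses * d[i] * x[j], so dividing
    # by n_pulses gives an unbiased estimate of the outer product.
    return coincidences / n_pulses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([0.2, 0.5, 0.9])
    d = np.array([0.1, 0.8])
    print(stochastic_outer_product(x, d, n_pulses=20000, rng=rng))
    print(np.outer(d, x))  # exact value for comparison
```

In this idealized picture every coincidence changes the weight by the same amount, which is why the abstract stresses linear conductance change: a device whose step size drifts with its current conductance would distort the estimate.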
