Paper title
MatPIM: Accelerating Matrix Operations with Memristive Stateful Logic
Paper authors
Paper abstract
The emerging memristive Memory Processing Unit (mMPU) overcomes the memory wall through memristive devices that unite storage and logic for real processing-in-memory (PIM) systems. At the core of the mMPU is stateful logic, which is accelerated with memristive partitions to enable logic with massive inherent parallelism within crossbar arrays. This paper vastly accelerates the fundamental operations of matrix-vector multiplication and convolution in the mMPU, with either full-precision or binary elements. The proposed algorithms establish an efficient foundation for large-scale mMPU applications such as neural networks, image processing, and numerical methods. We overcome the inherent asymmetry limitation of previous in-memory full-precision matrix-vector multiplication solutions by utilizing techniques from block matrix multiplication and reduction. We present the first fast in-memory binary matrix-vector multiplication algorithm by utilizing memristive partitions with a tree-based popcount reduction (39x faster than previous work). For convolution, we present a novel in-memory input-parallel concept, which we utilize for a full-precision algorithm that overcomes the asymmetry limitation in convolution while also improving latency (2x faster than previous work), and for the first fast binary algorithm (12x faster than previous work).
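To make the arithmetic behind binary matrix-vector multiplication with a tree-based popcount reduction concrete, the following is a minimal software sketch in Python. It is not the paper's in-memory algorithm and does not model crossbar arrays, stateful logic, or memristive partitions. It assumes the encoding common in binarized neural networks, which the abstract does not specify: each element is +1 or -1, stored as bit 1 or 0, so a dot product of length n equals 2 * popcount(XNOR(a, b)) - n. The function names tree_popcount and binary_matvec are hypothetical and chosen only for illustration.

```python
def tree_popcount(bits):
    """Sum a sequence of 0/1 values by pairwise (tree) reduction.

    Each pass adds neighbouring pairs, halving the number of partial sums,
    so the count is finished after about log2(len(bits)) passes.
    """
    level = list(bits)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-length levels with a neutral zero
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def binary_matvec(matrix_bits, vector_bits):
    """Multiply a +1/-1 matrix by a +1/-1 vector, both stored as 0/1 bits.

    Assumed encoding (not taken from the abstract): bit 1 means +1 and
    bit 0 means -1. Matching bits contribute +1 and differing bits -1.
    """
    n = len(vector_bits)
    result = []
    for row in matrix_bits:
        if len(row) != n:
            raise ValueError("row length must match vector length")
        xnor = [1 - (a ^ b) for a, b in zip(row, vector_bits)]  # 1 where bits agree
        matches = tree_popcount(xnor)
        result.append(2 * matches - n)  # matches - (n - matches)
    return result


if __name__ == "__main__":
    matrix = [[1, 0, 1, 1],
              [0, 0, 0, 0]]
    vector = [1, 1, 0, 1]
    print(binary_matvec(matrix, vector))
```

In this sketch the pairwise additions within one pass are independent of each other, which is what makes a tree-shaped reduction attractive for parallel hardware; here they simply run sequentially.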