Paper Title
Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
Paper Authors
Paper Abstract
By supporting access to multiple memory words at the same time, Bit-line Computing (BC) architectures allow the parallel execution of bit-wise operations in-memory. At the array periphery, arithmetic operations are then derived with little additional overhead. Such a paradigm opens novel opportunities for Artificial Intelligence (AI) at the edge, thanks to the massive parallelism inherent in memory arrays and the extreme energy efficiency of computing in situ, hence avoiding data transfers. Previous works have shown that BC brings disruptive efficiency gains when targeting AI workloads, a key metric in the context of emerging edge AI scenarios. This manuscript builds on these findings by proposing an end-to-end framework that leverages BC-specific optimizations to enable high parallelism and aggressive compression of AI models. Our approach is supported by a novel hardware module performing real-time decoding, as well as new algorithms enabling BC-friendly model compression. Our hardware/software approach achieves 91% energy savings (under a 1% accuracy degradation constraint) compared to state-of-the-art BC computing approaches.
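As an illustrative sketch (not the paper's implementation): in bit-line computing, activating two word lines of an SRAM array at once makes the bit-line read the AND of the stored bits and the complementary bit-line read their NOR; the array periphery can then combine these bit-wise results into arithmetic, e.g. bit-serial addition. The function names below (`bitline_read`, `bitline_add`) are hypothetical and only model this behavior:

```python
# Behavioral model of bit-line computing primitives (illustrative only).

def bitline_read(a: int, b: int):
    """Model simultaneous word-line activation on two 1-bit cells."""
    and_ = a & b          # bit-line stays high only if both cells hold 1
    nor_ = 1 - (a | b)    # complementary bit-line: 1 only if both hold 0
    return and_, nor_

def bitline_add(x: int, y: int, bits: int = 8) -> int:
    """Ripple-carry addition derived at the periphery from AND/NOR results."""
    carry, result = 0, 0
    for i in range(bits):
        a, b = (x >> i) & 1, (y >> i) & 1
        and_, nor_ = bitline_read(a, b)
        xor_ = (1 - and_) & (1 - nor_)    # periphery logic: a XOR b
        s = xor_ ^ carry                  # sum bit including incoming carry
        carry = and_ | (xor_ & carry)     # carry out
        result |= s << i
    return result

print(bitline_add(23, 42))  # 65
```

The same AND/NOR primitives apply across every column of the array in parallel, which is the source of the massive parallelism the abstract refers to.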