Paper Title
An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs
Paper Authors
Paper Abstract
Convolutional Neural Networks (CNNs) have shown outstanding accuracy on many vision tasks in recent years. When deploying CNNs on portable devices and embedded systems, however, the large number of parameters and computations results in long processing times and short battery life. An important factor in designing CNN hardware accelerators is to map the convolution computation efficiently onto hardware resources. In addition, to extend battery life and reduce energy consumption, it is essential to reduce the number of DRAM accesses, since a DRAM access consumes orders of magnitude more energy than other operations in hardware. In this paper, we propose an energy-efficient architecture that maximally utilizes its computational units for convolution operations while requiring only a small number of DRAM accesses. The implementation results show that the proposed architecture performs one image recognition task using the VGGNet model with a latency of 393 ms and only 251.5 MB of DRAM accesses.
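To illustrate the general idea behind a serial accumulation dataflow for convolution, the sketch below accumulates each output element's partial sums serially in a single local accumulator, so partial sums are written out once and never travel back to off-chip DRAM. This is only a conceptual loop-nest sketch under assumed shapes and names (`conv2d_serial_accumulation`, stride 1, no padding); the abstract does not specify the paper's actual loop ordering, tiling, or hardware mapping.

```python
import numpy as np

def conv2d_serial_accumulation(ifmap, weights):
    """Illustrative convolution loop nest with serial accumulation.

    ifmap:   (C_in, H, W) input feature map
    weights: (C_out, C_in, K, K) filters
    Each output element is accumulated to completion in one local
    accumulator before a single write-out, so no partial sums are
    spilled to off-chip memory. (Hypothetical sketch, not the
    paper's implementation.)
    """
    c_in, h, w = ifmap.shape
    c_out, _, k, _ = weights.shape
    out_h, out_w = h - k + 1, w - k + 1
    ofmap = np.zeros((c_out, out_h, out_w), dtype=ifmap.dtype)

    for co in range(c_out):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0                      # local accumulator (on-chip register)
                for ci in range(c_in):         # serial accumulation over input
                    for ky in range(k):        # channels and kernel positions
                        for kx in range(k):
                            acc += ifmap[ci, y + ky, x + kx] * weights[co, ci, ky, kx]
                ofmap[co, y, x] = acc          # one write per output element
    return ofmap

# Example: a small random layer, checked for shape only.
if __name__ == "__main__":
    ifmap = np.random.rand(3, 8, 8).astype(np.float32)
    weights = np.random.rand(4, 3, 3, 3).astype(np.float32)
    print(conv2d_serial_accumulation(ifmap, weights).shape)  # (4, 6, 6)
```

In hardware, the innermost accumulation would map to a multiply-accumulate unit, and the key point of the ordering is that only final outputs, not partial sums, need to leave the chip.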