Paper Title
Near-Optimal Hardware Design for Convolutional Neural Networks
Authors
Abstract
Demand for low-power deep-learning hardware in industrial applications has been increasing. To maintain their generality, most existing artificial intelligence (AI) chips have evolved by relying on new chip technologies rather than on radically new hardware architectures. This study proposes a novel, special-purpose, high-efficiency hardware architecture for convolutional neural networks. The proposed architecture maximizes multiplier utilization by designing the computational circuit with the same structure as the computational flow of the model, rather than mapping computations onto fixed hardware. In addition, a specially designed filter circuit provides all the data of the receptive field simultaneously while using only one memory read operation per clock cycle; this allows the computation circuit to operate seamlessly without idle cycles. Our reference system based on the proposed architecture uses 97% of its peak multiplication capability for the actual computations required by the computational model throughout the computation period. In addition, overhead components are minimized so that the non-multiplier components account for a smaller share of resources than the multiplier components, which are indispensable for the computational model. The efficiency of the proposed architecture is close to that of an ideally efficient system, one whose performance-to-resource ratio cannot be improved further. An implementation based on the proposed hardware architecture has been applied in commercial AI products.
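The abstract does not describe how the filter circuit is built. As a rough illustration of the general idea (a full receptive field available every cycle from a single memory read), the sketch below simulates a conventional line-buffer and shift-register window in Python. This is an assumed, textbook-style scheme, not the authors' circuit; the names `stream_windows` and `conv2d_streaming`, the zero-initialised buffers, and the stride-1, no-padding setup are all hypothetical choices made for the example.

```python
from collections import deque


def stream_windows(image, k=3):
    """Yield (row, col, window) for every valid k x k receptive field.

    Exactly one pixel is read from `image` per loop iteration, which stands
    in for one memory read per clock cycle. This models a generic line-buffer
    design and is an assumption, not the architecture from the paper.
    """
    height, width = len(image), len(image[0])
    # k-1 FIFOs, each one image row long; buf[0] is always the oldest entry.
    line_buffers = [deque([0] * width, maxlen=width) for _ in range(k - 1)]
    # k shift registers of length k together form the k x k window.
    window = [deque([0] * k, maxlen=k) for _ in range(k)]

    for row in range(height):
        for col in range(width):
            pixel = image[row][col]  # the only memory read in this cycle
            # Taps of the delay chain: rows row-(k-1) .. row-1, then the new pixel.
            column = [buf[0] for buf in line_buffers] + [pixel]
            # Advance the delay chain: each FIFO takes the value one row newer.
            for i, buf in enumerate(line_buffers):
                buf.append(column[i + 1])
            # Shift the new column into the window registers.
            for r in range(k):
                window[r].append(column[r])
            # The window is valid once k rows and k columns have streamed in.
            if row >= k - 1 and col >= k - 1:
                yield row - k + 1, col - k + 1, [list(w) for w in window]


def conv2d_streaming(image, kernel):
    """Stride-1, unpadded 2D cross-correlation driven by the streamed windows."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r, c, win in stream_windows(image, k):
        # In hardware, these k*k multiplications would run in parallel.
        out[r][c] = sum(
            win[i][j] * kernel[i][j] for i in range(k) for j in range(k)
        )
    return out
```

In this toy model the multipliers have valid input only once the first k-1 rows and columns have streamed in, so it says nothing about the 97% utilization reported for the reference system; it only shows why one read per cycle can be enough to keep a k x k window filled.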