Paper Title

Image Classification on Accelerated Neural Networks

Paper Authors

Ilkay Sikdokur, Inci Baytas, Arda Yurdakul

Paper Abstract

For image classification problems, various neural network models are commonly used due to their success in yielding high accuracies. The Convolutional Neural Network (CNN) is one of the most frequently used deep learning methods for image classification applications. Depending on its complexity, it can produce highly accurate results. However, the more complex the model is, the longer it takes to train. In this paper, an acceleration design that uses the power of an FPGA is presented for the training phase of the fully connected layer of a basic CNN model consisting of one convolutional layer and one fully connected layer. Nonetheless, the inference phase is also accelerated automatically, since the training phase includes inference. In this design, the convolutional layer is computed by the host computer and the fully connected layer is computed by an FPGA board. It should be noted that the training of the convolutional layer is not taken into account in this design and is left for future research. The results are quite encouraging, as this FPGA design outperforms some state-of-the-art deep learning platforms, such as Tensorflow running on the host computer, by approximately 2x in both training and inference.
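
To make the host/FPGA split described in the abstract concrete, below is a minimal sketch in Python/NumPy of a one-convolutional-layer, one-fully-connected-layer model in which the convolution runs on the host and the fully connected layer's forward pass and weight update are the portion that the paper maps to the FPGA (emulated here in software). All function names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only (not the paper's code): the convolution runs on the
# host, while the fully connected layer's forward/backward pass stands in for
# the work the paper offloads to the FPGA. Shapes and learning rate are assumed.

def conv2d_host(x, k):
    """Valid 2-D convolution computed on the host (single channel, single filter)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def fc_forward(features, W, b):
    """Fully connected layer forward pass (the part mapped to the FPGA in the paper)."""
    return features @ W + b

def fc_backward(features, W, b, probs, label, lr=0.01):
    """One SGD step on the fully connected layer only; the convolution kernel stays
    fixed, mirroring the paper's choice to leave convolutional-layer training for
    future work."""
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0                      # softmax + cross-entropy gradient
    W -= lr * np.outer(features, grad_logits)      # weight update
    b -= lr * grad_logits                          # bias update
    return W, b

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy training step on random data (assumed 28x28 input, 3x3 kernel, 10 classes).
rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))
kernel = rng.standard_normal((3, 3)) * 0.1         # fixed: conv training not accelerated
W = rng.standard_normal((26 * 26, 10)) * 0.01
b = np.zeros(10)

features = np.maximum(conv2d_host(image, kernel), 0).ravel()   # host: conv + ReLU
probs = softmax(fc_forward(features, W, b))                    # "FPGA": FC forward
W, b = fc_backward(features, W, b, probs, label=3, lr=0.01)    # "FPGA": FC update
```

In the paper's design the two fully-connected-layer routines above would be replaced by calls into the FPGA accelerator, while the host would still supply the convolutional feature maps, which is why accelerating training also accelerates inference.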
