Paper Title

FATNN: Fast and Accurate Ternary Neural Networks

Paper Authors

Peng Chen, Bohan Zhuang, Chunhua Shen

Paper Abstract

Ternary Neural Networks (TNNs) have received much attention because they are potentially orders of magnitude faster in inference, as well as more power efficient, than their full-precision counterparts. However, 2 bits are required to encode the ternary representation even though only 3 quantization levels are used. As a result, conventional TNNs have memory consumption and speed similar to standard 2-bit models, but worse representational capability. Moreover, a significant accuracy gap remains between TNNs and full-precision networks, hampering their deployment in real applications. To tackle these two challenges, in this work we first show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2. Second, to mitigate the performance gap, we elaborately design an implementation-dependent ternary quantization algorithm. The proposed framework is termed Fast and Accurate Ternary Neural Networks (FATNN). Experiments on image classification demonstrate that FATNN surpasses the state of the art in accuracy by a significant margin. More importantly, we analyze its speedup relative to various precisions on several platforms, which serves as a strong benchmark for further research.
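To make the "ternary inner product" concrete: below is a minimal sketch of the conventional bitwise scheme, in which each value in {-1, 0, +1} is packed into a sign bit-plane and a nonzero-mask bit-plane, and the dot product is recovered with a few logical operations and popcounts. This is the baseline computation that the paper claims to accelerate by a factor of 2 under mild constraints; the function names (`pack_ternary`, `ternary_dot`) and the sign/mask encoding here are illustrative assumptions, since the paper's actual encoding is not given in the abstract.

```python
# Hypothetical sketch of the conventional bitwise ternary inner product.
# Encoding assumption: mask bit = 1 if the value is nonzero,
#                      sign bit = 1 if the value is -1.
# FATNN's own, faster encoding is not specified in the abstract.

def pack_ternary(values):
    """Pack a list of values in {-1, 0, +1} into (sign, mask) bit-planes."""
    sign, mask = 0, 0
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
            if v < 0:
                sign |= 1 << i
    return sign, mask

def ternary_dot(a_sign, a_mask, b_sign, b_mask):
    """Inner product of two packed ternary vectors via bitwise ops."""
    nonzero = a_mask & b_mask          # positions where both values are nonzero
    neg = (a_sign ^ b_sign) & nonzero  # nonzero positions with opposite signs
    # dot = (#positive products) - (#negative products)
    return bin(nonzero).count("1") - 2 * bin(neg).count("1")

# Example: [-1, 0, 1, 1] . [1, -1, 1, -1] = -1 + 0 + 1 - 1 = -1
a = pack_ternary([-1, 0, 1, 1])
b = pack_ternary([1, -1, 1, -1])
assert ternary_dot(*a, *b) == -1
```

Note that this baseline needs roughly twice the bitwise work of a binary (XNOR/popcount) inner product, which is the overhead the paper's mild constraints are designed to remove.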
