Title

Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks

Authors

Tom Bannink, Arash Bakhtiari, Adam Hillier, Lukas Geiger, Tim de Bruin, Leon Overweel, Jelmer Neeven, Koen Helwegen

Abstract


We introduce Larq Compute Engine, the world's fastest Binarized Neural Network (BNN) inference engine, and use this framework to investigate several important questions about the efficiency of BNNs and to design a new state-of-the-art BNN architecture. LCE provides highly optimized implementations of binary operations and accelerates binary convolutions by 8.5-18.5x compared to their full-precision counterparts on Pixel 1 phones. LCE's integration with Larq and a sophisticated MLIR-based converter allows users to move smoothly from training to deployment. By extending TensorFlow and TensorFlow Lite, LCE supports models which combine binary and full-precision layers, and can be easily integrated into existing applications. Using LCE, we analyze the performance of existing BNN computer vision architectures and develop QuickNet, a simple, easy-to-reproduce BNN that outperforms existing binary networks in terms of latency and accuracy on ImageNet. Furthermore, we investigate the impact of full-precision shortcuts and the relationship between the number of MACs and model latency. We are convinced that empirical performance should drive BNN architecture design, and we hope this work will help others design, benchmark, and deploy binary models.
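The speedup the abstract cites comes from the standard BNN trick of replacing full-precision multiply-accumulates with bitwise operations: once weights and activations are binarized to {-1, +1}, a dot product reduces to an XNOR followed by a popcount on packed bitmasks. The sketch below illustrates this equivalence in plain Python; it is a pedagogical illustration of the general technique, not LCE's actual optimized kernels (which operate on packed SIMD registers).

```python
import numpy as np

def binarize(x):
    # Sign binarization used in BNNs: map real values to {-1, +1}.
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dot(a_bits, b_bits):
    # Pack {-1, +1} vectors into bitmasks (one bit per element) so the
    # dot product becomes XNOR + popcount instead of n multiply-adds.
    n = len(a_bits)
    a_mask = sum(1 << i for i, v in enumerate(a_bits) if v == 1)
    b_mask = sum(1 << i for i, v in enumerate(b_bits) if v == 1)
    xnor = ~(a_mask ^ b_mask) & ((1 << n) - 1)  # bit is 1 where signs agree
    agreements = bin(xnor).count("1")
    # dot = (#agreements) - (#disagreements) = 2 * popcount - n
    return 2 * agreements - n

# The bitwise result matches the integer dot product on {-1, +1} vectors.
a = binarize(np.array([0.3, -1.2, 0.7, -0.1]))
b = binarize(np.array([0.5, 0.4, -0.9, -2.0]))
assert binary_dot(a, b) == int(np.dot(a.astype(int), b.astype(int)))
```

A binary convolution is this same reduction applied per output position, which is why the per-operation cost drops so sharply relative to full-precision convolutions.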
