Paper Title
Neural Network Compression Framework for fast model inference
Paper Authors
Paper Abstract
In this work we present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF). It leverages recent advances in various network compression methods and implements several of them, such as sparsity, quantization, and binarization. These methods produce more hardware-friendly models that can run efficiently on general-purpose hardware computation units (CPU, GPU) or on specialized Deep Learning accelerators. We show that the developed methods can be successfully applied to a wide range of models to accelerate inference while preserving the original accuracy. The framework can be used within the training samples supplied with it, or as a standalone package that can be seamlessly integrated into existing training code with minimal adaptations. Currently, a PyTorch version of NNCF is available as part of OpenVINO Training Extensions at https://github.com/openvinotoolkit/nncf.
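The "minimal adaptations" claim can be illustrated with a short sketch of a PyTorch training loop wrapped with NNCF. The snippet below follows the NNCFConfig / create_compressed_model entry points documented in the linked repository; the toy model, the synthetic data, and the 8-bit quantization config are illustrative placeholders rather than the paper's experiments, and the exact API surface may differ between NNCF releases.

import torch
import torch.nn as nn
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Toy model and synthetic data, standing in for an existing training setup.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
dataset = torch.utils.data.TensorDataset(torch.randn(64, 3, 32, 32),
                                         torch.randint(0, 10, (64,)))
train_loader = torch.utils.data.DataLoader(dataset, batch_size=16)

# Compression is configured declaratively; here, 8-bit quantization.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 32, 32]},
    "compression": {"algorithm": "quantization"},
})
# Register the data loader so NNCF can initialize quantization ranges.
nncf_config = register_default_init_args(nncf_config, train_loader)

# Wrap the model with compression-aware operations.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

optimizer = torch.optim.SGD(compressed_model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
for epoch in range(1):
    compression_ctrl.scheduler.epoch_step()
    for images, labels in train_loader:
        optimizer.zero_grad()
        # Task loss plus the compression loss contributed by NNCF.
        loss = criterion(compressed_model(images), labels) + compression_ctrl.loss()
        loss.backward()
        optimizer.step()
        compression_ctrl.scheduler.step()

# Export the fine-tuned, compression-aware model for inference.
compression_ctrl.export_model("compressed_model.onnx")

Only three touch points differ from a vanilla training loop: wrapping the model, adding compression_ctrl.loss() to the task loss, and stepping the compression scheduler. This is what the abstract means by integrating into existing training code with minimal adaptations.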