Paper Title
Efficient CNN-LSTM based Image Captioning using Neural Network Compression
Paper Authors
Paper Abstract
Modern neural networks achieve state-of-the-art performance on tasks in Computer Vision, Natural Language Processing, and related verticals. However, they are notorious for their voracious memory and compute appetite, which obstructs their deployment on resource-limited edge devices. To enable edge deployment, researchers have developed pruning and quantization algorithms that compress such networks without compromising their efficacy. Such compression algorithms have mostly been studied on standalone CNN and RNN architectures, whereas in this work we present an unconventional end-to-end compression pipeline for a CNN-LSTM based Image Captioning model. The model is trained using VGG16 or ResNet50 as the encoder and an LSTM decoder on the Flickr8k dataset. We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size, a 71.3% reduction in inference time, and a 7.7% increase in BLEU score compared to its uncompressed counterpart.
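To make the described setup concrete, the following is a minimal sketch of a CNN-LSTM captioning model compressed with magnitude pruning and dynamic quantization. It is illustrative only, not the authors' exact pipeline: the sparsity level, vocabulary size, hidden dimensions, and the use of PyTorch's pruning and dynamic-quantization utilities are assumptions.

# Minimal sketch (assumed configuration, not the paper's exact pipeline):
# a ResNet50 encoder + LSTM decoder, magnitude pruning, then int8 dynamic quantization.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision import models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=512):
        super().__init__()
        # ResNet50 encoder (one of the two backbones named in the abstract), final fc removed
        backbone = models.resnet50(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        # LSTM decoder over word embeddings, initialized with the image feature as the first step
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.fc(self.encoder(images).flatten(1)).unsqueeze(1)
        seq = torch.cat([feats, self.embed(captions)], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)

model = CaptionModel()

# 1) Magnitude pruning: zero out the smallest 50% of weights in conv/linear layers
#    (the 50% sparsity is an illustrative choice).
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2) Dynamic quantization: store Linear and LSTM weights as int8.
compressed = torch.quantization.quantize_dynamic(
    model, {nn.Linear, nn.LSTM}, dtype=torch.qint8
)

Note that unstructured pruning only zeroes weights; realizing actual storage savings requires a sparse format or structured pruning, while dynamic quantization reduces the Linear/LSTM weight storage to int8 directly.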